Main

Understanding the evolutionary relationships among species is fundamental to biology, not only as an account of speciation events but also as the basis for comparative analyses of trait evolution. However, for deep phylogenetic relationships, different studies often show incongruence across analyses5,6. Large amounts of data may be required to resolve certain relationships yet others can remain recalcitrant even with genome-scale efforts, particularly for rapid radiations7,8. Phylogenomic incongruence can point to statistical and systematic errors but is also increasingly linked to complex biological processes that accompany rapid diversification9,10. Prime examples of this problem are the phylogenetic relationships among modern birds (Neornithes), which are inconsistently resolved even with large-scale datasets1,2,3,11. The widespread incongruences in evolutionary histories across avian genomes1,12,13 has left the phylogenetic relationships of major extant groups unclear and possibly irresolvable14.

Modern birds comprise three major groups: ratites and tinamous (Palaeognathae), landfowl and waterfowl (Galloanseres) and all other living birds (Neoaves). The early Neoaves experienced rapid diversification into at least ten major clades15, the so-called ‘magnificent seven’ and three ‘orphans’12, encompassing 95% of extant species and a significant portion of their phylogenetic diversity. Due to the short internal branches between these clades, their relationships remain contentious1,2,3,16. Furthermore, the timing of the radiation of these major groups is debated17,18. The ‘mass survival’ scenario places the radiation before the Cretaceous–Palaeogene (K–Pg) mass extinction (66.043 ± 0.011 million years ago (Ma)19), requiring survival of multiple neoavian lineages through the global changes caused by the Chicxulub impact11,17,20. The alternative ‘big bang’ scenario implies a rapid diversification of neoavian groups following the mass extinction, driven by adaptive radiation into new habitats and in the absence of competitors21. Fossil evidence supports the scenario of morphological diversification following the K–Pg event22. Several molecular studies also supported rapid divergences1,2,3, yet wide credible intervals allowed for the possibility that some of the earliest neoavian divergences predated the K–Pg boundary23. Uncertain placement of key taxa and a wide range of time estimates also persist within Passeriformes, the largest avian order with over 6,000 living species3,24.

Efforts to resolve high-level avian phylogeny face two major challenges. First, it is difficult to obtain large numbers of orthologous loci with suitable properties for phylogenetic analyses. Many studies have been limited to conserved genomic regions such as protein-coding sequence (exons) and ultraconserved elements (UCEs)2,25. Conserved regions exhibit complex patterns of sequence evolution: for example, selection to maintain protein structure and function places constraints on exon evolution12. Standard models of sequence evolution practical for large datasets exhibit poor fit to these regions, and model misspecifications probably result in topological discrepancies across data types1,12,13. Analysis of large numbers of loci does not remove, but can instead reinforce, biases introduced by model violations1,7. In principle, data types under lower selective pressure such as introns and intergenic regions are preferable; intergenic regions are arguably ideal because they are less probably under strong selection13. The second challenge is collecting genomic data from sufficient numbers of species, given that dense taxon sampling can improve phylogenetic estimation26,27. Thus, the debate in avian phylogenetics has revolved around the trade-off between using diverse loci extracted from entire genomes but for few species (one genome per taxonomic order)1 or using a smaller number of potentially biased loci sampled from more species2,3. Both approaches have shortcomings. The most compelling solution is also the most challenging: to create comprehensive datasets with whole genomes sampled across many taxa that inform on deeper timescales.

Here, as one of the main missions of the ‘family phase’ of the Bird 10K Genomes Project (B10K)28, we generated a phylogeny for modern birds by sampling across genome assemblies of 363 species representing 218 families (92% of the total)4 (Supplementary Data). We analysed nearly 100 billion nucleotides (around 275 Mb for each species; Extended Data Fig. 1a), an alignment 50 times the size of the largest available dataset of 48 species1 (Extended Data Fig. 1b). As our main data source, we used evenly spaced sampling of intergenic regions across 10 kb windows of a whole-genome alignment4 (Extended Data Fig. 1c). We found that selection of a 1 kb locus within the first 2 kb of each window balanced phylogenetic informativeness against the inclusion of recombination within loci (Extended Data Fig. 1d and Methods). This resulted in 94,402 loci of 1 kb from which we removed those that overlapped with exon and intron regions, resulting in a set of 63,430 purely intergenic loci (in total, 63.43 megabase pairs). In addition to analysis of this main set we tested the effect of various factors, including additional introns and exons, describe the major sources of phylogenetic incongruence and identify the remaining cases of uncertainty.

Intergenic regions resolve deep branches

Our main phylogenetic tree (called ‘main tree’) was obtained by analysis of the 63,430 intergenic loci within a coalescent-based framework (Fig. 1 and Extended Data Figs. 2 and 3). We focus on this tree because the findings reported below show that intergenic regions reduce systematic error due to model misspecifications—results that match a priori expectations and previous analyses12,29. The use of a coalescent-based method30,31 accounts for well-documented incomplete lineage sorting (ILS) in early Neoaves1,32. A concatenated analysis of the same 63,430 loci (Extended Data Fig. 4) resulted in a similar tree that differed in only ten of the 360 branches (2.8%). In these topologies, 98.1% of nodes had full statistical support (main tree, three nodes below 1.00 posterior probability; concatenation, seven nodes below 100% bootstrap support). Although our main topology differed from that of all previous studies, it was more similar to the genome-wide ‘TENT’ tree from ref. 1 of 48 species than to the main topology from ref. 2, which was based mostly on protein-coding genes of 198 species (Extended Data Fig. 5).

Fig. 1: Relationships and divergence times for 363 bird species based on 63,430 intergenic loci.
figure 1

a, Topology simplified to orders with higher clade names following ref. 50. Numbers on branches represent local posterior probability if below 1. b, Time tree of all species. Grey bars represent 95% credible intervals for age estimation; dots indicate nodes with fossil calibrations; asterisks mark the three branches lacking full support. A tree with tip labels is shown in Extended Data Figs. 2 and 3.

Within Neoaves we resolve four major clades (Fig. 1a), three of which are Mirandornithes (grebes and flamingos), Columbaves (Columbimorphae (doves, sandgrouse and mesites) and Otidimorphae (cuckoos, bustards and turacos)), in addition to Telluraves (higher landbirds including Afroaves and Australaves). The fourth major clade is new and phenotypically diverse, containing Aequornithes (pelicans, tubenoses, penguins and loons), Phaethontimorphae (kagu, sunbittern and tropicbirds), Strisores (nightbirds, swifts and hummingbirds), Opisthocomiformes (hoatzin) and Cursorimorphae (shorebirds and cranes). This clade was supported in coalescent-based analyses of intergenic regions and UCEs, but not by exons, introns or in concatenated analysis of intergenic regions (Fig. 3d and Extended Data Fig. 4). We name this clade Elementaves because its lineages have diversified into terrestrial, aquatic and aerial niches, corresponding to the classical elements of earth, water and air, and several Phaethontimorphae have names derived from the sun, representing fire.

Most Neoaves diversified post K–Pg

To time calibrate our main tree we empirically generated calibration densities for 34 nodes using 187 fossil occurrences (Supplementary Information) and applied these in a Bayesian sequential-subtree framework (Methods). We estimated branch lengths from intergenic regions and excluded loci that had evolved at the lowest and highest rates, and also those with the greatest rate variation across lineages. Our analysis produced age estimates with 95% credible intervals that were considerably narrower than previously achieved (Extended Data Fig. 6a). The widest credible intervals were observed for nodes positioned furthest from the calibration points, including the secondary calibrations involved in subtree dating. The prospects for narrowing these intervals are promising, through future refinement and the addition of fossil-based age constraints. In contrast to a recent study proposing a diversification of Neoaves during the Upper Cretaceous11, we found that the early divergences in Neoaves were tightly associated with the K–Pg boundary (Fig. 1b). Only two divergences occurred before the boundary: Mirandornithes diverged from the remaining Neoaves 67.4 Ma (95% credible interval 66.2–68.9) and Columbaves diverged 66.5 Ma (95% credible interval 65.2–67.9). All subsequent divergences postdate the boundary, although the 95% credible interval of the divergence time between Telluraves and Elementaves and the crown age of Elementaves spans the K–Pg boundary. This evolutionary timeline, wherein only a few neoavian lineages diverged before the K–Pg event, is reflected in all alternative dating analyses (Methods and Extended Data Fig. 6b–e), highlighting the robustness of our estimated chronology. This lends more support to a post-K–Pg diversification of Neoaves than previous studies, where the 95% credible interval of between ten and 18 of the nodes allowed for pre-K–Pg divergences1,2,18,23.

Abundant discordance among gene trees

Assessing the level of incongruence between gene trees (GTs) across the tree, order-level relationships ranged from showing little or no discordance to high levels of discordance (measured by the quartet score; Fig. 2a). The percentage of GT quartets matching a species-tree branch at the ordinal level ranged from 99.9 to 33.7% (close to one in three, which corresponds to a polytomy). In particular, 14 nodes had quartet support below 37%. These are the same nodes that have proved difficult to resolve in past studies15. For 29 out of 33 nodes, the quartet support of the main topology was significantly higher than both alternatives (one-sided χ2 test with Benjamini–Hochberg multiple test correction), consistent with expectations under ILS models. We discuss the remaining nodes (26, 39, 43 and 49 in Fig. 2a) below.

Fig. 2: Explaining difficult placements.
figure 2

a, Gene tree discordance across the backbone of the main tree. Node colours and numbers represent the bar plots of quartet frequencies for three possible resolutions around each branch. b, Uncertainty at the base of Elementaves. Phaethontimorphae + Aequornithes had high local posterior probability (LocalPP), but global bootstrap resampling (GlobalBS) showed support for an alternative placement. Violin plots (points for the species-poor Phaethontiformes) show higher root–tip distances of Phaethontiformes, and particularly for Eurypygiformes, than Aequornithes, which may cause attraction to the long-branched Telluraves. Further, the placement of Opisthocomiformes is the only branch where a null hypothesis (H0) of a polytomy cannot be refuted. c, Addition of taxa occasionally affects topology and support. Across 41,918 GTs with at least one species from each group, the alternative placement of Afroaves + Accipitriformes had higher quartet support when only a few species were sampled but declined with increasing taxon sampling (left), particularly of Passeriformes. The main topology dominated when 138 or more passerines were sampled (middle, arrow). Support for Telluraves + Elementaves decreased with increasing taxon sampling (right).

Source Data

Mirandornithes is sister to other Neoaves

The placement of Mirandornithes (also called Phoenicopterimorphae33) as the sister lineage to the remaining Neoaves was supported by both the main tree and concatenation. Although this topology was reported previously3 it differs from the TENT tree from ref. 1, which grouped Mirandornithes and Columbimorphae into a clade called Columbea. In our main tree, Columbimorphae combined with Otidimorphae to form Columbaves. This clade has also been reported previously, albeit with low bootstrap support2. Mirarab et al.34 showed that a 21 Mb outlier region of chromosome 4 with abnormally strong signal for Columbea (potentially due to the effects of ancient interchromosomal rearrangements) is responsible for the previous recovery of Columbea. However, with additional taxon sampling of Otidimorphae and Columbimorphae, the effect of this outlier region gradually lessened in favour of an increasingly dominant signal from the rest of the genome that placed Mirandornithes as the sister to other Neoaves (Extended Data Fig. 7a). In the concatenated analysis, Mirandornithes and Columbimorphae are successive sister groups to the remaining neoavian clades but with limited support (bootstrap = 64; Extended Data Fig. 4). Finally, when analysing exons, Mirandornithes were placed deeper in Neoaves as sister to Aequornithes + Phaethontiformes (Extended Data Fig. 4), which may relate to previous association with mostly aquatic birds in studies analysing large portions of coding regions (sister to Charadriiformes2, Opisthocomiformes + Aequornithes + Phaethontimorphae11).

There is a rapid succession of nodes in this part of the tree, with only 0.92 Ma between the divergence of Mirandornithes and of Columbaves from other groups. Within Columbaves, Otidimorphae has been found in some studies1,2 but not in others3,12. Within Otidimorphae we resolved Otidiformes as the sister group to Cuculiformes, like some studies12 but unlike several others1,2,3. The difficulty in this case could be explained by the very short branch (0.57 Ma) separating Otidiformes and other Otidimorphae. Similarly, Columbiformes diverged from the remaining Columbimorphae within 0.26 Ma. These fast divergences partially explain why previous analyses with fewer data led to conflicting resolutions of these earliest neoavian branches.

Waterbirds are deep in a diverse clade

Unlike previous hypotheses that placed Phaethoquornithes (Aequornithes + Phaethontimorphae) as sister to landbirds1,3, the main tree placed Phaethoquornithes deep inside the diverse Elementaves (Fig. 1a). The ‘orphans’ Charadriiformes and Gruiformes were consistently grouped together (forming Cursorimorphae), as found in some other studies1,3. The placement of the third orphan, Opisthocomiformes, as the sister to this group (with a short branch of 0.58 Ma) was the sole instance across the entire phylogeny with statistically indistinguishable levels of GT support for all three possible configurations around this branch35 (node 43 in Fig. 2b), a noteworthy finding given the extensive amount of available data.

Whereas the main tree placed Phaethontimorphae as the sister to Aequornithes, further investigations showed a competing placement as the sister lineage to Telluraves. Both topologies have previously been reported1,2,3,12, with their difference attributed to the effects of using protein-coding (Phaethontimorphae + Aequornithes) versus non-coding regions (Phaethontimorphae + Telluraves)15. We found instead that both topologies have support in the intergenic data. Whereas Phaethontimorphae + Aequornithes had a slightly better quartet score, it was recovered in only 60% of trees resulting from random subsampling of half of the 63,430 loci (Extended Data Fig. 7b). The two alternative positions of Phaethontimorphae, which are three branches (9.1 Ma) away, each had full local support (posterior probability = 1.0). Nevertheless, global bootstrap support estimated from resampling of GTs showed uncertainty in the three nodes connecting the two placements (global bootstrap = 42–62; Fig. 2b). Two hypotheses could explain this non-local uncertainty, the first being ancient hybridization between ancestral Phaethontimorphae and Telluraves 3.96 Ma after their divergence. Alternatively, the high support for the alternative placement could be due to problems arising from long branches. Phaethontimorphae have around 25% longer terminal branches than Aequornithes (paired t-test across loci, P < 2.2 × 10−16), showing greater similarity to Telluraves in this regard (Fig. 2b). Consistent with this explanation, topological changes resulted from data filtering that targeted long branches (clocklikeness, stemminess, total coverage and tree length; Extended Data Fig. 7c).

Our main tree placed Strisores (also called Caprimulgiformes33) with Phaethoquornithes with moderate support (posterior probability = 0.90; Fig. 1a), but the concatenated tree grouped them as sister to Telluraves with low support (bootstrap = 32; Extended Data Fig. 4). Quartet frequencies did not follow an ILS-alone scenario, because moving Strisores to the base of Elementaves had quartet frequencies similar to the main tree (χ2 test, PBenjamini–Hochberg adjusted = 0.317, node 39), but the third alternative had lower frequency (P = 0.488 × 10−11). Possible explanations include hybridization or long branch attraction, because Strisores have 4–28% longer branches than the other Elementaves, which may attract them to the long-branched Telluraves (Fig. 2b). Previous studies also failed to find unequivocal support for the relationship of Strisores, placing it as sister to Otidimorphae1, Cursorimorphae11, Opisthocomiformes3 or all other Neoaves2. Within Strisores our tree positioned Caprimulgidae (nightjars), rather than Sedentaves (oilbird + potoos)12, as sister to all others (Extended Data Fig. 2), as found previously2,11.

Difficult placement of owls and hawks

Within Telluraves our main tree supported the proposed split into Australaves and Afroaves1,3 in contrast to other studies2,11. Our tree grouped Accipitriformes and Strigiformes as the sister to the remaining Afroaves, similar to previous coalescent-based analyses1. Concatenated analyses1,3, including ours, supported Accipitriformes alone as sister to the remaining Afroaves (Extended Data Fig. 4). This node also showed quartet frequencies that were statistically indistinguishable for two topologies (35 versus 34.6%, χ2 test, PBenjamini–Hochberg adjusted = 0.130), but the third was significantly lower (30.5%, P < 10−16; node 26 in Fig. 2a), contradicting expectations of ILS. Because we found no evidence of long branch attraction (Extended Data Fig. 7d), the non-ILS patterns could be indicative of ancestral hybridization36. In contrast to GTs, direct analysis of alignment sites using CoalHMM (Methods) supported an ILS-like pattern in which the two alternative topologies had similar scores (31.2 versus 29.6%). However, CoalHMM assumes ILS a priori and only a strong signal of hybridization can lead to inferring unbalanced quartet frequencies. Thus an ancestral hybridization event, albeit too weak to be detected by CoalHMM, remains plausible. In addition, we observed that the relationship between Accipitriformes and Strigiformes depended on the number of passeriform taxa sampled. The main topology was obtained only when at least 138 Passeriformes were included, whereas sampling fewer taxa of each order favoured Accipitriformes as the sister to the remaining Afroaves (Fig. 2c). This case demonstrates that the effect of taxon sampling of one group can extend to others and that these sampling effects are not easily predictable.

Insights into the passerine radiation

Our analyses of phylogenetic relationships among Passeriformes (perching birds) included 173 species in 121 families and seven fossil calibrations. The most recent common ancestor of Passeriformes was dated to 50.7 Ma (95% credible interval 48.3–53.0; Fig. 1). This estimate is broadly similar to those from other studies with broad taxon sampling (47–53 Ma (refs. 2,3,23,24)), whereas a previous genomic study that included only five passeriforms found a considerably younger age (39 Ma (ref. 1)). The split between Tyranni (Suboscines) and Passeri (Oscines) was estimated at 47.3 Ma (95% credible interval 45.1–49.8; Extended Data Fig. 3), in line with a previous study2, but 3–4 Ma older than other estimates3,24. Tyranni and Passeri were estimated to have started diversifying around the same time whereas other studies supported a 3 Ma difference between the onset of their diversification2,3. The three main clades of Tyranni (Eurylaimides, Tyrannides and Furnariides) were inferred to be 4–12 Ma younger than previously found37. In Passeri, the age of Corvides was estimated to 25.7 Ma (95% credible interval 23.8–27.7), agreeing with some previous estimates24 but over 5 Ma younger than others3. The divergence of a major subclade of Passerides (Sylviida + Muscicapida + Passerida) was inferred to have occurred shortly after the Palaeogene–Neogene boundary (22.4 Ma, 95% credible interval 20.6–24.2; Extended Data Fig. 3) whereas previous studies placed its divergence before the boundary3,23,24. This branch and some subsequent divergences occurred in close succession, indicating rapid diversification.

In Passeri, our tree differed from studies based on UCEs or 5′-untranslated region sequences3,24,38, including in the positions for Orioloidea, Malaconotoidea, Corvoidea, Mohouidae, Neosittidae, Regulidae and Urocynchramidae (Fig. 3d (asterisks) and Supplementary Information). Some of these difficulties also appear to be related to rapid diversification, seen for example in the extremely short internode of Mohouidae (0.18 Ma).

Fig. 3: Effect of increasing data quantity.
figure 3

ac, Species trees were reconstructed from subsets of GTs (1,000, 2,000, ..., 32,000) of the 63,430 intergenic regions in 50 replicates. a, The addition of loci increases similarity to the main tree (left) and increases the proportion of highly supported nodes (right). b, The main tree, with branches coloured according to the difficulty involved in consistently recovering the clade across subsets. Most branches were consistently obtained with only 1,000 GTs (grey); the remaining 40 branches required more loci. c, Increasing the number of loci decreases the number of possible sister groups. We recorded the number of unique sister groups for each node across subsets. Colours correspond to the difficulty (from b), and shading and number show the frequency, with which the main topology was obtained. The top row illustrates examples of easy nodes. in which the same sister group was consistently recovered with 2,000, 4,000 and 16,000 loci, respectively. The remaining plots show the most difficult nodes, in which multiple sister groups were supported even when 32,000 loci were subsampled. d, Ten selected species trees, data types used in each and the support for all challenging branches (labelled in b). Asterisks indicate relationships in Passeriformes that differ from previous studies. MNO, Malaconotoidea + Neosittidae + Orioloidea; MMNO, Mohouidae + MNO, PP, posterior probability; Q, quartiles.

Source Data

Rheas have conflicting placements

Outside of Neoaves we found support for different relationships of Rheiformes within Palaeognathae, a conflict previously attributed primarily to ILS39. Whereas our main topology found Rheiformes as the sister to Tinamiformes, analysis with CoalHMM put it as sister to Apterygiformes + Casuariiformes (Extended Data Fig. 7g), in agreement with that previous study39. We found that Rheiformes and Tinamiformes had a higher proportion of loci with high guanine–cytosine (GC) content than other taxa (Extended Data Fig. 7e). We observed that omission of loci with similar GC content for Tinamiformes and Rheiformes, but not for others, tended to reduce (but not eliminate) support for this clade (Extended Data Fig. 7g). These results suggest that the strong support for this grouping in our main tree was enhanced by biased GC content, leaving other placements of Rheiformes (for example, as sister to Apterygiformes + Casuariiformes, as recovered by CoalHMM) as plausible.

Effect of taxon sampling varies

The question of whether to sample more species or more genetic loci is pivotal in phylogenetic study design40. Whereas expansion of taxon sampling helps to mitigate the confounding impact of long branches within GTs26,41, its effects on species-tree inference are less clear. To investigate this question we randomly selected between one and ten species for each order and constrained the 63,430 intergenic GTs to the selected taxa before rescoring the species tree. These changes in taxon sampling affected ordinal relationships in only three cases (Extended Data Fig. 7f), with the aforementioned Accipitriformes + Strigiformes being the strongest example (Fig. 2c). More frequently we observed that increasing taxon sampling affected only the amount of GT discordance but not the topology (for example, Telluraves + Elementaves in Fig. 2c). Thus our results are relatively robust to taxon sampling, although with some exceptions.

Number of loci needed varies across nodes

As access to large numbers of loci becomes common, the choice of how many and which loci to select is a fundamental decision42. Using repeated subsets of the 63,430 dataset, we found that greater locus sampling resulted in trees more similar to the main tree and with higher support (Fig. 3a). The same trend was observed across all partitions of the genome (intergenic regions, introns, UCEs and exons; Extended Data Fig. 8a,b) and with other species trees as reference, except the purely exonic one (Extended Data Fig. 8c).

We assessed how many loci were required to consistently recover each clade of the main tree (Fig. 3b). We found that most clades (321 of 361, 89%) could be identified with just 1,000 loci. A minority of clades (30 of 361, 8%) needed substantially more, from 2,000 to 32,000 loci, before analyses could consistently support them (Fig. 3c). In the remaining ten clades (2.8%) increasing the number of loci reduced incongruence but did not consistently recover the main topology across replicates, even with 32,000 loci (Fig. 3c and Extended Data Fig. 9). Most of these difficult nodes were associated with short branches after the K–Pg boundary and within Corvides (Fig. 3b). For example, mousebirds (Coliiformes), placed in agreement with some studies1,2,3 in our main tree, had an alternative placement in 30% of subsets of 32,000 loci, consistent with previously reported difficulties1,14.

Strong effects of different locus types

Species trees built from GTs of different data types were substantially different, especially between protein- and non-coding data, akin to previous findings1,12,13. The species tree built from 14,355 exon loci (excluding the hypervariable third codon position) differed in 38 of 360 branches from the main tree (compared with six or seven differences for the other data types; Extended Data Fig. 4). Beyond dissimilarity to the main tree (Fig. 3d), trees inferred from exons were less internally consistent—they were more sensitive to subsampling than trees built from other data types (Extended Data Fig. 8a–c). Even when controlling for the number of GTs used in species-tree construction, exons produced more variable trees than other data types (Fig. 4a).

Fig. 4: Phylogenetic signal across the genome.
figure 4

a, Protein-coding regions yield more varied species trees when they are subsampled. Each heatmap cell shows the average Robinson–Foulds distance between 1,250 (diagonal, 1,225) pairs of species trees, each built from 2,000 GTs of different data types. Values in parentheses give the same metrics for 8,000 GTs, omitting UCEs with fewer loci. b, Effect of subsetting loci by data type and different metrics. The y axis represents the number of differences to the main tree; the x axis shows two metrics split into four quartiles, from low to high. Phylogenetic informativeness is the proportion of parsimony-informative sites. Clocklikeness is the coefficient of variation in root–tip distances, a measure of branch length heterogeneity. Extended Data Figure 8g shows other metrics. c, Patterns of phylogenomic incongruence along the genome. Using the 94,402 loci binned approximately every 500 kb, lines show Robinson–Foulds (RF) distances to the main tree (top), variance in GC content (middle) and recombination rate (bottom). Horizontal lines indicate genome-wide averages.

Source Data

We found that data types differed in regard to the risk of violating assumptions of phylogenetic models. A much higher proportion of exonic loci was found to be at risk of sequence saturation (30.83%) compared with the other data types (intergenic regions, 0.07%; UCEs, 0.34%; introns, 0.83%). The evidence for violation of stationarity was generally low, yet highest among exons (exons, 2.45% of loci failing the test; UCEs, 0.02%; intergenic regions, 0.07%; introns, 0.08%). Moreover, because individual exons of the same gene were combined into one locus, the assumption that phylogenetic loci are recombination free is expected to be more frequently violated by exonic loci. An exonic locus can span wide stretches of the genome because its individual exons are not contiguous (mean sequence length 16,964 base pairs, range 149–566,199) as opposed to loci of other data types (mean sequence length: introns, 2,543 base pairs; UCEs, 2,095 base pairs; intergenic regions, 897 base pairs). Because the increased length of exons increases the risk of within-locus recombination, analysis of only intergenic regions minimizes the risk of recombination and model violations.

We found that exonic loci had less phylogenetic information and were more variable in their signal than the other data types (Extended Data Fig. 8d,e). Exons also scored highest in a measure of phylogenetic estimation difficulty (Extended Data Fig. 8f), indicating that their GTs are less reliable than those of other data types. To examine whether exons had a misleading signal, we restricted species-tree inference to GTs with more signal, less gappy alignments, greater clocklikeness and greater total length. Unlike intergenic regions, in which subsampling did not systematically change the species trees, the use of more informative, less gappy and more clocklike exons reduced incongruence between the resulting species trees and the main tree (Fig. 4b and Extended Data Fig. 8g). Thus exons yield phylogenetic trees that are less reliable. This conclusion is consistent with earlier analyses based on fewer genomes1,12,13,29. Our results indicate that the damaging effects of model violation and limited signal of exons are not offset by increased taxon sampling, as one might hope2,43.

To investigate whether the confounding effects of exons could be swept out by other data, we gradually augmented purely intergenic loci (Extended Data Fig. 1b). The addition of 1 kb windows overlapping with introns (resulting in a total of 80,047 loci) led to the same topology (Fig. 3d). However, when windows overlapping with exons were added (94,402 loci), the resulting tree agreed with the main tree on the first four neoavian clades (Mirandornithes, Columbaves, Telluraves and Elementaves) but differed in five difficult branches (Fig. 3d and Extended Data Fig. 4). This 94,402-locus topology was also obtained when adding UCEs, purely intronic loci and purely exonic loci (not those overlapping with 1 kb windows) to either the 63,430 set (128,233 loci) or the 94,402 set (159,205 loci). Removal of loci that failed saturation and stationarity tests from the full set (153,789 loci remaining) returned the same tree, albeit with low support on branches conflicting with the main tree. These results indicate that the inclusion of exonic loci, even if these constitute just 10% of the data and are restricted to those that pass the testing of model fit, can affect the most unstable parts of the tree. This finding can partially explain the different topologies reported in other studies using a high proportion of coding regions2,11. By contrast, exclusion of introns did not make a difference topologically in our analyses. Nevertheless, we treat as uncertain the five branches that differ between purely intergenic regions and these alternative trees (Fig. 3d).

Discordance along chromosomes

Averaged over 500 kb windows, GT discordance levels were mostly consistent along chromosomes (31.4% normalized Robinson–Foulds distance to the main tree; Fig. 4c). However, we observed some notable troughs and peaks of GT discordance, particularly around the telomeres and some centromeres (relative to the chicken genome), agreeing with previous findings regarding telomeres1. Gene trees inferred from macrochromosomes (below 50 Mb) were slightly less distant to the main tree than intermediate chromosomes (12–40 Mb) and microchromosomes (average size 12 Mb; Extended Data Fig. 10a). The higher discordance near telomeres and across microchromosomes may be related to their elevated richness of genes, variation in GC content and higher recombination rates (Fig. 4c and Extended Data Fig. 10b–d) leading to higher local effective population size and challenging phylogenetic reconstruction. The Z chromosome had the lowest discordance (Extended Data Fig. 10a), consistent with its lower recombination rate. Species trees inferred from individual chromosomes resulted in topologies with 1–3% difference to the main tree, with most differences observed in microchromosomes followed by intermediate chromosomes (Extended Data Fig. 10a).

Implications for avian diversification

We next evaluated how well the new phylogenetic tree reflects avian morphology, testing the expectation that closely related species should resemble one another. We found that our main tree fits morphological traits better than the topology of ref. 2, even when controlling for taxon sampling (Fig. 5a), including the larger number of Passeriformes in our study (supplementary results given in Supplementary Information). Simulations considering the misplacement of taxa and convergent scenarios suggested that the higher phylogenetic signal in this comparison was more probably attributed to topological differences (Extended Data Fig. 11a).

Fig. 5: Biological implications of the new time tree.
figure 5

a, The main tree fits morphological traits well. We measured phylogenetic signal (Pagel’s lambda) for nine traits over 100 replicates and compared the fit based on (1) the main tree, (2) the ref. 2 topology and (3) the main tree with random species sampling to match the sample size used in ref. 2 (one-sided t-test with Bonferroni correction). b, The K–Pg and Palaeogene–Neogene transitions were associated with increased effective population sizes of some lineages. Shown are the midpoint ages of each branch compared with the ratio between its length in time units and in coalescent units, which is proportional to the effective population size of that branch and its generation time. Numbers correspond to selected nodes from Fig. 2a. c, Variations in body mass and relative brain size over time changed in different directions following the K–Pg event. Solid lines indicate mean values and ribbons mark 95% confidence intervals. The dashed parts of the reconstruction (from 25 Ma) indicate possible uncertainty due to the lack of within-family sampling (Extended Data Fig. 11g). d, Substitution rates increased around the K–Pg boundary. Estimated molecular rates for the intergenic regions are plotted against the midpoint age of each branch.

Source Data

Next we compared branch lengths in time units and coalescent units, which should be proportional to population size, ignoring the effect of varying generation time (Methods). We found a strong signal of increased population sizes on nearly half of the branches 0–2 Ma following the K–Pg transition (Fig. 5b), in agreement with an earlier analysis of insertions and deletions44. This pattern could be indicative of lineages undergoing density compensation, a transient increase in population size in response to ecological opportunity and release that may be associated with adaptive radiation45. Birds would have been well positioned to exploit landscapes newly devoid of competitors and predators following the K–Pg mass extinction because of their flight capabilities. Vagile insectivores and marine species such as Strisores and Aequornithes could have rapidly expanded into early-succession habitats. A less marked spike was also observed around the end of the Palaeogene (Fig. 5b). There was also an apparent gradual decline in the ratio of time and coalescent unit branch lengths by close to an order of magnitude over 60 Ma. A reduction in generation times could plausibly produce this result, possibly reflecting an increase in numbers of passerine families through time. There has also been a trend toward reduced inferred body sizes over this time (Fig. 5c), and it has long been appreciated that taxa with small body size have short generation times46.

Substitution rate estimates for the intergenic regions also showed a strong increase at and shortly after the K–Pg boundary (Fig. 5d), and a more diffuse increase near the boundary to the Neogene. The rate increase near the K–Pg boundary has been noted for other data types and attributed, at least in part, to the ‘Lilliput effect’ (refs. 47,48). This refers to decreases in body size in the wake of mass extinctions; those changes in body size would affect other life history traits, such as generation time. Consistent with this explanation, we found a decrease in reconstructed body size following the K–Pg event (Fig. 5c). This was accompanied by an increase in inferred relative brain size shortly before the K–Pg event, suggestive of strong selection for adaptability or behavioural flexibility, consistent with previous findings49. Shortly after the K–Pg event, the continuous changes of inferred relative brain size appear to have ceased (Fig. 5c). From around 35 Ma the reduction in reconstructed body mass does not seem to have been accompanied by an increase in relative brain size.

Across the tree we found that rapid evolutionary change occurred at the origin of major clades, throughout the diversification of some clades and along some isolated branches. Passeriformes exhibited a burst of body mass evolution at their most recent common ancestor (Extended Data Fig. 11b). Rates of evolution in relative brain size were more variable, with rapid evolutionary change in some clades (for example, Telluraves, vocal learning lineages such as parrots, corvids and hummingbirds)49. In addition, our data showed that the early burst was followed by sustained varied rates within these groups, especially in Passeri (Extended Data Fig. 11c).

Conclusions

Relationships along the backbone of Neoaves have long been contentious, with various analyses yielding incongruent results. At the heart of the disagreements has been a long-standing question: is it better to sample many taxa at a few loci (typically conserved regions, such as exons and UCEs) or sample many loci widely across the genome, even if available from fewer species? We can finally answer this question because our data provide both dense taxon sampling and many loci across the whole genome. We observed that the number of loci, in addition to sequence types (for example, exons, introns, intergenic regions or chromosome type), had a much greater effect on the inferred tree than taxon sampling. Nevertheless, increased taxon sampling was crucial in inferring more precise dates, and for studying traits, trajectories of population size and substitution rates. By focusing on intergenic regions, a source of data largely unused in the past, we minimized model violations and increased phylogenetic resolution. Nonetheless, our results also showed that several recalcitrant relationships remain, even with this wealth of data, due to challenges imposed by biological processes such as hybridization that are hard to model in deep time using phylogenetics. Overall, our results underscore the complexity of genome evolution and show methodologies that are likely to be useful for future phylogenomic studies focused on deep relationships.

Methods

Further details on methods are given in Supplementary Information. No statistical methods were used to predetermine sample size. The experiments were not randomized and investigators were not blinded to allocation during experiments and outcome assessment.

Selection of genomic regions for phylogenomic inference

For the main tree, we used putatively intergenic regions extracted from a Cactus whole-genome alignment4,51. We converted the HAL alignment to MAF format using chicken as the reference and extracted the best-aligned synteny blocks from each query species using 10 kb windows (https://github.com/Secretloong/Cactus_Alignments_Tools, using HALtools52 v.2.3), skipping regions that were repetitive in chicken or those present only in Galliformes. Among the first 2 kb of each window, the 1 kb portion with the most site-wise occupancy was selected to avoid portions with few sequences. The decision to use 1 kb loci from which to estimate GTs was made following preliminary assessments (Extended Data Fig. 1d). Therefore loci were 8–9 kb apart, reducing the risk of strong linkage53. We excluded fragmentary sequences (under 50% of the median length of all sequences of the locus) and loci with fewer than four sequences. This resulted in 94,402 loci for which we estimated GTs. Based on the chicken genomic annotation, we identified 1 kb loci which had overlap with exons (14,355 loci) or introns (16,617 loci) and created smaller datasets without these regions (Extended Data Fig. 1b). Subtraction of these from the total loci resulted in 63,430 purely intergenic loci, which were used to construct the main tree.

We also extracted loci of other data types and applied the filtering described above. This resulted in 44,846 intronic, 14,972 exonic and 4,985 UCE loci. Introns were extracted from the Cactus alignment following previously described procedures4, reconstructing individual GTs for each intron of the same gene. Protein-coding regions were obtained from genome annotations4 and all exons of the same gene were analysed as one locus; these were further filtered and aligned. This was done with an iterative PASTA54 v.1.8.5 pipeline that included TreeShrink55 v.1.3.1 to remove outlier sequences, alignment with MAFFT56 v.7.149b G-INS-i with a variable scoring matrix57 to isolate potentially unrelated segments and removal of these blocks. We excluded third codon positions because these were previously shown to be problematic1. UCE loci were extracted using PHYLUCE58 v.1.6.3 (commit 185b705) targeting 5,060 UCEs and 1,000 base pair flanking regions. After filtering, 5,006 UCE loci remained. Alignment and exclusion of outliers was conducted similar to the protein-coding regions but using MAFFT L-INS-i without removal of alignment segments.

Generation of GTs and species trees

A total of 159,205 GTs were estimated using maximum likelihood tree inference with Pargenes59 v.1.1.0, which uses substitution model selection through Modeltest-NG60 v.0.1.3 and RAXML-NG61 v.0.9.0, with ten random and ten parsimony starting trees and scaled branch lengths. To identify and collapse poorly supported branches before running ASTRAL we used IQTREE62 v.1.6.12 to perform parametric approximate likelihood ratio tests (aLRT), which are rapid tests of the three possible nearest-neighbour resolutions around a branch63 and are more computationally efficient than bootstrapping. Outputs from Pargenes were used for computing aLRT scores. Poorly supported branches were contracted to polytomies using newick-utilities64 v.1.6 if their aLRT value was below 0.95.

Collapsed GTs were summarized into a coalescent-based species tree using ASTRAL-MP65 v.5.14.5. Support was assessed using posterior probability. We also performed gene-only, multilocus bootstrapping (globalBS) for cases in which uncertainty is not local (for example, two placements many branches away both resulting in high quartet support), a scenario that can mislead local posterior probability support66. In addition we tested polytomy null hypotheses35 and evaluated the quartet score of the three alternative nearest-neighbour interchanges around each branch66. Quartet scores were visualized using DiscoVista67. We evaluated alternative species trees (for example, moving Phaethontimorphae) by scoring these trees against the same input GTs using ASTRAL.

For a concatenated analysis of the 63,430 loci under maximum likelihood we used RAXML-NG v.1.0.1, partitioning by locus (63,430 partitions) with their previously determined substitution models. We ran 20 independent searches from random starting trees and picked the highest-scoring tree. We then ran 50 tree searches on bootstrapping pseudo-replicate alignments, judged sufficient according to the extended majority rules (MRE) bootstrap convergence criterion68. To save time and energy we used a topological constraint for all tree searches (maximum likelihood and bootstrapping). This was a strict consensus of the ASTRAL trees (63,430 loci, exons, introns and UCEs) and of an initial maximum likelihood run on the 63,430 loci (based on ten tree searches with five random plus five parsimony starting trees, no bootstrapping). This consensus left the backbone nodes free to be inferred with constraining uncontroversial nodes within orders (317 nodes resolved, 45 collapsed).

Fossil calibration and molecular dating

We performed molecular dating using a Bayesian sequential-subtree approach69. This involved using date estimates from an initial analysis of a backbone tree (56 tips) containing two representatives of each of 11 subtrees. This provided secondary calibrations for subsequent dating analyses of 11 subtrees (19–42 tips each). The subtrees were then attached to the backbone to assemble a timetree of all 363 taxa.

We performed molecular dating using a subset of the 63,430 loci. For all loci we estimated phylograms in IQTREE70 v.2.0.4 under GTR + F + R4, fixed to the main topology and rooted with FastRoot71. We selected 10,494 loci with the lowest coefficient of variation in root–tip distances, thereby retaining the most clocklike loci. For locus partitioning we randomly divided loci into two groups of 5,247 within which we partitioned based on their macro-, intermediate and microchromosomal origin. The two locus groups were used for dating. Half of the loci were used to date the backbone tree and the other half to date the subtrees, thus avoiding data duplication in the likelihood.

For node-based calibrations we identified 34 clades with fossils fulfilling best-practice criteria72 (Supplementary Information). We used CladeDate73 to generate calibration densities empirically based on fossil occurrences (187 fossils) and estimators of distributions in which the truncation was the estimated age of the clade23,74. We used the Strauss and Sadler75 estimator for uniformly distributed fossil occurrences; otherwise, we excluded the Quaternary record or used estimators that do not assume sample uniformity73. The resultant distributions of clade ages were used to fit Student-skew distributions to parameterize calibration priors.

The posterior distributions of the ages of the 11 nodes in the backbone tree that corresponded to the root nodes of the subtrees were fitted with skew-t densities using the R function sn::st.mple v.2.0.0, under the BFGS method for parameter optimization76. The skew-t parameters were then used to specify the prior distributions of root ages for dating analyses of the subtrees.

Bayesian molecular dating was conducted using MCMCtree77 v.4.9h, with approximate likelihood calculation78 and under the GTR + G model. The analyses included all calibration priors, a minimum bound on root age based on an uncontroversial neornithine fossil79 and a soft maximum bound at 86.5 Ma. Nodes without calibrations followed a birth–death process prior80 (λ = μ = 1, sampling fraction ρ = 0.1), which gives an approximately uniform kernel. We used a relaxed clock with lognormally distributed rates across branches and a gamma-Dirichlet prior on rates across the three subsets of loci81. During Markov chain Monte Carlo sampling, samples were drawn every 2,500 steps over a total of 5.5 × 107 steps following 5 × 106 burn-in, run twice.

We performed four additional analyses with alternative settings (Extended Data Fig. 6): (1) uniform calibration priors with ranges spanning the 95% probability density of the original calibration prior, adding a soft maximum bound with a 2.5% tail of probability; (2) a Jurassic age bound with a relaxed maximum age bound of 201.3 Ma on the root; (3) a calibration subset of 23 calibrations that were considered to be the most reliable (Supplementary Information); and (4) a set of 10,494 loci randomly selected from the 63,430 set, split into two equal groups of 5,247 and randomly partitioned into three subsets of 1,749 loci.

Subsetting analyses

Taxon sampling

To investigate the effect of sampling multiple species across orders (which represent the most contentious branches), we successively reduced taxon sampling to 50, 25, 10, … 2 or 1 species per order. We randomly selected species from the existing GTs of the 63,430 locus set, retaining all if fewer than the desired number were available. We then scored the main tree against the taxon-reduced GTs to compute the normalized quartet support for the three topologies around each branch. These analyses showed substantial impact only for Accipitriformes, in which fewer than 50 species were required to recover the main relationship. Because only Passeriformes had fewer than 50 taxa, we inferred that their sampling affected the position of Accipitriformes. To test this we removed 1, 3, ... 171 of the 173 Passeriformes in random order and computed quartet scores with GTs restricted to that subset. Two replicates produced indistinguishable results.

Data quantity

Of the 63,430 loci included in the main analysis we randomly selected subsets of increasing numbers of GT up to maximally half of the available GTs (1,000, 2,000, … 32,000). Each subset was repeated 50 times and an ASTRAL tree was estimated for each. The subset topology was compared to the main tree by counting the number of differing branches (Robinson–Foulds distance/2) using TreeCmp82 v.2.0 and calculating the proportion of highly supported branches (posterior probability ≥ 0.95). We recorded whether each clade of the main tree was present in subset trees and counted how many different sister groups were present across the 50 replicates of each subset. We performed the same analyses for the other data types, maximally sampling about half of the available loci. This included exons (50 times sampling 1,000, 2,000, … 8,000 GTs), introns (1,000, 2,000, … 32,000) and UCEs (1,000, 2,000). We also performed the analyses using all non-coding (80,047 windows, introns and UCEs totalling 129,878 loci) GTs (1,000, 2,000, … 64,000).

Data type

We compared topological differences between trees for each data type, also controlling for the number of GTs used. We subsampled loci at random (50 times). The highest number of GT subsets present across all data types was 2,000 (limited by the number of UCEs). To show the effect of increasing loci we also performed the analysis for 8,000 loci, omitting comparisons with UCEs. We calculated mean pairwise Robinson–Foulds distances between resulting species trees.

Genomic characteristics

For GTs we calculated taxa number, tree length, tree diameter, stemminess, clocklikeness, mean branch support and proportion of branches with aLRT above 95 and above 99. For gene alignments we calculated locus length, total coverage, number and proportion of parsimony-informative sites and mean and s.d. of GC content (with seqkit83 v.2.2.0). We predicted the difficulty of phylogenetic estimation under maximum likelihood using Pythia84 v.1.0.0, which estimates whether the alignment is likely to result in multiple, topologically highly distinct yet statistically indistinguishable topologies. We divided loci into four equal-sized quantiles based on their values for each metric (20,011 loci based on 80,047 loci). We then estimated an ASTRAL tree for each quantile and calculated Robinson–Foulds distances to the main tree.

Analysis by chromosomes and chromosomal category

We built 16 species trees from GTs of the 80,047 loci according to their chromosomal assignment in chicken, excluding small chromosomes (fewer than 1,000 GTs, chr15, chr16, upwards from chr21). We also built species trees for each of the chromosome size categories of birds85—that is, macrochromosomes (49,686 GTs), intermediate chromosomes (11,592), microchromosomes (12,740) and the Z chromosome (5,672). To investigate discordance within and across chromosomes we calculated Robinson–Foulds distances to the main tree for each of the collapsed GTs from the 94,402 set, normalized to the numbers of nodes in each GT. We investigated potential genomic colocalization with the s.d. of GC content, because high deviations violate common model assumptions, and with recombination rates estimated for chicken86. We estimated mean values using the same bins as that study85 (approximately 500 kb).

Phylogenetic model adequacy

We tested for excessive amounts of non-stationary base composition using Foster’s posterior predictive simulations method87, adapted to maximum likelihood using a parametric bootstrap88. We also tested for misleading inferences due to substitution saturation using entropy tests on parsimony-informative sites89. For both tests, loci were defined as having a high risk of misleading inferences under scenarios in which all simulations yielded inaccurate inferences. We built an ASTRAL tree based on all loci that passed both tests (153,789 loci remaining).

Investigation of specific nodes

CoalHMM

CoalHMM was used to estimate ILS levels of two clades that were difficult to resolve in our main analyses, Rheiformes and Strigiformes + Accipitriformes. We filtered and split alignment blocks into 1 Mb chunks on which CoalHMM was run90. We tested potential placements of Rheiformes within Palaeognathae using one representative for each order (using the most contiguous genome) and for all chromosomes. CoalHMM was also run for potential placements of Strigiformes and Accipitriformes, using Passeriformes as the outgroup and Bucerotiformes to represent the remaining Afroaves. The best-fitting topology was chosen based on posterior probabilities. Under an ILS model and in the absence of phenomena such as ancient hybridization, the proportion of sites supporting topologies different from the species tree should be equal.

GC content within Palaeognathae

Because we suspected that convergent GC content between Tinamiformes and Rheiformes may affect GT estimation, we defined a measure of GC similarity (∆GC; Supplementary Information). This should be zero under the stationary models of evolution used for phylogenetic inference. Positive values deviate from the model uniting Tinamiformes + Rheiformes and negative values have the reverse effect. For 54,651 of the 63,430 loci that had all relevant species present, we calculated ∆GC and created nine subsets of loci. We ran ASTRAL on each subset, and all of them united Tinamiformes + Rheiformes. We computed a normalized quartet score around the branch to investigate whether subsets without high ∆GC had lower quartet support for Tinamiformes + Rheiformes.

Inference of effective population size

We compared the time tree with the coalescent unit lengths estimated by ASTRAL. For each internal branch we computed the ratio of the branch length in time units to coalescent unit length:

$$\frac{{\rm{t}}{\rm{i}}{\rm{m}}{\rm{e}}\,{\rm{u}}{\rm{n}}{\rm{i}}{\rm{t}}}{{\rm{c}}{\rm{o}}{\rm{a}}{\rm{l}}{\rm{e}}{\rm{s}}{\rm{c}}{\rm{e}}{\rm{n}}{\rm{t}}\,{\rm{u}}{\rm{n}}{\rm{i}}{\rm{t}}}=\frac{{\rm{g}}{\rm{e}}{\rm{n}}{\rm{e}}{\rm{r}}{\rm{a}}{\rm{t}}{\rm{i}}{\rm{o}}{\rm{n}}\,{\rm{t}}{\rm{i}}{\rm{m}}{\rm{e}}\times {\rm{n}}{\rm{u}}{\rm{m}}{\rm{b}}{\rm{e}}{\rm{r}}\,{\rm{o}}{\rm{f}}\,{\rm{g}}{\rm{e}}{\rm{n}}{\rm{e}}{\rm{r}}{\rm{a}}{\rm{t}}{\rm{i}}{\rm{o}}{\rm{n}}{\rm{s}}}{{\rm{n}}{\rm{u}}{\rm{m}}{\rm{b}}{\rm{e}}{\rm{r}}\,{\rm{o}}{\rm{f}}\,{\rm{g}}{\rm{e}}{\rm{n}}{\rm{e}}{\rm{r}}{\rm{a}}{\rm{t}}{\rm{i}}{\rm{o}}{\rm{n}}{\rm{s}}/({2N}_{e})}=2\,{\rm{g}}{\rm{e}}{\rm{n}}{\rm{e}}{\rm{r}}{\rm{a}}{\rm{t}}{\rm{i}}{\rm{o}}{\rm{n}}\,{\rm{t}}{\rm{i}}{\rm{m}}{\rm{e}}\times {N}_{e}.$$

Higher values are indicative of higher population size (Ne) or longer generation time. Ignoring changes to generation time, higher time to coalescent unit ratios can be attributed to larger Ne. Around the K–Pg boundary the generation times are presumed to have decreased, which makes the increases in our measured quantity indicative of even larger Ne growth than what would be inferred if generation times were assumed constant. Note that summary methods such as ASTRAL are known to underestimate coalescent unit length in the presence of high GT estimation error. However, we compare branches only to each other without claiming to estimate the true Ne. Thus, estimation error, if it is not particularly concentrated on specific nodes, should not affect the relative values.

Analysis of molecular evolutionary rates

Genome-wide evolutionary rates were estimated for each branch using the 63,430 loci. To minimize the estimation bias in substitution rates arising from discordance between the species tree and GTs91, we considered only those GT branches that were concordant with the main tree92. Each concordant branch length was divided by the time duration of the branch from the main time tree analysis, leading to a rate estimate for each species-tree branch for each locus.

Analysis of phylogenetic signal

Pagel’s lambda (λ)93 was measured for nine continuous morphological traits from AVONET94 on the main tree, the topology in ref. 2 and the main tree randomly subsampled to the sample size used in ref. 2 (n = 198). We also performed a comparison between trees pruned to the 124 families present in both studies. To account for the high proportion of Passeriformes in our study we also excluded all but one passerine from both trees. We calculated λ for each trait using 100 simulations using phylolm95. To investigate the potential effects of an incorrect tree topology we simulated traits on the main tree under a Brownian motion model using fastBM96 with λ = 0.96. We then randomly changed the position of 1, 5, 10 and 20% of taxa to represent incorrect relationships, repeated each 100 times, and estimated λ. To investigate the effect of convergent evolution we randomly selected species pairs consisting of one passeriform and one non-passeriform, representing 1, 5, 10 and 20% of taxa. We gave each species pair the same trait value, repeated 100 times, and estimated λ.

Analysis of body mass and brain size evolution

We obtained body mass data (log-transformed) for 363 species94,97 and estimated brain size (volume of the brain case) for 228 species based on endocast volume, or back-calculated it using brain volume = brain mass/1.036 (ref. 98). We used the average of males and females or mean unsexed values when available. For brain size we used missForest99 to impute missing values based on phylogenetic relationships. Relative brain size was calculated as the residual from a log–log phylogenetic generalized least-square regression of absolute brain size against body mass. Ancestral states of both traits were reconstructed by Evomap using a multiple-variance Brownian motion approach100. Variations were summarized by dividing the phylogeny into bins of 1 Ma and averaging in each over all branches.

The rates of evolution in both traits were analysed using BayesTraits101 v.4 with variable-rates models and default priors. Each analysis ran for 110 million iterations with a burn-in of 10 million in triplicates. We used the convergence diagnostic test of coda102 and selected the run with the highest mean marginal likelihood. We also compared the fit of three single-process models (Brownian motion, early burst and Ornstein–Uhlenbeck) using Geiger103 v.2. To compare model fit using Akaike information criterion (AIC) (Extended Data Fig. 11e), we used the mean of the rate-scaled trees of BayesTraits and calculated the likelihood of a Brownian motion model on this tree with the same trait data104. To investigate whether sampling one species per family could affect ancestral reconstructions, we modified tip values to reflect the family’s range in body size94 across 100 replicates (Extended Data Fig. 11f). We also confirmed that inclusion of the imputed brain size values did not change the shape of ancestral reconstructions (Extended Data Fig. 11g).

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.