Tectonic collision and uplift of Wallacea triggered the global songbird radiation

Songbirds (oscine passerines) are the most species-rich and cosmopolitan bird group, comprising almost half of global avian diversity. Songbirds originated in Australia, but the evolutionary trajectory from a single species in an isolated continent to worldwide proliferation is poorly understood. Here, we combine the first comprehensive genome-scale DNA sequence data set for songbirds, fossil-based time calibrations, and geologically informed biogeographic reconstructions to provide a well-supported evolutionary hypothesis for the group. We show that songbird diversification began in the Oligocene, but accelerated in the early Miocene, at approximately half the age of most previous estimates. This burst of diversification occurred coincident with extensive island formation in Wallacea, which provided the first dispersal corridor out of Australia, and resulted in independent waves of songbird expansion through Asia to the rest of the globe. Our results reconcile songbird evolution with Earth history and link a major radiation of terrestrial biodiversity to early diversification within an isolated Australian continent.


Corvides "Basal
Oscines" Passerides C  AC  C  AC  AC  AC  C  ABCD  ABCD   ABCD  ABCD  AC  AC  AC  A  ACDEF  EH  B  AC  A  AC  AC  C  ABCDE  ABCDEFH  A  E  EG  A  ABCDE  A  EFI  EF  ABCDE  A  ACDEFI  ACD  C  ACDEFGHI  ADEFGH  A  A  A  A  A  B  EF  EF  ABC  AC  ACD  DGH  DGH  EGH  ACDEFGHI  ACDEFGHI  GH  EGH  GH  EFGH  F  F  ACDEFI  E  E  G  EFI  F  ACDEFI  H  EFH  EFGH  ACDEFGHI  F  EF  EF  EFGH  FGH  H  ACDEFGHI  F  F  F  ACDEFI  ACDEFI  I  ABCDEFHI  G  ACDEFHI  DE  ABCDEFGHI  ABCDEFGHI  EFI  EFI  EFGH  DEF  ADEFH  ADEFH  F  EGH  EGH  F  DEFH       Supplementary Figure 11. Phylogenetic relationships among oscines estimated using maximum likelihood with RAxML represented as cladograms based on: a) 3839 UCE loci that do not overlap with chicken protein-coding genes, and b) the incomplete matrix coded as purines and pyrimidines. Bootstrap support values are 100% unless otherwise indicated by numbers below nodes.
Red branches indicate conflicting relationships that are highly supported in both the tree and the maximum likelihood topology inferred from the incomplete matrix (Supp. Fig. 1); blue branches indicate conflicting relationships that are not well-supported in either the tree or the maximum likelihood tree inferred from the incomplete matrix (Supp. Fig. 1).

Supplementary Tables
Supplementary

Comparison of analytical methods
For ML analysis, a posteriori calculation of autoMRE indicated that bootstrapping converged after 50 replicates. In Bayesian analyses, the average standard deviation of split frequencies (ASDSF) dropped below 0.01 quickly (<< 1 million generations) and remained at this level until runs were terminated (after a minimum of 5 million generations with ASDSF below 0.01). After adjusting run settings (see Methods), chains swapped frequently (adjacent chains ~0.2-0.5), but topology proposals were rarely accepted (< 0.01). Likelihood values from four independent runs plateaued quickly and stabilized in the same range. All potential scale reduction factor (PSRF) values were close to one (0.99 < PSRF > 1.01) and all effective sample size values were greater than 200. Rapid convergence among independent runs and the rarity of successful topology proposals were likely caused by the strong phylogenetic signal from such a large matrix. We removed the first 25% of generations as burn-in and summarized the remaining runs in a majority rule consensus tree.
Analyses of the incomplete matrix (ML, Bayesian, and SVDquartets) produced highly concordant results, and most nodes received strong support ( Supplementary Figs. 1-2). Only a single node differed between ML and Bayesian analyses (Supplementary Fig. 1; see below). The species tree estimated by SVDquartets matched that of the concatenated analyses more closely than gene tree-based coalescent methods (GCM; Supplementary Fig. 2). Most conflicts between SVDquartets and concatenated methods involved nodes with low support in one or both methods. However, the relationships of three taxa (Paradisaea, Artamus, and Regulus) were strongly supported and differed from ML/Bayesian inference. All three of these relationships involved short internodes. Other discrepancies recovered in SVDQuartets, such as the placement of Mohoua and Daphoenositta, were weakly supported. Notably, relationships of major clades and deep lineages were congruent between SVDQuartets and concatenation methods.
Maximum likelihood analysis of the complete matrix produced a phylogeny mostly consistent with ML analysis of the incomplete matrix. As might be expected considering the amount of data in each matrix (4155 loci in the incomplete matrix vs. 515 loci in the complete matrix), some nodal support was lower with the complete matrix. However, two relationships with low or marginal support in the analysis of the incomplete matrix were resolved differently, but with higher support, with the complete matrix. The relationships of Psophodes and Eulacestoma were unresolved with the incomplete matrix, but these genera were inferred as sister taxa with 85% bootstrap support with the complete matrix. Analysis of the complete matrix resulted in Regulus as sister to the clade comprising Sitta, Troglodytes, and Certhia with 71% bootstrap support. Conversely, analysis of the complete matrix resulted in Regulus as sister to Dulus+Bombycilla with 85% bootstrap support. SVDQuartets analysis produced a third, moderately supported relationship for Regulus ( Supplementary Fig. 2). Thus, the relationships of Regulus are best considered unresolved. The last discrepancy between analysis of the complete and incomplete matrices involved a trio of genera (Locustella, Donacobius, and Oxylabes) that produced conflicting relationships in all analyses (see below).
ML analysis of the 3839 loci that did not contain protein-coding sequences recovered a topology identical to that obtained when we included all loci with the exception of three nodes that were not well-supported ( Supplementary Fig. 11a). Likewise, we obtained a similar topology from the "RY-coded" analysis as in our original analyses with the exception of 5 nodes, 3 of which were not well-supported ( Supplementary Fig. 11b). In these two additional analyses, the minor conflicts do not affect our biogeographic conclusions.
Gene tree-based coalescent methods (GCMs) produced species trees with much lower support than concatenated methods and SVDquartets ( Supplementary Fig. 3). We note two important patterns in the species-tree results of GCMs -strong conflict among species tree methods and a spurious, but predictable, placement of taxa that had shorter average sequence lengths. For example, the placement of Mohoua differed markedly between concatenated analyses and GCMs. Whereas concatenated analyses yielded strong support for Daphoenositta and Mohoua as sister taxa and embedded well within the Corvides, all GCMs placed Mohoua in a more basal position as sister to the Corvides, sister to the Passerides, or even sister to the Corvides+Passerides. DNA for Mohoua was extracted from the toepad of a museum study skin, rather than from fresh tissue which we used for all the other species. Although we recovered many UCE loci from the toe pad extraction, the sequences were notably shorter than those from other samples (Supplementary Data 1). Samples derived from fresh tissue that had shorter average locus lengths (e.g. Eulacestoma, Modulatrix, and Macrosphenus) also had anomalous placements in species trees ( Supplementary Fig. 3). We hypothesized that missing data was causing this discrepancy in phylogenetic placement, so we artificially shortened the sequences of other taxa that had strong relationships in both concatenated and species-tree analyses (Python code by C Oliveros). Shorter sequences had no effect on concatenated analyses, but they substantially altered results in summary species tree methods; taxa with shortened sequences were inferred to originate earlier in their clade, or even sister to the whole clade ( Supplementary Fig. 4). All disagreement between concatenated and GCM analyses involved A) sequence length disparity, B) weakly-supported nodes, or C) conflict among summary-species tree methods. Because none of the concatenated results were unambiguously contradicted by GCM analyses, for the remainder of the paper, we refer to the concatenated results.
All analyses produced conflicting results regarding the relationships among a trio of taxa: Locustella lanceolata, Donacobius atricapillus, and Oxylabes madagascariensis ( Supplementary Figs. 1-3). Maximum likelihood analysis of the concatenated, incomplete matrix produced strong support for Oxylabes as sister to the other two species, whereas Bayesian and SVDQuartets analysis of the same matrix, and ML analysis of the complete matrix, produced strong support for Locustella as sister to the other two species. This discrepancy was the only difference between Bayesian and ML analysis of the incomplete matrix. Three of the four GCMs (STAR, STEAC, and ASTRAL) produced strong support for the third possible topology, with Donacobius sister to the other two species (Supplementary Fig. 3). The fourth GCM (MP-EST) recovered a sister relationship between Locustella and Oxylabes, but Donacobius was more distantly related to the pair. Previous studies with greater taxon sampling within these groups, but using fewer loci, produced a variety of moderately supported results. Oliveros et al. 1 found Donacobius sister to Locustellidae+Bernieridae (our GCM results), whereas Alström et al. 2 and Johansson et al. 3 recovered Locustellidae sister to Donacobius+Bernieridae (our Bayesian and SVDquartets results). Notably, the results in Alström et al. 2 were only supported by one of the five loci examined.

Discussion of phylogenetic relationships
Below, we restrict our discussion of phylogenetic relationships to the concatenated results (Fig. 2, Supplementary Fig. 1), except where noted. We use higher level classification names of Cracraft 4 unless noted otherwise.
Our results at the base of the oscine phylogeny largely mirror previous studies 5, 6 11 and Jønsson et al. 12 found Mohoua sister to all other Corvides, but not sister to Daphoenositta, which was embedded in a subclade within the Corvides. Selvatti et al. 6 placed these genera far apart in different subclades of the Corvides, Jønsson et al. 9 placed them both in unresolved positions at the base of the Corvides, and Jetz et al. 10 placed them embedded within different parts of the Corvides, but with equivocal support. Claramunt and Cracraft 7 found Mohoua and Daphoenositta sister to Melanocharis and Vireo, respectively, which our results place far apart in the oscine radiation.
We find strong support for three large clades in Corvides: a novel clade we call superfamily Orioloidea, the whistlers and allies; Malaconotoidea, the shrike-like birds; and Corvoidea, the crows and allies ( Supplementary Fig. 1). We recover Eulacestoma, Psophodes, Oreoica, Falcunculus, Pachycephala, Oriolus, Oreocharis, Pteruthius, and Vireo in a strongly supported clade from Bayesian, maximum likelihood, and SVDQuartets analyses (Supplementary Figs. 1-2). Malaconotoidea and Corvoidea are subtended by long internodes, whereas the Orioloidea have an extremely short basal branch and many short internodes within the clade. Aggerbeck et al. 11 also found support for three clades within their "Core-Corvoidea," however, their placement of several lineages differed from our results. For example, they found Campephagidae (Coracina) was sister to the rest of the shrike-like birds (their clade Y), but we found strong support for the placement of Campephagidae outside the Malaconotoidea. Aggerbeck et al. 11 also found support for a clade similar to our Orioloidea (their clade X), but with additional taxa inside (e.g., Cinclosoma and Daphoenositta) that strongly conflict with our results. Jønsson et al. 9 recovered a weakly-supported clade comprising fewer members than our Orioloidea. Instead, they found equivocal support at the base of their "Core-Corvoidea" for taxa such as Eulacestoma, Oreocharis, and Psophodes. The Corvides topology of Jetz et al. 10 differs dramatically from our study ( Supplementary Fig. 5). For example, they recovered a clade of mostly Corvides taxa that also contained taxa such as Callaeidae (Philesturnus), Cnemophilidae, and Melanocharitidae, which we recovered as basal Passerides (see below). Furthermore, none of our major Corvides clades (e.g., Orioloidea, Malaconotoidea, and Corvoidea) are supported by Jetz et al. 10 Instead, they recovered taxa within our Orioloidea ( Supplementary Fig. 1) scattered throughout their Corvides clade ( Supplementary Fig.  5).
Jetz et al. 10 used a hybrid super-tree/super-matrix approach with topological constraints to reconstruct a phylogeny of all bird species, which subsequently formed the phylogenetic basis of several influential analyses of bird evolution [16][17][18][19][20] . Because Jetz et al. 10 included all bird species, our results can be compared directly to their phylogeny.
We computed a consensus tree from 1,000 trees with the "Hackett" constraints downloaded from birdtree.org/subsets, limiting the species to those we included in our analysis ( Supplementary Fig. 5). Many strongly supported differences are apparent at all levels of the oscine phylogeny, especially in Corvides, placement of transitional Oscine lineages, and basal relationships in Passerides. To quantify concordance between the two trees, we calculated the normalized Robinson-Foulds metric 21 in Paup, ver. 4.0a146 22 , with a result of 0.438. This value means that ~44% of the bipartitions found in the two trees are unique to only one of the trees. This large discrepancy likely relates to the disparate approaches of the two studies. Jetz et al. 10 analyzed a large, sparse super-matrix from relatively few markers whereas we analyzed a massive character matrix for a limited number of samples.

Discussion of divergence time estimates
Dates derived from the Jarvis et al. 23 secondary calibrations (Fig. 1)  We compared these results to oft-cited rates of mitochondrial DNA evolution in birds. Sequence capture techniques produce reads from non-target regions such as the mitochondrial genome. We assembled mitochondrial genomes from cleaned reads using the program ARC (https://ibest.github.io/ARC/) using the mitochondrial genome of Vidua chalybeata (GenBank AF090341) as a reference. We aligned contigs of mitochondrial genomes to the annotated reference genome using Geneious ver. 6.1.2. From these alignments, we extracted gene sequences of NADH subunit 2 (ND2) and cytochrome b (cytb) for each individual. We estimated rates of mitochondrial evolution in ND2 and cytb using the calibrated ultrametric UCE topology in BEAST 2.2 25 . We selected a separate uncorrelated lognormal relaxed clock for each gene. We partitioned each mitochondrial gene by codon position, and for each partition, we selected the GTR+G model and estimated base frequencies. We executed two independent 1x10 8 generation MCMC runs, each sampled every 1x10 5 generations. We removed the first 25% of posterior samples as burnin, and we assessed MCMC convergence and stationarity in Tracer ver. 1.6 26 . For each gene, we then calculated the mean substitution rate estimate and its 95% highest posterior density interval from the postburnin posterior distribution.
Using our topology and time estimates, we recovered a divergence rate of 2.3% per million years for cytochrome b (CI 2.20%-2.42%) and 3.2% per million years for ND2 (CI 3.08%-3.34%). These estimates are faster than the 2% average rate cited for birds but fall well within the range of rates estimated empirically from a variety of avian taxa 27,28 .

Discussion of ancestral range estimates
We focus discussion of biogeographic results on four important clades: all Oscines, the Corvides, the Passerides, and the Core-Passerides. Ancestral range estimates for these clades using their full distribution or their inferred origin produced almost identical results. Model selection with AIC indicated that the DEC-LIKE+j model was a better fit than DEC-LIKE, and therefore we present results of reconstructions using full clade distributions and the DEC-LIKE+j model (Fig. 2 Given the DEC-LIKE+j model, biogeographic analyses estimated the ancestral range of all oscines as Australia. An Australian origin of oscines has been consistently recovered in other studies as well 5,9,11 . The estimated ancestral ranges of Corvides and Passerides varied mainly depending on whether the 15 Ma constraint on New Guinea was used. Without the constraint, the ranges of both nodes were estimated to be New Guinea. However, when a New Guinea range prior to 15 Ma was disallowed, the ranges of both nodes were reconstructed as Australia (Fig. 1). A New Guinea origin for these clades have been found by other authors 9,11 but their results rely on two important, albeit questionable, assumptions. First, these studies assume an older age for these nodes (~30-45 Ma), which appears to be too old based on more recent and independent estimates of the timing of avian diversification 23,24 . The second assumption is that small, ephemeral proto-Papuan islands existed and were biologically relevant for ancestral range estimation. Our study is the first to estimate an Australian ancestral range for these two clades, which we believe is a more plausible alternative to the proto-Papuan origin hypothesis for these groups because of its consistency with paleogeographic reconstructions and the oscine fossil record (see below). Origin of the core-Passerides, the first major oscine clade to radiate outside of Australasia, also depended on presence or absence of the 15 Ma New Guinea constraint. With New Guinea constrained, we inferred a SE Asian origin of the core-Passerides, followed by a rapid radiation and multiple dispersals into the Palaearctic and sub-Saharan Africa. Without a constraint, the origin of this clade was reconstructed as sub-Saharan Africa.
Using alternate models (Supplementary Table 1), biogeographic reconstructions were similar to those inferred with the DEC-LIKE+j model. Identical to results from the DEC-LIKE+j model (Fig. 1), DIVA-LIKE+j (S7) and BAYAREA-LIKE+j (S8) models with New Guinea emergence constrained also inferred Australian origin of oscines, Corvides, and Passerides, as well as the SE Asian origin of Core-Passerides. The only substantial differences between models were found when analyses allowed emergence of New Guinea prior to 15 Ma, when the BAYAREALIKE and BAYAREALIKE+J models yielded Australia+New Guinea or Australia+New Guinea+S and SE Asia for Oscines, Corvides, and Passerides. However, these models assume all cladogenesis occurs within, rather than between designated areas, an unrealistic assumption likely violated in songbirds.

Paleogeography of Wallacea and Australasia
Wallacea comprises a composite geological landscape that assembled within a boundary zone of extensive tectonic convergence between the Australian, Pacific, and Eurasian plates. As such, the geotectonic history of this ecoregion is exceptionally complex, with virtually all known surface and mantle processes locally active during the Cenozoic [29][30][31] . Although knowledge and understanding of this regional geodynamic complexity is far from complete, recent advances in resolving the broad-scale geological assembly of Wallacea has provided an important spatio-temporal framework for testing biogeographic hypotheses and exploring the potential role of this insular system in linking Australasian biotas with Southeast Asia and beyond 32 .
Australia's separation from Antarctica was largely complete by the Late Cretaceous, yet these East Gondwanan fragments remained in close proximity well into the Paleogene 33 . Tectonics in this region shifted dramatically in the Late Eocene (~45 Ma) as seafloor spreading ended in the Tasman and Coral Seas but accelerated along the Southeast Indian Ridge (SIR), starting Australia's rapid progression north towards its present-day position 34 37 , who posit that New Guinea remained largely submerged until about 5 Ma, when a shift in tectonic motions initiated convergence between the Australian and Pacific plates, leading to rapid uplift of the fold-and-thrust montane belt that comprises the CDRs. Despite the profound differences between these competing tectonic models, it is now clear that New Guinea's remarkably recent geological development and rapid orogeny largely precludes the island from playing a significant role in early Oscine diversification and dispersal out of Australia.

Paleontological context
Although the oscine fossil record is sparse, and few taxa are placed phylogenetically, our timeframe and biogeographic hypothesis for oscine diversification agrees broadly with Mayr's 39 review of paleontological constraints on passerine evolution. Our data indicate that crown oscine lineages first reached Asia at the Oligocene-Miocene transition and dispersed worldwide shortly thereafter. The earliest oscine fossils from the northern hemisphere are from late Oligocene Europe 40,41 , but they have not been identified as either crown or stem lineages. The earliest passerine fossils from Africa 42 and the New World 43,44 are from the Miocene.

Paleoecology and evolution of the Australasian mesic biota
We propose that oscine songbirds initially diversified in isolation on the Australian plate, with dispersive elements subsequently colonizing Southeast Asia and other regions of the globe in the Early Miocene when tectonic collision and uplift in Wallacea produced newly emergent islands that enabled biotic interchange between Australasia and Sundaland. Although unconstrained biogeographic reconstructions indicate that many of these early oscine lineages originated in New Guinea ( Supplementary Fig. S7), we interpret this result as a bias associated with the severe Miocene aridification of Australia and wholesale reduction of its mesic biota 45 , as the New Guinea region largely remained submerged until the Late Miocene or Early Pliocene 35,37 .
Cool subtropical environments persisted across much of the Australian continent during the Eocene to Early Oligocene, harboring diverse rainforest communities with strong Gondwanan affinity. As Australia drifted to warmer latitudes, its climate gradually shifted and periods of drought became more pronounced, resulting in the first signs of contraction among rainforest habitats by the Late Oligocene 46,47 . Widespread aridification intensified around the mid-Miocene climatic optimum, driving Australia's once extensive mesic biome into small refugia along the eastern coast, further compounding the sharp decline and extinction of rainforest-adapted taxa that was already underway 45,48 . Importantly, the concomitant development of New Guinea's emerging highland landscape in the Late Miocene provided a vast new refuge for these relictual lineages to colonize along Australia's northern continental margin, which likely prevented extinction of numerous temperate and subtropical groups. Thus, in many respects, the rich montane biota that now inhabits the New Guinea highlands provides a unique window to Australia's past, and some of the early ancestral Gondwanan lineages that once characterized its subtropical rainforest environments. We suggest that this regional paleoclimatic history likely explains the presence of early oscine lineages in New Guinea that predate the island's recent geological development. Prime examples of lineages exhibiting long and bare branches indicative of this relictual hypothesis include the New Guinea endemic Satinbirds (Cnemophilidae) and Berrypeckers (Melanocharitidae), as well as several endemic monotypic genera within the Corvidan radiation (Eulacestoma, Oreocharis, and Rhagologus), all montane taxa that arose prior to the emergence of New Guinea's central cordillera.
Although this hypothesis differs markedly from recent studies of oscine diversification 5,9,11,12 , our results strongly corroborate an earlier hypothesis developed by Schodde 49 that has largely been overlooked because of incongruent temporal frameworks or unconstrained biogeographic analyses. Schodde and colleagues [49][50][51] examined the distribution and community composition of Australasian avifaunae in the context of regional tectonic history, paleoecology, and phylogeny. They identified an ancestral "Tumbunan" avifauna endemic to subtropical montane forests of New Guinea and a narrow corridor of rainforest refugia along the northeastern Australian coast. This community is diverse and elevationally structured within New Guinea, but comparatively depauperate in Australia. Schodde 49 hypothesized that aridification of Australia likely depleted the once widespread wet-adapted communities, which subsequently found refuge in New Guinea during its rapid orogenesis in the Late Miocene to Early Pliocene. Schodde and Christidis 51 questioned the hypothesis that New Guinea comprises the area of initial diversification of Corvides 9,11,12 , citing geological, temporal, and distributional inconsistencies, and concluded that New Guinea was more likely a refuge for relictual lineages as opposed to a "launch pad" for the corvoid radiation.
This signal of impoverishment and extinction among some early oscine lineages is part of a larger biogeographic pattern that is widely manifest in the Australasian mesic biota, suggesting a number of non-avian groups were similarly impacted by the mid-Miocene aridification of Australia. Byrne et al. 45 documented the decline of Australia's mesic biota throughout the Neogene, with the most extensive extinctions and range restrictions occurring in taxa associated with temperate and subtropical rainforest environments. The palynological record for Australia provides some of the clearest evidence of this dramatic and large-scale biotic restructuring 45,47 . For example, the Nothofagus subgenus Brassospora was once widespread among Australian subtropical rainforest environments during the Eocene to Early Oligocene. Macrofossils from Brassospora have been recovered in Australia that date to the Oligocene, and Brassospora pollen has been recorded on the continent as recently as 2 Ma. Although this droughtsensitive clade is now extinct in Australia, several Brassospora species remain a central component of humid mid-montane forests throughout New Guinea. A similar pattern is seen in some Australasian conifers (Podocarpaceae), with Dacrydium and Dacrycarpus formerly common among subtropical environments in Australia, and members of the later genus persisting until the late Pliocene along the southern coast 52 . Both genera are now ubiquitous across the New Guinea highlands.