Extensive introgression and mosaic genomes of Mediterranean endemic lizards

The Mediterranean basin is a hotspot of biodiversity, fuelled by climatic oscillation and geological change over the past 20 million years. Wall lizards of the genus Podarcis are among the most abundant, diverse, and conspicuous Mediterranean fauna. Here, we unravel the remarkably entangled evolutionary history of wall lizards by sequencing genomes of 34 major lineages covering 26 species. We demonstrate an early (>11 MYA) separation into two clades centred on the Iberian and Balkan Peninsulas, and two clades of Mediterranean island endemics. Diversification within these clades was pronounced between 6.5–4.0 MYA, a period spanning the Messinian Salinity Crisis, during which the Mediterranean Sea nearly dried up before rapidly refilling. However, genetic exchange between lineages has been a pervasive feature throughout the entire history of wall lizards. This has resulted in a highly reticulated pattern of evolution across the group, characterised by mosaic genomes with major contributions from two or more parental taxa. These hybrid lineages gave rise to several of the extant species that are endemic to Mediterranean islands. The mosaic genomes of island endemics may have promoted their extraordinary adaptability and striking diversity in body size, shape and colouration, which have puzzled biologists for centuries.

Historical dynamics in effective population size inferred by PSMC for three Iberian species. Supplementary Fig. 26 Historical dynamics in effective population size inferred by PSMC for species of the Balkan species group. Supplementary Fig. 27 Historical dynamics in effective population size inferred by PSMC for two lineages of P. siculus.  Genetic distances were calculated based on WGS SNVs. All samples are clearly separated into groups according to their geographic distribution (except for P. muralis and P. siculus which form separate cluster.

Supplementary Fig. 4 | Summary of node support across methodologies.
Left panels show phylogenetic trees of a, major clades of Podarcis, b, the Balkan group and c, the Iberian group. Black circles with numbers represent the nodes on the phylogeny. Right panels show the support rate based on different datasets and methods for each node. The first column with numbers in the squares is the proportion of local trees derived from 200-kb windows that support the nodes. The grey-scale of squares in the following columns represent bootstrap values (lighter shade indicates a lower support). A blank square indicates that the node is not supported by the corresponding phylogeny.

Supplementary Fig. 5 | Consistency of topologies among different window sizes. a-f, The
Podarcis trees inferred by multi-species coalescent approach (ASTRAL) from fixed-window local trees with different lengths a, 200 kb, b, 100 kb, c, 50 kb, d, 25 kb, e, 10 kb, and f, 5 kb. Bayesian posterior probabilities are provided next to the nodes unless they were 1. Regardless of window size, all topologies are highly consistent. Each panel shows a frequency distribution of bootstrap values on the top left. g-j, The proportion of fully resolved local trees (i.e., bifurcating tree) inferred from the different window sizes. The proportion decreases rapidly from 100 kb to 25 kb. However, most local trees are bifurcating for branches leading to h, major clades and i, within the Balkan group. Most polytomies are located in the j, Iberian group.

Supplementary Fig. 6 | Discrepancy between nuclear and mitochondrial trees.
The discrepancies between phylogenies based on concatenated WGS SNVs (left) and on mitochondrial genome data (right). Fig. 7 | Timing of diversifications. a, A 'DensiTree' illustration of the phylogenies inferred by SNAPP in the BEAST2 software including ten species. Each line in this plot represents a tree sample from every 1000th MCMC iteration of the run. The consensus tree is shown with a grey thick line. b, The consensus tree of SNAPP inference for the Podarcis tree. The blue bars represent the 95% credibility for the timing of a split. Due to computational limitations, only a subset of species were selected. The topology is identical to the major clades from both concatenating and multispecies coalescent methods. c, Estimation of diversification rate by RevBayes. The effect is weak but statistically supports two shifts of diversification rate. d, Timecalibrated tree inferred by MCMCtree. The blue bars represent the 95% credible intervals of estimated divergence times. e, Lineage Through Time (LTT) plot for all extant Podarcis species and lineages. The plot is based on the whole-genome timecalibrated tree. The dash lines indicate the 95% CI of the lineages. The LTT plot suggests that the number of lineages increases during ca 17-15 million years ago (MYA) and 6.5-4.0 MYA. f, Estimation of diversification rate by using a sliding window.

Supplementary
The window size is 2 million years with steps of 0.5 million years. The result is consistent with the LTT plot in panel e. The two grey-shaded, vertical bands in panels e and f represent the time interval in which the diversification rates were higher than average. g, Estimation of diversification rate by maximum likelihood method in treePar. The result also supports two shifts in diversification rate across the time-calibrated tree. The results shown in panels c to g are based on selected genomic regions whole local trees were concordant with the consensus phylogeny.  *support of the introgression event by the phyloNet method; 'Yes' signifies support; **support of the introgression event by the qpGraph method; 'Yes' signifies support; ***minimal errors between best-fitting scenario and the topology of the consensus tree; Note that multiple testing topologies can be comprised in a single minimal error calculation;