A genomic basis of vocal rhythm in birds

Vocal rhythm plays a fundamental role in sexual selection and species recognition in birds, but little is known of its genetic basis due to the confounding effect of vocal learning in model systems. Uncovering its genetic basis could facilitate identifying genes potentially important in speciation. Here we investigate the genomic underpinnings of rhythm in vocal non-learning Pogoniulus tinkerbirds using 135 individual whole genomes distributed across a southern African hybrid zone. We find rhythm speed is associated with two genes that are also known to affect human speech, Neurexin-1 and Coenzyme Q8A. Models leveraging ancestry reveal these candidate loci also impact rhythmic stability, a trait linked with motor performance which is an indicator of quality. Character displacement in rhythmic stability suggests possible reinforcement against hybridization, supported by evidence of asymmetric assortative mating in the species producing faster, more stable rhythms. Because rhythm is omnipresent in animal communication, candidate genes identified here may shape vocal rhythm across birds and other vertebrates.


Table of contents:
• Supplementary Methods ○ Supplementary Methods 1 -Sample collection and storage ○ Supplementary Methods 2 -Genomic data extraction, library preparation and sequencing ○ Supplementary Methods 3 -Chromosome-level reference genome assembly and evaluation ○ Supplementary Methods 4 -Whole-genome variant calling ○ Supplementary Methods 5 -Double-digest restriction site associated DNA sequencing (ddRAD) ○ Supplementary Methods 6 -Identifying candidate loci underlying IOI ○ Supplementary Methods 7 -Estimating timing of admixture in the contact zone ○ Supplementary Methods 8 -Hybrid song instability models and post hoc test ○ Supplementary Methods 9 -Quantification of playback effects ○ Supplementary Methods 10 -Estimation of repeatability ○ Supplementary Methods 11 -Assessing covariance of phenotypes • Supplementary Figures S1 to S15 • Supplementary Tables S1 to S15 • References

Supplementary Methods 2 -Genomic data extraction, library preparation and sequencing
DNA extraction was performed using a modified protocol of the Bionano Prep SP Frozen Human Blood DNA Isolation Protocol for the Bionano Prep SP Blood and Cell Culture DNA Isolation Kit (cat no.80030).Briefly, the input volume was adjusted to 15 µl of blood and then brought to a total of 40 µl with chilled 1x PBS.The lysis and digestion were performed on the whole blood and PBS mixture.Isolated high molecular weight DNA was kept at room temperature for a week to homogenize before quantified with a Qubit 3 fluorometer (Invitrogen Qubit dsDNA Broad Range Assay cat no.Q32850).DNA fragment size distribution was assessed using a pulsed field gel electrophoresis (PFGE; Pippin Pulse, SAGE Science, Beverly, MA) and found the majority of DNA was over 300 Kb.The obtained DNA was then used to prepare the libraries for the different sequencing technologies.For PacBio libraries, DNA was sheared down to ~40 Kb fragment size by needle shearing.To prepare PacBio large insert libraries, the SMRTbell express template pre kit 2.0 (no.100-938-900) was used.Using Sage BluePippin (Sage Science, USA), libraries were size-selected for 15 Kb, and were then sequenced on three 8M SMRT cells with the Sequel II instrument, using the Sequel II sequencing kit 2.0 (no.101-820-200) and 15-h movie time.
To generate linked reads, the extracted DNA was processed on the 10X Genomics Chromium platform (Genome Library Kit & Gel Bead Kit v2 PN-120258, Genome Chip Kit v2 PN-120257, i7 Multiplex Kit PN-120262) following manufacturer's guidelines.The libraries were then sequenced on an Illumina NovaSeq 6000 S4 150-bp PE lane at ~60× coverage.For Bionano libraries, the DNA was labeled using direct labeling enzyme (DLE1) following the DLS protocols (document number 30206).The samples were then imaged on a Bionano on a Bionano Saphyr instrument.Finally, chromatin interaction (Hi-C) libraries were generated by Arima Genomics (https://arimagenomics.com/) from the blood samples with in vivo cross-linking using the two-enzymes Arima-HiC kit (P/N: A510008).The proximally ligated DNA was sheared, size-selected around 200-600 bp with SPRI beads, and enriched for biotin-labeled DNA using streptavidin beads.Illumina libraries were then generated from the fragments using KAPA Hyper Prep kit (P/N: KK8504) and subsequently amplified through PCR and purified using SPRI beads.After a quality check with qPCR and Bioanalyzer, the libraries were sequenced on Illumina HiSeq X at ~60× coverage following the manufacturer's protocols.
For population genomics analyses, we sequenced 137 further tinkerbirds at the wholegenome level (Supplementary Data File 1) on Illumina NovaSeq 6000 S4 150-bp PE lanes (median depth = 8.53X; mean depth = 8.21X), of which 52 were assigned to extoni and 85 pusillus (including 28 allopatric extoni and 12 allopatric pusillus) based on forecrown color.We also sequenced a total of 452 tinkerbird samples, including 123 of those sequenced with WGS, using double-digest restriction site associated DNA sequencing (ddRAD), 71 of which were included in a previous study 1 (Supplementary Data File 2).
From each sample, 0.4 μg of DNA was used for whole-genome sequencing.Libraries were generated using the NEBNext DNA Library Prep Kit following the manufacturer's recommendations.The PCR products were then purified using the AMPure XP system, analyzed for size distribution (by Agilent 2100 Bioanalyzer) and quantified using real-time PCR.The libraries obtained with this method were then sequenced with the Illumina Novaseq6000 platform.
The ddRAD sequencing procedure also involved multiple stages.The first stage includes digestion of DNA with restriction enzymes (SbfI and MseI).We prepared a first master mix with the restriction enzymes and then combined the master mix with individual DNA samples in each plate well.The combination master mix with DNA was then vortexed, centrifuged and incubated.The following stage was the ligation of barcoded adapters which included thawing the adaptors, preparation of a second master mix, the addition of this master mix to each well of the restriction digested DNA and finally the addition of a unique barcoded adaptor for each DNA sample.The third stage involved purification (removal of short fragments) using Agencourt AMPure.AMPure purification consists of: i) addition of AMPure beads to the DNA volume, ii) binding, iii) separation of the beads from the solution by placing the reaction plate onto a magnetic plate, iv) ethanol wash by adding a specific amount of ethanol to each well of the plate, v) addition of the elution buffer and vi) transfer of the eluent with DNA into a new plate.
Subsequently, the next stage includes the amplification of each individual sample in four separate PCR reactions.This PCR step used the Illumina PCR primers to amplify fragments that have our adapters and barcodes ligated onto the ends.To improve stochastic differences in PCR production of fragments in reactions, we ran four separate reactions per restriction-ligation product to then combine them.We prepared a third master mix with standard PCR reagents including Q5 high-fidelity polymerase, and added the combined master mix III to each plate well followed by the addition of the diluted restriction-ligation purified with AMPure mix.Next, we ran the PCR for 20 thermal cycles.Subsequently, to maximise the amount of double-stranded DNA and reduce DNA heterodimer content, we prepared another master mix (master mix IV) with additional PCR primers and dNTPs, added it to each PCR product and then ran one additional thermal cycle.In the following phase, we evaporated the PCR product to about ½ volume and pooled the PCR product from the four replicates and all samples into one tube.Then we ran the PCR product on an agarose gel and selected fragments between 400 and 500 bp in order to exclude fragments that consist mostly of adaptor sequence.We then conducted a final AMPure cleanup on the pooled library and shipped it to Novogene Inc. for 150 bp paired-end sequencing on an Illumina HiSeq X Ten platform.

Supplementary Methods 3 -Chromosome-level reference genome assembly and evaluation
PacBio subreads were assembled into contigs with FALCON 3 v. 1.3.0 and haplotypes further phased with FALCON-Unzip 4 v. 1.2.0.A set of primary contigs, representing the principal pseudo-haplotype, and a set of alternate haplotigs, representing the alternate haplotype, were generated.Purge_dups 5 was used on the primary contigs to remove any retained alternate haplotigs, overlaps, collapsed repeats and low-and high-coverage contigs.The alternate haplotigs were merged together in a single file and purged again, while the primary purged contigs were subjected to three steps of scaffolding.
The first scaffolding step was performed with Scaff10X v2.0-2.1 (https://github.com/wtsi-hpag/Scaff10X)using 10x linked reads to join proximal contigs into larger scaffolds.The resulting set of scaffolds were then scaffolded with Bionano DLS optical maps 6 using Bionano Solve v3.2.1 in non-haplotype assembly mode with a DLE-1 one enzyme non-nicking approach.The third and last step of scaffolding was performed aligning the Hi-C reads to the scaffold assembly using the Arima Genomics mapping pipeline (https://github.com/ArimaGenomics/mapping_pipeline)and scaffolded using Salsa2 HiC v. 2.2 7 .
To improve the assembly per-base accuracy (QV) 8 , the primary scaffolded assembly was merged with the alternate combined haplotigs and the mitogenome to prevent switches and NUMTs overpolishing 8,9 , and three steps of polishing were performed.The first step of polishing was with Arrow v. SMRTLink7.0.1 (PacificBiosciences) using PacBio CLR reads.The last two rounds of polishing were done with Longranges Align v. 2.2.2 and freebayes 10 v. 1.3.1 using 10x linked reads.Primary scaffolded assembly and alternate haplotigs were then separated again and called with the prefix 'bPogPus1', on the basis of the VGP guidelines for genome identifiers 8 .
Manual curation was then performed on the primary assembly to remove any remaining false duplications, correct structural assembly errors, and assign scaffolds to chromosome names.

Supplementary Methods 4 -Whole-genome variant calling
To process raw whole-genome sequences of 137 extoni and pusillus we first proceeded to trim Illumina adapter sequences using Cutadapt 22 v. 4.0 and merged pair-end reads to produce single interleaved FASTQ files.We then aligned the reads to the recently assembled reference genome of a female red-fronted tinkerbird (pusillus: NCBI BioProject: PRJNA637953) using BWAmem 23 v. 0.7.1 , and included read group information.We also aligned the raw 10x linked reads of the reference genome individual to the reference for inclusion in subsequent analyses.Subsequently, we used SAMtools 24 v. 1.9 to filter out mapped reads with mapping quality below 20 and sort the data by coordinate before converting SAM to BAM files.We then removed duplicate reads with Picard's MarkDuplicates tool 25 (version 2.23.1).We performed all subsequent processes in GATK 26 v.4.0.11.0.We used HaplotypeCaller for individual-level variant calling, we then combined samples using CombineGVCFs and performed joint genotyping with GenotypeGVCFs.This process resulted in a single VCF file which we then filtered using GATK's recommended hard filtering parameters (QUAL < 30.0,QD < 2.0, SOR > 3.0, FS > 60.0, MQ < 40.0, MQRankSum < -12.5 and ReadPosRankSum < -8.0) to produce two variant files, one containing SNPs and the other containing indels.We then used these variants to implement base quality score recalibration (BQSR), in which the base quality scores of the BAM files, obtained after removing duplicate reads, were adjusted in a two-step process.Firstly, we used BaseRecalibrator to build a recalibration model for each BAM file along with the variant files previously produced.Secondly, we used ApplyBQSR to adjust the base quality scores within each BAM based on the recalibration model, to produce a new recalibrated BAM file.
Additionally, we built a second model from the recalibrated BAMs and produced 'before-andafter' plots to visualize the effects of the recalibration process using BaseRecalibrator and AnalyseBQSR, respectively.Using the recalibrated BAMs, we then called variants on an individual level, combined samples and performed joint genotyping, to produce the final VCF file, using the same tools described above.We finally applied filters to create two high-quality datasets for subsequent analyses that have different data requirements.Using VCFtools 27 v. 0.1.16,we first created a filtered dataset with the following parameters: maximum missingness = 0.95; minor allele frequency = 0.03; minimum depth = 3.5 and maximum depth = 50, and then created a second dataset with the same filter parameters excluding the minor allele frequency (hereafter 'no MAF' dataset).The no MAF dataset also contained invariant sites, which were necessary to calculate statistics such as  and DXY, whereas we used the MAF filtered dataset in all the other analyses unless noted.Overall, from the initial pre-filtered vcf files of 65,812,270 and 631,918,066 SNPs (with invariant sites), the above-mentioned pipeline resulted in a final dataset of 19,602,343 SNPs for the MAF filtered dataset and 303,114,882 SNPs for the no MAF dataset.Upon further exploration of the whole-genome dataset, we removed two allopatric individuals of each species that we deemed had been mislabelled and another allopatric extoni which showed signs of contamination, possibly during DNA extraction, with the final dataset consisting of 135 individuals.

Supplementary Methods 5 -Double-digest restriction site associated DNA sequencing (ddRAD)
We demultiplexed the raw reads using process_radtags along with a list including barcodes associated with individual samples to produce single sample FASTQ files for each paired-end read.Subsequently, we removed PCR clones using clone_filter.We then mapped and aligned the reads to the P. p. pusillus reference genome (bPogPus1.pri.cur.20200514.fasta)using BWA-mem v. 0.7.1.Using SAMtools v. 1.9, we filtered out mapped reads with mapping qualities < 20 and sorted the reads by coordinate before converting from SAM to BAM.Following this, we called variants and genotyped using gstacks in Stacks (using --var-alpha 0.01 and --gt-alpha 0.01) 28 .
Lastly, population genetics statistics were obtained using populations in Stacks together with a pop.map file specifying the population of each sample.Ultimately, we produced a VCF file with 82,950 SNPs after filtering out variants that had a depth < 4, > 20% missing genotype data, and a minor allele frequency < 5%, resulting in a median depth = 12.85X and a mean depth = 13.92X.

Supplementary Methods 6 -Identifying candidate loci underlying IOI
GEMMA LMM (-lmm 1 flag) output was analyzed and interpreted based on significant SNPs that passed a significance threshold set by permutation, which was performed by running 999 LMMs in GEMMA after shuffling phenotypes at random in each run.We set two thresholds, with the most stringent being the mean value of the most significant SNP from each GEMMA run (-log10 P = 6.9, rounded to 7) and the less stringent threshold established by averaging the pvalue of the 10 most significant SNPs across each run (-log10 P = 6.1, rounded to 6).We investigated just those clusters with at least three significant SNPs.Clusters of significant SNPs were then located in the pusillus reference genome after aligning them to the zebra finch (Taenopygia guttata) genome using BLAST 29 .We then performed the reverse process: after identifying potential genes that underlie the expression of IOI, we extracted their exact gene position from the annotated zebra finch genome and used BLAST to locate the genes in the tinkerbird genome.Following this, from the GEMMA output we extracted beta values and standard error only for the SNPs in the gene regions where the significant SNPs mapped into in order to calculate their effect size.We then compared the effect sizes of each gene against 10,000 SNPs randomly chosen across the entire genome.This was done by estimating the standard error over the mean (SEM), a measure of the variance explained by each gene.
In GEMMA, we also analysed the genetic architecture of IOI using Bayesian sparse linear mixed models (BSLMM), for which we implemented a 40 million MCMC chain after a burn-in of 10 million, sampling every 4000 iterations.Upon validation of the models using trace plots, we assessed four hyperparameters: 1) the proportion of phenotypic variance explained by the genotypes (PVE), the proportion of variance explained by major effect loci (PGE), 3) the posterior number of SNPs that have a sparse effect (i.e. a detectable large effect) and 4) the posterior inclusion probability (i.e. the frequency a SNP is estimated to have a detectable large effect in the MCMC, a measure of the strength of association of a SNP with a phenotype).We also predicted IOI values in GEMMA using a leave-one-out cross validation approach in a (using the -predict 1 flag).This was achieved by excluding the phenotypic information of one individual at a time and using the remaining 86 individuals to predict its phenotype based on their genotype.We then estimated the predictive performance of the inferred IOI by fitting a linear model to assess the proportion of observed IOI variance explained by the predicted IOI values.

Supplementary Methods 7 -Estimating timing of admixture in the contact zone
Following Johnson et al., 2011 30 and vonHoldt et al., 2022 31 , we counted the number of switches in ancestry blocks from each individual, and then computed the number of generations since admixture from B = (0.04) * T*L*z(1 − z).Here, B is the number of ancestry switches, T the number of generations since admixture, L represents the genome length in cM (3782.04cM for autosomal length and 88.37 cM for the Z chromosome) and z represents the autosomal or Z proportion of pusillus ancestry.We finally converted the obtained generation times into calendar years by multiplying the estimated number of generations since the admixture event by the generation times estimated for tinkerbirds 32 .We warn, however, that given the lack of life history data for tinkerbirds, generation times for tinkerbirds may have been estimated and, as a consequence, estimated admixture times may be slightly biased towards more recent admixture times.

Supplementary Methods 8 -Hybrid song instability models and post hoc test
For the Double Hierarchical Generalized Linear Models we performed tests using 86 of the 87 individuals recorded in the contact zone.We removed one juvenile hybrid individual whose inconsistent song we deemed an age-related factor.Specifically, while this song sounded almost like a typical adult song, it had an alternating rhythm which in recent field expeditions we have found to be emitted by young birds, and given that this individuals was also admixed (AS28707, proportion pusillus ancestry ~0.30) we decided upon its removal since its inclusion could have biased our results.Model selection was then based on an information criterion approach.We compared the widely applicable information criterion (WAIC) for three models.The first model included only the linear effect of ancestry in the variance part to account for the possibility that one species may be more variable than another, as recently found in zebra finches 68 .The second model was fitted with just the quadratic effect of ancestry to test for non linear relationships between ancestry and IOI (i.e.instability of hybrid individuals), and a final full model included both terms.
Quadratic effects can occur due to effects occurring prior to the inflection point of the quadratic curve, effects after the inflection point of the quadratic curve, or both (i.e.here an increase or decrease in IOI variance with higher proportions of extoni ancestry, pusillus ancestry, or both).We therefore estimated the value of IOI variance at the statistical peak (IOI RWV peak, the y coordinate of the inflection point of the quadratic function) and its corresponding ancestry value (ancestry at peak, the x coordinate of the inflection point of the quadratic effect) 33 to then perform a pre-/post-peak analysis.With β0 being the intercept of the RWV part of the model (also called ), β1 the variance () estimate of the linear effect and β2 the  estimate of the quadratic effect, we calculated the IOI RWV peak as β0 -β1 2 /4β2 as well as the ancestry value at the peak (-β1/2β2) 34 .We then replaced the quadratic term in the original model with two new covariates: 1) a categorical 'pre-peak' variable (with post-peak ancestry values coded as "0" and pre-peak values coded as "1") and 2) the interaction between 'pre-peak' and linear ancestry.The effect of ancestry in this post hoc test represents the post-peak ancestry effect, whereas the estimates of the interaction term represent the pre-peak effect as a deviation from the post-peak ancestry effect 33,35 .The sum of the two represents the pre-peak ancestry effect.After scaling all variables to aid model convergence, we ran all the above models on five chains, each with 5500 iterations (500 warm-up) and maximal tree depth set to 15.

Supplementary Methods 9 -Quantification of playback effects
The Southern African yellow-(P.c. extoni) and red-fronted (P.p. pusillus) tinkerbird are elusive birds that tend to inhabit the canopy cover of woodland savannas and scarp forest, respectively.
It is therefore extremely difficult to attract them to the mist nets unless conspecific playbacks are used.This approach is especially advantageous at the beginning of the breeding season, when the territorial response of the breeding pairs is at its peak.However, we acknowledge that using playbacks may bias some acoustic parameters.For this reason, we attempted to quantify the effect that playbacks have on tinkerbird vocalizations by comparing measurements obtained from recordings pre-and post-playback.To investigate potential effects of playback, we recorded seven extoni individuals in allopatry before and after playing their conspecific song and compared their measurements using a two-tailed t-test.Even though we acknowledge that our sample size is limited, we observed a high degree of overlap between pre-and post-playback acoustic measurements, which are statistically non-significant (t = -0.55901,df = 8.825, p-value = 0.5901, Supplementary Fig. 12).   spectra-cn plot of the reference genome using the primary and alternate assembly, and 10x linked-reads.Similarly to panel (a), the x axis reports k-mer multiplicity and the y-axes their frequency in the reads.The curves discriminate the k-mers occurrence in the assemblies, while the black bar at the origin of the graph shows k-mers found only in the assemblies (putative assembly errors).Two peaks are visible: the red one is the haploid peak (k-mers found only one in the assemblies) and the blue is the diploid peak (k-mer found twice in the assemblies), occuring at half average coverage and at the average coverage, respectively.No duplicated k-mers are reported (green, purple and yellow curves).(c) Hi-C interaction heatmap for the chromosome-level reference genome.Both axes represent the assembly, while the diagonal reports the 3D proximity of interacting Hi-C read pairs among the assembly, with their strength being represented by color intensity.A scaffold is considered a well-assembled chromosome when the diagonal is strong and there are no off-diagonal interactions.Chromosomes are ordered in decreasing length and the name of the biggest chromosomes are shown at the bottom of the heatmap.S9).Mean IOI and its variance (95% CI) decrease with increasing pusillus ancestry in both genes after the peak.Pre-peak the effects differ between the two genes, with again a significant decrease in IOI with increasing pusillus ancestry at COQ8A, but stable mean IOI, yet with increasing variance, with increasing pusillus ancestry at NRXN1.Plasticity of synaptic adhesion and signaling between neurons 39,40 .Associated with speech disorders in humans [41][42][43] and migration in passerines 44,45 .rs4 rs6957543 rs91357874 7.031 NRXN1 Plasticity of synaptic adhesion and signaling between neurons 39,40 .Associated with speech disorders in humans [41][42][43] and migration in passerines 44,45  Plasticity of synaptic adhesion and signaling between neurons 39,40 .Associated with speech disorders in humans [41][42][43] and migration in passerines 44,45

Figure S1 .
Figure S1.Spectrograms of tinkerbird song and relationships between IOI, frequency and mass.Figure illustrating a) full spectrograms of the 3 recordings shown in Figure 1b as well as b) correlations between IOI and body mass (upper panel) and IOI and peak frequency (lower panel), with r values denoting Pearson's correlation.Conditional mean and 95% CI are shown in panels b) and c).

Figure S3 .
Figure S3.Genome assembly quality control.(a) Genomescope 2.0 37 k-mer profile generated from unassembled 10x linked-reads data of the reference genome individual.The x axis reports multiplicity of k-mers in the raw reads and the y axis reports their cumulative frequency.The figure also reports the estimated genome size, repeat content, duplication content and heterozygosity.(b) Merqury 38 spectra-cn plot of the reference genome using the primary and alternate assembly, and 10x linked-reads.Similarly to panel (a), the x axis reports k-mer multiplicity and the y-axes their frequency in the reads.The curves discriminate the k-mers occurrence in the assemblies, while the black bar at the origin of the graph shows k-mers found only in the assemblies (putative assembly errors).Two peaks are visible: the red one is the haploid peak (k-mers found only one in the assemblies) and the blue is the diploid peak (k-mer found twice in the assemblies), occuring at half average coverage and at the average coverage, respectively.No duplicated k-mers are reported (green, purple and yellow curves).(c) Hi-C interaction heatmap for the chromosome-level reference genome.Both axes represent the assembly, while the diagonal reports the 3D proximity of interacting Hi-C read pairs among the assembly, with their strength being represented by color intensity.A scaffold is considered a well-assembled chromosome when the diagonal is strong and there are no off-diagonal interactions.Chromosomes are ordered in decreasing length and the name of the biggest chromosomes are shown at the bottom of the heatmap.

Figure S4 .
Figure S4.IOI can be reliably predicted by the genomic dataset.Observed phenotype versus phenotype predicted with Bayesian sparse linear mixed model in GEMMA for 86 sympatric individuals.The proportion of variance explained by the predicted phenotype (adjusted R 2 ), Pearson correlation (r) and significance value from a linear model (P) are also shown.Gray shades indicate 95% CI for the predicted mean values from the linear model.

Figure S5 .
Figure S5.Diversity scans and recombination rates.Genomic scans for chromosome 25 illustration a) recombination rates estimated based on 26 allopatric extoni as well as genomic scans of b) absolute and c) relative differentiation between > 95% pure ancestry extoni and pusillus in sympatry.The gray shaded area spans the range of the significant SNPs.

Figure S9 .
Figure S9.Gene-specific SNPs cluster in three groups.PC1 to PC2 plots of the GEMMA significant SNPs that mapped into specific genes on chromosome 25.

Figure S10 .
Figure S10.Ancestry blocks and tinkerbird ancestry.Plot showing variation of a) the number of ancestry blocks and b) block size with proportion of pusillus ancestry in sympatric individuals.Red smooth lines represent the conditional means, whereas 95% CI are represented in gray.

Figure S11 .
Figure S11.Geographic clines of ancestry and IOI Geographic clines illustrate concordant trends in ancestry across the entire genome (red), within NRXN1 (blue), COQ8A (ochre) as well as IOI (black) across the contact zone.

Figure S12 .
Figure S12.IOI stability post-hoc analysis.Model effects of the post hoc test to investigate effects of ancestry within (A) NRXN1 and (B) COQ8A on the residual within-individual variance in inter-onset interval for 86 sympatric individuals.The dashed line demarks the ancestry value at the performance peak.For each panel, only the respective pre-(< than the ancestry value at the peak; left of dashed line) or post-peak effects (> than the ancestry value at the peak; right of dashed line) are relevant for interpretation.Mean IOI and its variance (95% CI) decrease with increasing pusillus ancestry in both genes after the peak.Pre-peak the effects differ between the two genes, with again a significant decrease in IOI with increasing pusillus ancestry at COQ8A, but stable mean IOI, yet with increasing variance, with increasing pusillus ancestry at NRXN1.

Figure S13 .
Figure S13.Tinkerbird sexing by sequencing depth.Distribution of the depth ratio of 452 tinkerbirds sequenced with ddRAD.

Figure S14 .
Figure S14.Relationship between IOI and forecrown feather hue in 87 tinkerbirds from the contact zone.Hue values < 0.2 indicate red forecrown coloration, Hue > 0.2 and < 0.4 reflects orange -amber forecrowns, whereas high values of Hue represent yellow forecrowns.The plot reveals that not all individuals with red forecrowns (Hue < 0.2) have a higher proportion of pusillus ancestry, whereas the individuals with fast songs (IOI < 0.5) are mostly pure pusillus.The r score indicates Pearson's correlation.Mean fitted values and their 95% CI are represented by the black line and gray shaded areas, respectively.

Figure S15 .
Figure S15.Effect of playback on tinkerbird IOI.Vocal response of seven individuals of P. chrysoconus extoni recorded in allopatry pre-and post-playback.The plot shows high overlap between IOI measurements obtained before and after exposing the singing birds to conspecific playbacks which show no statistical differences.Test statistics refer to a two-tailed t-test.The central line within each boxplot represents the median, with box limits representing the 1 st (Q1) and 3 rd quartile (Q3).Whiskers are calculated as Q1/Q3 -/+ 1.5 * (Interquartile range).

Table S2 .
Assembly statistics measured with gfastats

Table S3 .
QV and k-mer completeness measured with Merqury using a 21 k-mer length Meryl database.

Table S5 .
Complete list of the significant SNPs associated with Inter-Onset Interval (IOI) highlighted by the association test in GEMMA.

Table S6 .
Genomic features of the areas where the significant SNPs are located obtained from the zebra finch reference genome (bTaeGut1.4.pri) general feature format (GFF) file.For COQ8A, the start and end of the exon are shown within which the significant SNP is found.

Table S7 .
Two-sided t-test output table comparing the standard deviation over the mean, a measure of variance explained, between 10000 randomly selected SNPs across the genome and the four genes highlighted by the study.

Table S8 .
GLMM output showing the effect of vegetation density, represented by the Enhanced Vegetation Index, and autosomal proportion of pusillus ancestry on IOI for 87 sympatric individuals.Since no contrasts are applied, no adjustments for multiple comparisons are applied.

Table S10 .
GLMM output showing the effect of autosomal ancestry (a), Z-linked ancestry (b) and sex on the Inter-onset Interval.Significant terms are represented in bold.No adjustments for multiple comparisons were applied.

Table S11 .
DHGLM output on IOI instability of hybrid individuals (n = 86) at the contact zone (H1).Three models testing the effect of whole-genome ancestry (ac) are compared using WAIC scores, with the best model being the one with the lowest WAIC value.We then tested for the effects on IOI of varying ancestry in specific loci: d) NRXN1, e) COQ8A, f) MSH2 and g) ENAH.Significant terms are represented in bold.

Table S12 .
Table illustrating the post-hoc analysis of DHGLM models with a significant quadratic effect based on NRXN1 and COQ8A ancestry, with panel (a) showing peak values (and their estimates (β) back-transformed to the original scale for IOI) and (b) the results of the preand post-peak analysis.

Table S13 .
DHGLM output testing for character displacement (H2) in IOI as well as its variance in pure parental-species (> 99% ancestry of extoni or pusillus) individuals (n = 181).Significant terms are in bold.

Table S14 .
Output of a linear model (LM) showing a significant difference in the level of assortative mating between putatively pure extoni mothers and putatively pure pusillus mothers of females in the contact zone.The difference in the respective ancestry (Δ parental ancestry) of those mothers and the fathers that sired the females we sampled is greater in extoni than in pusillus, suggesting pusillus females mate assortatively with pusillus males but extoni females do not mate assortatively.No adjustments for multiple comparisons were applied.