Introduction

Evolution in eukaryotes is shaped by changes in multiple genomic components that differ in their modes of inheritance: mitochondrial DNA (mtDNA) is usually inherited through the matrilineal line, autosomes are inherited through both parental lines and sex chromosomes are inherited differentially depending on the sex of both parent and offspring (Avise 2000). During speciation, populations of a single species diverge genetically, often in isolation, with the strength and pattern of genetic differentiation varying across the different genomic components (reviewed in Coyne and Orr 2004; reviewed in Price 2008). This variation arises due to differences in each component’s rate of evolution as well as the degree to which each component contributes toward reproductive isolation and is resistant to gene flow between populations in secondary contact. Most commonly, speciating taxa will show clear differentiation in mtDNA (e.g., Hebert et al. 2004; Kerr et al. 2007), moderate differentiation in sex chromosomes (e.g., Thornton and Long 2002; Borge et al. 2005; Lu and Wu 2005; Harr 2006; Ruegg et al. 2014; Sackton et al. 2014), and comparatively modest differentiation across autosomes (Harr 2006; Nadeau et al. 2012; Irwin et al. 2018).

Mitochondrial DNA often shows strong differentiation between speciating populations (e.g., Hebert et al. 2004; Kerr et al. 2007). These patterns are partially driven by the mitochondrial genome’s uniparental inheritance and haploid nature which decrease its effective population size to ¼ that of autosomal DNA under the assumptions of neutrality and an equal sex ratio (Moore 1995). Combined with a relatively high mutation rate (Lynch et al. 2006), this low effective population size leads to the rapid fixation of mutations via genetic drift and strong mtDNA differentiation between taxa. Additionally, a lack of recombination across the mitochondrial genome can further contribute to mtDNA divergence through rampant genetic hitchhiking (Hill 2020). Here, positive selection for an adaptive mtDNA mutation causes the fixation of genetic variants across the mitochondrial genome resulting in strong differentiation between speciating taxa.
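
The one-quarter figure follows from counting transmitted gene copies. As a brief sketch, assuming a population of N breeding adults with an equal sex ratio, neutrality, and equal variance in reproductive success between the sexes (the symbols N, N_f, N_m and N_e are notation introduced here for illustration, not taken from the cited work):

```latex
\begin{align}
N_f &= N_m = \tfrac{N}{2} \\
\text{autosomal gene copies} &= 2N_f + 2N_m = 2N \\
\text{mtDNA copies (haploid, maternally transmitted)} &= N_f = \tfrac{N}{2} \\
\frac{N_{e,\mathrm{mt}}}{N_{e,\mathrm{auto}}} &= \frac{N/2}{2N} = \frac{1}{4}
\end{align}
```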

Sex chromosomes, specifically the Z (Borge et al. 2005; Ruegg et al. 2014; Sackton et al. 2014) and X chromosomes (Thornton and Long 2002; Lu and Wu 2005; Harr 2006), tend to show more moderate levels of genetic differentiation between speciating populations, but often much greater differentiation than autosomes. This trend is driven by two important characteristics of Z/X chromosomes. First, on Z/X chromosomes, beneficial recessive mutations are immediately exposed to selection in the heterogametic sex, allowing these advantageous variants to fix more rapidly than they would on autosomes (Meisel and Connallon 2013; Irwin 2018). Second, because Z/X chromosomes are inherited as either one or two copies depending on the sex of the offspring, these chromosomes have a lower effective population size than autosomes (Mank et al. 2010; Irwin 2018). This lower effective population size allows for the fixation of a greater number of neutral and slightly deleterious mutations due to less effective purifying selection and a larger role of genetic drift. Working in tandem, these two characteristics drive genetic divergence of Z/X chromosomes between speciating populations in a phenomenon known as the “faster Z/X effect” (Meisel and Connallon 2013; Mank et al. 2010; Irwin 2018).

Across autosomes, genetic differentiation between speciating taxa tends to be modest compared to mtDNA and sex chromosomes and is often characterized by “islands of differentiation”—genomic regions of high relative differentiation that appear within a background of low relative differentiation (Harr 2006; Nadeau et al. 2012; Hejase et al. 2020). Explanations for these “islands” often invoke reduced gene flow between speciating taxa during secondary contact (Wu 2001) and/or repeated bouts of selection prior to and following secondary contact (Cruickshank and Hahn 2014; Irwin et al. 2018). In the former scenario, differentiation islands are hypothesized to house the loci responsible for reproductive barriers between taxa making them resistant to the homogenizing influence of gene flow (Wu 2001). In the latter scenario, differentiation islands are described as genomic areas that have experienced recurrent selection or selective sweeps (Cruickshank and Hahn 2014; Irwin et al. 2018). These events reduce genetic diversity first in the ancestral population and then in both daughter populations producing relative peaks in genetic differentiation between taxa.

An increasing number of studies have reported differentiation patterns that vary from the above description of what is usually seen between speciating populations (Irwin et al. 2009; Yannic et al. 2010; Bryson et al. 2010). In a number of taxonomically widespread cases, mtDNA shows dramatically low differentiation between taxa compared to nuclear DNA (Toews and Brelsford 2012) and observable phenotypes. One hypothesis that may explain these observations is introgression of mitochondrial haplotypes from one population into another following secondary contact and hybridization. Introgression of mtDNA can occur neutrally due to various processes such as sex-biased dispersal or hybrid zone movement (Toews and Brelsford 2012); however, the extreme degree of discordance observed between mitochondrial and nuclear differentiation in certain systems (e.g., Alves et al. 2008; Irwin et al. 2009) suggests that mtDNA introgression may be driven by selection and occur adaptively by providing a fitness advantage to individuals in the receiving population.

Introgression of foreign mtDNA could provide fitness advantages through two major avenues. First, because variation in mtDNA has been associated with variation in mitochondrial efficiency under different abiotic conditions (e.g., Ballard et al. 2007), mtDNA introgression may allow the receiving population to better adapt within a changing or novel environment (Hulsey et al. 2016; Sloan et al. 2017; Hill 2019a). For example, mtDNA introgression has been tentatively associated with thermal adaptation in rabbits (Alves et al. 2008) and cichlids (Hulsey et al. 2016). The second way that mtDNA introgression can provide a fitness advantage is by replacing a mitochondrial genome with a high mutational load (Sloan et al. 2017; Hill 2019a). As described earlier, the low effective population size (Moore 1995) and high mutation rate (Lynch et al. 2006) of mtDNA can lead to the rapid fixation of mutations, including deleterious mutations. This tendency, combined with genetic hitchhiking of deleterious mutations (Hill 2020), can lead to a high mtDNA mutational load, which may decrease mitochondrial efficiency. Through introgression of a foreign mitochondrial haplotype with a lower mutational load, the receiving population may be able to regain greater mitochondrial function (Llopart et al. 2014; Hulsey et al. 2016; Sloan et al. 2017; Hill 2019a).

Adaptive mtDNA introgression presents a compelling hypothesis for the discordant differentiation patterns observed between some taxa (Irwin et al. 2009; Yannic et al. 2010; Bryson et al. 2010). This idea becomes even more interesting when we consider recent work that suggests strong coevolution between the mitochondrial and nuclear genomes (Hill 2019b). In most bilaterian animals, the mitochondrial genome consists of 37 genes. Because of this low gene content, mitochondrial processes are reliant on more than 1000 proteins encoded by “mitonuclear genes” across the nuclear genome (Calvo and Mootha 2010; Lotz et al. 2014). A few hundred of these mitonuclear proteins interact closely with products of the mitochondrial genome (i.e., RNA and proteins) or with the mitochondrial genome itself (Gershoni et al. 2009; Burton and Barreto 2012; Hill 2019b). Direct interactions between mitochondrial and mitonuclear products are necessary for mtDNA replication, transcription and translation (Diodato et al. 2014; Greber and Ban 2016) and for the formation of complexes I, III, IV and V in the electron transport chain (ETC; Hill 2019b) which performs oxidative phosphorylation in the mitochondrion (reviewed in Ernster and Schatz 1981). These associations between mitochondrial and mitonuclear products imply tight coevolution between the mitochondrial and nuclear genomes due to selection for mitochondrial efficiency: changes in one create selective pressure for compatible changes in the other (Gershoni et al. 2009; Burton and Barreto 2012; Hill 2019b).

Such coevolution suggests that hybridization and gene flow between speciating populations may be selected against due to recombination that separates coevolved mitochondrial and mitonuclear alleles exposing genetic incompatibilities (mitonuclear incompatibilities) within hybrid individuals. In this way, mitonuclear coevolution has the potential to contribute toward reproductive isolation and select against mtDNA introgression between taxa. Nevertheless, if the fitness advantages of mtDNA introgression outweigh the fitness disadvantages of mitonuclear incompatibilities, this process may still be able to occur. In that case, mtDNA introgression could select for co-introgression of coevolved mitonuclear alleles that together optimize mitochondrial function (Sloan et al. 2017; Hill 2019a). Evidence for mitonuclear co-introgression has been found in some systems (Beck et al. 2015; Morales et al. 2018; Wang et al. 2021), but the general significance of this process is still up for debate.

The yellowhammer (Passeriformes: Emberizidae: Emberiza citrinella) and pine bunting (E. leucocephalos) system (Fig. 1) is one that may capture the complex interplay between genetic differentiation and mitonuclear coevolution. Thought to have diverged in isolation during the Pleistocene glaciations (Irwin et al. 2009), this Eurasian avian sister pair is highly divergent in plumage and moderately divergent in song and ecology (Panov et al. 2003; Rubtsov and Tarasov 2017). Yet, despite their differences, yellowhammers and pine buntings hybridize extensively in a large and apparently expanding secondary contact zone in central and western Siberia (Panov et al. 2003, 2007; Rubtsov 2007; Irwin et al. 2009; Rubtsov and Tarasov 2017). Previous genomic work has identified discordant genetic differentiation patterns between allopatric yellowhammers and pine buntings (Irwin et al. 2009) as they are nearly identical in mtDNA but show moderate differentiation in nuclear AFLP (Amplified Fragment Length Polymorphism) markers. To explain these results, Irwin et al. (2009) suggested that mtDNA introgressed adaptively from one species into the other during a previous selective sweep, and this hypothesis was supported by several statistical tests performed on the mtDNA haplotype network. Alternative hypotheses of shared ancestral polymorphism or neutral introgression and mtDNA replacement were considered unlikely given the depth of differentiation observed in nuclear DNA compared to mtDNA, the inferred length of time the two taxa apparently evolved in allopatry, and the presently large population size. Based on mitonuclear theory, mtDNA introgression between yellowhammers and pine buntings could select for similar introgression at mitonuclear alleles to maintain mitochondrial function (Sloan et al. 2017; Hill 2019b). 
The lack of mitonuclear incompatibilities between taxa that would result from co-introgression could facilitate their continued hybridization and hamper the build-up of reproductive barriers. With this opposition between the observed strong nuclear differentiation and the potential for mitonuclear co-introgression, the fate of the yellowhammer and pine bunting system remains uncertain. Depending on which way the scales tip, yellowhammers and pine buntings may continue to diverge and speciate or they may continue to hybridize and eventually collapse into one interbreeding population.

Fig. 1: Geographic distribution and phenotypic variation of sampled yellowhammers and pine buntings.

A Map of sampling locations included in this study. Red numbers accompanying each location correspond to the sampling location numbers appearing in Table 1, which also describes sample sizes. Sampling locations may include multiple sites that appeared too close together to be shown in detail in this figure. Full details for the sites included in each sampling location can be found in Supplementary Table S1. Sampling location points are colored based on the taxon caught in each area: yellowhammer (Emberiza citrinella; yellow) and pine bunting (Emberiza leucocephalos; brown). The solid black line indicates the geographic range of the yellowhammer and the dashed black line indicates the geographic range of the pine bunting as described in Irwin et al. (2009). B Photos of plumage variation between yellowhammers and pine buntings. Each photo represents one of four phenotypic classes: PC, SC, PL and SL. Individuals with a PC or SC phenotypic class were grouped together as Emberiza citrinella and individuals with a PL or SL phenotypic class were grouped together as Emberiza leucocephalos. All photos are credited to Dr. Alexander Rubtsov.

Here, we present a large-scale comparison of DNA sequence variation across the nuclear genomes of allopatric yellowhammers and pine buntings. With this data, we address key questions regarding genetic differentiation and mitonuclear coevolution in this system. First, what is the degree and structure of genetic differentiation between yellowhammers and pine buntings across the nuclear genome? Earlier AFLP analyses identified clear differentiation of nuclear markers between yellowhammers and pine buntings (Irwin et al. 2009), but those results were not based on actual DNA sequences and only captured a small portion of the nuclear genome. Comparing patterns of differentiation across the nuclear genome enables better understanding of the extent of discordance between mitochondrial and nuclear differentiation and of which genomic regions show particularly high differentiation. Second, is there an over-representation of known mitonuclear genes within genomic regions showing genetic patterns suggestive of mitonuclear co-introgression? Support for mitonuclear co-introgression and a resulting lack of mitonuclear incompatibilities could explain the extensive hybridization seen between yellowhammers and pine buntings and implicate this process as a force that counters divergence and the evolution of strong reproductive barriers between groups. By answering these questions, we hope to provide insight on the evolutionary trajectory of yellowhammers and pine buntings (i.e., whether it is one of continued population divergence or of population merging) and also to explore how the interplay between genetic differentiation and mitonuclear coevolution influences the speciation process more generally.

Materials and methods

Sampling

We included 109 blood and tissue samples in this study: 53 phenotypic yellowhammers, 42 phenotypic pine buntings, and 14 other members of Emberizidae (one Emberiza aureola [yellow-breasted bunting], one Emberiza calandra [corn bunting], one Emberiza cioides [meadow bunting], one Emberiza hortulana [ortolan bunting], four Emberiza stewarti [white-capped bunting], and six Emberiza cirlus [cirl bunting]) to put variation between yellowhammers and pine buntings into a deeper phylogenetic context (Fig. 1A; Table 1; Supplementary Table S1). These samples were part of a larger dataset that included individuals collected within and near the yellowhammer and pine bunting sympatric zone in western and central Siberia (Fig. 1A). All individuals examined in the present study (referred to as “allopatric”) were found at least 400 km from the border of the sympatric zone to account for recent and continued expansion of this region (Panov et al. 2003, 2007; Rubtsov 2007; Irwin et al. 2009; Rubtsov and Tarasov 2017). All male birds included in this study had clear phenotypes consistent with these phenotypic classes described by Rubtsov and Tarasov (2017): yellowhammers of the pure citrinella (PC) or semi-citrinella (SC) phenotypes; and pine buntings of the pure leucocephalos (PL) or semi-leucocephalos (SL) phenotypes (Fig. 1B). Because females are phenotypically similar between species, we did not put them into phenotypic classes. A total of 91 of our samples were included in the AFLP analysis of Irwin et al. (2009) while 18 samples were examined for the first time as part of the present research.

Table 1 Geographic locations and sample sizes of the sites included in this study.

DNA extraction and genotyping-by-sequencing

DNA was extracted from samples using a standard phenol-chloroform method. We then divided the DNA samples into four genotyping-by-sequencing (GBS) libraries (Elshire et al. 2011). The 109 samples included in this study were sequenced together with 226 yellowhammer, pine bunting and hybrid samples collected near and within the sympatric zone (Nikelski et al. in prep). The libraries were prepared as per the protocol described by Alcaide et al. (2014) with modifications specified by Geraldes et al. (2019) except that we maintained a 300–400 bp DNA fragment size during size selection. Paired-end sequencing was completed by Genome Québec using an Illumina HiSeq 4000 system, producing more than 1.2 billion reads, each 150 bp in length, across the four GBS libraries.

Genotyping-by-sequencing data filtering

We processed the reads following Irwin et al. (2016; 2018), as summarized here. Reads were demultiplexed using a custom perl script designed by Baute et al. (2016). Next, reads were trimmed for quality using Trimmomatic version 0.36 (Bolger et al. 2014) with the parameters: TRAILING:3, SLIDINGWINDOW:4:10, MINLEN:30. Trimmed reads were aligned to the zebra finch reference genome (Taeniopygia guttata version 3.2.4; Warren et al. 2010) using the program BWA-MEM (Li and Durbin 2009) and a BAM file of this information was created for each individual using the programs Picard (http://broadinstitute.github.io/picard/) and SAMtools (Li et al. 2009). The zebra finch reference genome was utilized for its comprehensive gene annotation which was essential to conducting our mitonuclear analysis discussed below. BAM files were converted into GVCF files using the HaplotypeCaller command as part of GATK version 3.8 (McKenna et al. 2010). We then combined information from the individuals to create (1) a genome-wide “variant site” VCF file containing only variant site information with sufficient coverage, and (2) a series of chromosome-specific “all sites” VCF files which contained information on both variant and invariant sites with sufficient coverage.

To create the genome-wide “variant site” VCF file, we used the GenotypeGVCFs command in GATK to identify single nucleotide polymorphisms (SNPs) among the 336 individuals included in the pipeline. This command also converted the variant site information into a single VCF file of SNP information encompassing the entire nuclear genome. Using a combination of VCFtools (Danecek et al. 2011) and GATK, we filtered the VCF file to remove indels and non-biallelic SNPs. To avoid erroneous SNP calls based on poorly mapped reads, we discarded loci with QD < 2.0, MQ < 40.0, FS > 60.0, SOR > 3.0, or ReadPosRankSum < −8.0. Finally, loci with more than 60% missing genotypes were removed. The average coverage of variable sites in the resulting VCF file was 16.59.
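
The hard filters listed above can be written as a simple per-site predicate. The following is a minimal sketch only, not the authors' VCFtools/GATK commands: the dictionary layout, the function names, and the choice to let a missing annotation pass are all assumptions made for illustration.

```python
# Hypothetical per-site record: a dict keyed by the GATK annotation names
# mentioned in the text. The layout itself is an assumption.
def passes_hard_filters(site):
    """Return True if a SNP survives the quality thresholds listed above."""
    checks = [
        ("QD", lambda v: v < 2.0),
        ("MQ", lambda v: v < 40.0),
        ("FS", lambda v: v > 60.0),
        ("SOR", lambda v: v > 3.0),
        ("ReadPosRankSum", lambda v: v < -8.0),
    ]
    for key, fails in checks:
        value = site.get(key)
        # Assumption: an absent annotation does not trigger its filter.
        if value is not None and fails(value):
            return False
    return True


def passes_missingness(genotypes, max_missing=0.60):
    """Drop a locus only if MORE than 60% of genotypes are missing (None)."""
    missing = sum(g is None for g in genotypes)
    return missing / len(genotypes) <= max_missing
```

A locus would be retained only when both functions return True for it.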

To convert GVCF files into “all sites” VCF files, we similarly employed the GenotypeGVCFs command in GATK with the addition of the -allSites and -L flags to retain invariant sites and split the information into chromosome-specific files, respectively. The resulting VCF files were filtered using VCFtools and GATK to remove indels, sites with more than two alleles, sites with more than 60% missing data, sites with MQ values lower than 20 and sites with heterozygosities >60% (to avoid potential paralogs). Some of these filters were more permissive than those applied to the “variant site” file in order to increase the resolution of the windowed statistics, whose calculation is discussed below.

Variant site analyses

The genome-wide “variant site” VCF file was analyzed using modified versions of the R scripts described in Irwin et al. (2018), and all of our analyses used R version 3.6.2 (R Core Team 2014). A total of 374,780 SNPs were identified among allopatric yellowhammers and pine buntings. For each of these SNPs, we calculated sample size, allele frequency, and Weir and Cockerham’s FST (Weir and Cockerham 1984). Genetic differentiation between yellowhammers and pine buntings was then visualized using a principal components analysis (PCA) generated with the pca command and the svdImpute method to account for any missing genomic data using the pcaMethods package (Stacklies et al. 2007). A Manhattan plot of FST values for 349,807 SNPs identified among allopatric yellowhammers and pine buntings with known genomic locations was created using the package qqman (Turner 2018).

Differentiation across the genome

To thoroughly investigate nuclear differentiation between allopatric yellowhammers and pine buntings, we performed further analyses on both variant and invariant loci within “all sites” VCF files using R scripts described in Irwin et al. (2018). We calculated Weir and Cockerham’s FST and between-group nucleotide differentiation (πB) comparing allopatric yellowhammers and pine buntings as well as within-group nucleotide variation (πW) for each population. All statistics were calculated within nonoverlapping windows of available sequence data across each chromosome. The first window was positioned at the “start” of each chromosome as described in the zebra finch reference genome (Warren et al. 2010) and each window contained 2000 bp of sequence data which included multiple small sequenced regions separated by unsequenced regions. We used a window size of 2000 bp of sequenced data rather than 10,000 bp (as in Irwin et al. 2018), to visualize narrow peaks in relative and absolute differentiation within our dataset. We hereafter refer to these windows as “genomic windows.”
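
To make the windowing scheme concrete, the sketch below groups sequenced positions into windows that each hold a fixed number of sequenced sites, and computes per-site contributions to πW and πB from allele counts and frequencies. It is an illustration under stated assumptions rather than the R scripts of Irwin et al. (2018): the function names are hypothetical, sites are assumed biallelic, and a trailing partial window is simply dropped.

```python
def make_windows(positions, sites_per_window=2000):
    """Split sorted sequenced positions into nonoverlapping windows that each
    contain the same number of sequenced sites (not the same physical span)."""
    windows = []
    last_start = len(positions) - sites_per_window
    for start in range(0, last_start + 1, sites_per_window):
        chunk = positions[start:start + sites_per_window]
        windows.append((chunk[0], chunk[-1]))
    return windows  # assumption: an incomplete final window is discarded


def site_pi_within(alt_count, n_chrom):
    """Mean pairwise differences at one biallelic site within one group."""
    if n_chrom < 2:
        return 0.0
    return 2.0 * alt_count * (n_chrom - alt_count) / (n_chrom * (n_chrom - 1))


def site_pi_between(p1, p2):
    """Probability that one allele drawn from each group differs."""
    return p1 * (1.0 - p2) + p2 * (1.0 - p1)


def window_mean(per_site_values, n_sequenced_sites):
    """Average over all sequenced sites; invariant sites contribute zero."""
    return sum(per_site_values) / n_sequenced_sites
```

Because invariant sites enter the denominator, the windowed values are per-base-pair estimates, which is why the “all sites” files are needed for this step.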

We developed a new R script to calculate a Tajima’s D value (Tajima 1989) for each of the genomic windows. Values of Tajima’s D were used to identify areas of the genome where patterns of variation in yellowhammer and pine bunting populations deviated from a neutral model. Significantly negative Tajima’s D implies that the ratio of common versus rare alleles is lower than expected under neutrality, likely because of a selective sweep or population expansion following a bottleneck. Significantly positive Tajima’s D indicates that the ratio of common versus rare alleles is higher than expected under neutrality, potentially stemming from balancing selection or a rapid population contraction.
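
For readers unfamiliar with the statistic, the following sketch computes Tajima's D for one window using the standard estimator of Tajima (1989). It is not the authors' R script. It assumes biallelic sites, no missing genotypes (so every site shares the same number of sampled chromosomes), and a hypothetical input format of alternate-allele counts per site.

```python
from math import sqrt


def tajimas_d(alt_counts, n_chrom):
    """Tajima's D for one window.

    alt_counts: alternate-allele count at each sequenced site (0 if invariant).
    n_chrom: number of sampled chromosomes, assumed equal across sites.
    Returns None when D is undefined (no segregating sites or n_chrom < 4).
    """
    n = n_chrom
    seg = [k for k in alt_counts if 0 < k < n]
    s = len(seg)
    if s == 0 or n < 4:
        return None
    # Total (not per-site) mean number of pairwise differences.
    pi_total = sum(2.0 * k * (n - k) / (n * (n - 1)) for k in seg)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Difference between the two estimators of theta, scaled by its
    # estimated standard deviation.
    return (pi_total - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))
```

An excess of rare variants (for example, a window containing only singletons) yields a negative value, while an excess of intermediate-frequency variants yields a positive one.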

Phylogenetic comparison with other Emberizidae species

We employed whole-genome averages of πB between allopatric yellowhammers and pine buntings as well as among these focal species and six other Emberizidae species (Emberiza aureola, Emberiza calandra, Emberiza cioides, Emberiza cirlus, Emberiza hortulana and Emberiza stewarti) to estimate a phylogeny. A list of average πB values for each species pair was converted into a distance matrix and used to create an unrooted neighbor-joining tree. This tree was constructed using the ape package (Paradis and Schliep 2019) and the BioNJ algorithm (Gascuel 1997) with Emberiza aureola set as the outgroup (Alström et al. 2008). The phylogeny was created to provide further support for the sister relationship between yellowhammers and pine buntings that had previously been hypothesized using mitochondrial markers (Alström et al. 2008; Irwin et al. 2009) but that was questioned in some studies (Rubtsov and Opaev 2012). In creating this phylogeny, we were also able to investigate discordance between mitochondrial and nuclear differentiation within the system using a greater amount of nuclear data.

Signals of mitonuclear co-introgression

To investigate potential mitonuclear co-introgression between allopatric yellowhammers and pine buntings, we asked whether there was an association between a list of known mitonuclear genes and a list of genomic windows with statistical characteristics suggestive of introgression. Due to a lack of resolution in our genomic dataset, we were unable to perform statistical tests to directly characterize regions of introgression between taxa and, instead, identified genomic windows with low πB and low Tajima’s D in each taxon. Low πB indicates high similarity between the nucleotide sequences of allopatric populations, which would be expected if alleles introgressed from one taxon into the other. Low Tajima’s D suggests a past selective sweep within a population which would be expected if an adaptive allele introgressed from a separate taxon and swept throughout the receiving population. We call these windows “Between Population Putative Sweep Windows” (BPPSWs). Our quantitative criteria for a BPPSW were a Tajima’s D value within the lowest 5% of the available windowed values and a πB value within the lowest 30% of the available windowed values. The Tajima’s D threshold was kept relatively low to capture windows with particularly strong signals of selection that could be associated with an adaptive selective sweep and introgression between taxa while the πB threshold was left higher to identify an appreciable number of BPPSWs for analysis.
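
The two-threshold rule for calling a BPPSW can be sketched as follows. The record layout (`id`, `tajima_d`, `pi_b`) and the exact way the lowest 5% and 30% cutoffs are computed are assumptions; the source does not specify a quantile definition. Because Tajima's D is taxon-specific, the function would be run once per taxon.

```python
def lower_tail_cutoff(values, fraction):
    """Largest value that still falls inside the lowest `fraction` of values."""
    ordered = sorted(v for v in values if v is not None)
    # Small epsilon guards against floating-point error in fraction * n.
    k = max(1, int(fraction * len(ordered) + 1e-9))
    return ordered[k - 1]


def find_bppsws(windows, d_fraction=0.05, pib_fraction=0.30):
    """Return ids of windows in the lowest 5% of Tajima's D AND the lowest
    30% of pi_B. `windows` is a list of dicts with hypothetical keys
    'id', 'tajima_d' and 'pi_b'."""
    d_cut = lower_tail_cutoff([w["tajima_d"] for w in windows], d_fraction)
    pib_cut = lower_tail_cutoff([w["pi_b"] for w in windows], pib_fraction)
    return [
        w["id"]
        for w in windows
        if w["tajima_d"] is not None
        and w["pi_b"] is not None
        and w["tajima_d"] <= d_cut
        and w["pi_b"] <= pib_cut
    ]
```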

The mitonuclear genes that we investigated in our co-introgression analysis were all protein coding with products that interact directly with mtDNA or an immediate product of the mitochondrial genome. Theory predicts that these genes experience strong mitonuclear coevolution and that changes in mtDNA caused by introgression would drive selection for co-introgression of compatible alleles (Gershoni et al. 2009; Burton and Barreto 2012; Hill 2019b). By consulting various resources on mitochondrial and mitonuclear dynamics, we produced a list of these mitonuclear genes that encode: protein subunits of complexes I, III, IV, and V of the ETC, assembly and ancillary proteins involved in the formation of these ETC complexes, and proteins that are part of the transcription, translation or DNA replication machinery within mitochondria (Diodato et al. 2014; Greber and Ban 2016; Hill 2019b). After removing any genes that were not annotated in the zebra finch reference genome or that lacked a specific location on the reference genome, a total of 162 mitonuclear genes remained for analysis (Supplementary Table S2). Using a custom R script, we assigned each mitonuclear gene to the genomic window that minimized the absolute difference between the location of the mitonuclear gene centre and the location of the window centre. We then determined the number of mitonuclear genes that occurred within the BPPSWs identified for each taxon.
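
The gene-to-window assignment described above amounts to a nearest-centre lookup. A minimal sketch, assuming hypothetical `(window_id, start, end)` tuples for the windows on one chromosome (the authors' custom R script is not reproduced here):

```python
def assign_gene_to_window(gene_start, gene_end, windows):
    """Return the id of the window whose centre is closest to the gene centre.

    windows: list of (window_id, start, end) tuples on the same chromosome
    as the gene (a hypothetical layout chosen for illustration).
    """
    gene_centre = (gene_start + gene_end) / 2.0
    closest = min(windows, key=lambda w: abs((w[1] + w[2]) / 2.0 - gene_centre))
    return closest[0]
```

Note that under this rule every gene is assigned to exactly one window, even when the gene body spans several windows or lies between sequenced regions.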

We conducted a Fisher’s Exact test for both yellowhammers and pine buntings to determine whether the proportion of mitonuclear genes within BPPSWs was significantly different from what would be expected based on the total proportion of protein-coding genes appearing within these windows. To complete this analysis, a list of the 14,008 protein-coding genes annotated in the zebra finch reference genome (Warren et al. 2010) was compiled. We removed our 162 mitonuclear genes from this list and assigned the remaining 13,846 non-mitonuclear genes to genomic windows using the methodology described above. Because previous research has reported that mitonuclear genes are non-randomly distributed between autosomes and sex chromosomes in some systems (Drown et al. 2012; Dean et al. 2014), a situation that could bias our mitonuclear analysis, we investigated whether the mitonuclear genes we examined were over- or under-represented on the Z chromosome. A Fisher’s Exact test showed no significant difference in the proportions of mitonuclear versus non-mitonuclear genes positioned on chromosome Z versus on autosomes (p value = 0.7182). Following this non-significant result, we proceeded with comparing the proportion of mitonuclear and non-mitonuclear genes appearing in yellowhammer and pine bunting BPPSWs.
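
For reference, a two-sided Fisher's Exact test on a 2x2 table can be computed directly from the hypergeometric distribution. The sketch below uses only the Python standard library and sums the probabilities of all tables that are no more likely than the observed one; this is a common two-sided convention, but whether it matches the exact routine the authors used is an assumption. The intended layout here is rows = mitonuclear / non-mitonuclear genes and columns = inside / outside BPPSWs (or on Z / on autosomes).

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell
        # given fixed row and column totals.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Relative tolerance so that ties with the observed table are included.
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)
```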

Following the completion of the above analyses, we performed additional analyses to determine whether windows containing mitonuclear genes differed significantly in average FST, πB, πW and Tajima’s D from windows not containing mitonuclear genes. Average values were calculated based on 155 genomic windows with mitonuclear genes and 7032 genomic windows without. A series of Welch’s t-tests were used to compare these values with results shown in detail in Supplementary Table S3.
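
Welch's t-test is appropriate here because the two groups of windows differ greatly in size and need not share a variance. As a minimal sketch (function name hypothetical), the test statistic and the Welch–Satterthwaite degrees of freedom can be computed as below; the p-value would then be read from a t distribution with `df` degrees of freedom, a step omitted here because it requires a statistics library.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(x, y):
    """Return (t, df) for Welch's unequal-variance t-test on samples x and y."""
    nx, ny = len(x), len(y)
    # Squared standard errors of each sample mean (sample variance / n).
    vx, vy = variance(x) / nx, variance(y) / ny
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df
```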

Results

When comparing allopatric yellowhammers and pine buntings, we identified 374,780 variable SNPs within our “variant site” VCF file and 13,703,455 invariant and 699,122 variant sites across thirty autosomes and the Z chromosome within our “all sites” VCF files (the different numbers of variable sites result from differences in filtering parameters; see Methods). In the latter “all sites” files, we designated a total of 7187 genomic windows (of 2000 sequenced bp each) across the genome, with each window covering an average distance of about 139 kilobases (with the reference genome being 1.0 Gb in length).
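
The average physical span per window follows directly from the two figures quoted above. A quick check, using only the numbers stated in the text (variable names are illustrative):

```python
reference_length_bp = 1.0e9   # reference genome length stated in the text (1.0 Gb)
n_windows = 7187              # number of genomic windows stated in the text

# Mean physical distance covered by one window, in kilobases.
mean_span_kb = reference_length_bp / n_windows / 1000
print(round(mean_span_kb))    # 139
```

Each window therefore summarizes 2000 sequenced bp scattered over roughly 139 kb of the reference genome.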

Phylogenetic comparison with other Emberizidae species

An unrooted neighbor-joining tree of average πB values between yellowhammers, pine buntings and six other Emberizidae species (Fig. 2) depicted similar species relationships as estimated previously using mitochondrial markers (Alström et al. 2008; Irwin et al. 2009). Relative branch lengths were also similar, with the major exception being the branch length between yellowhammers and pine buntings which was much longer in our analysis using nuclear DNA. To put this into context, we determined the relative genetic distance between yellowhammers and pine buntings versus the genetic distance between E. stewarti and either member of the yellowhammer/pine bunting clade for our nuclear phylogeny and for the previously calculated mitochondrial phylogeny (Irwin et al. 2009). The nuclear ratio was 11.4 times greater than the mitochondrial ratio which corroborates the presence of strong discordance in genetic differentiation between yellowhammers and pine buntings. These results also support the hypothesis of an extended period of divergence between yellowhammers and pine buntings followed by adaptive mtDNA introgression.

Fig. 2: Unrooted neighbor-joining tree of Emberizidae species constructed based on average absolute between-population nucleotide diversity (πB).

Sample sizes for each species are as follows: E. aureola = 1, E. calandra = 1, E. cioides = 1, E. hortulana = 1, E. cirlus = 6, E. stewarti = 4, E. citrinella = 53 and E. leucocephalos = 42.

Overall genetic differentiation

Based on 374,780 SNPs, the genome-wide FST estimate was 0.0232 between allopatric yellowhammers and pine buntings. Despite this low average, a PCA based on the same SNP genotypes separated yellowhammers and pine buntings into tight genetic clusters (Fig. 3). Two pine buntings were outliers along PC1, while the remaining yellowhammers and pine buntings separated into distinct groups mainly along PC2. Further investigation into these outliers revealed that they were males from the same location, but a kinship analysis completed as part of a separate study did not find close kinship between the two pine buntings that could explain their position (Nikelski et al. in prep). We also examined the PC1 loadings and found that the signal for PC1 position was broadly distributed across the nuclear genome, rather than being concentrated in a few highly influential regions (Supplementary Fig. S1). We then temporarily removed one of the outliers and re-ran the PCA. This caused the other outlier to fall into the pine bunting cluster, but revealed a further yellowhammer outlier (Supplementary Fig. S2). Removal of this yellowhammer outlier in addition to one member of the pine bunting outlier pair in turn revealed another yellowhammer outlier (Supplementary Fig. S3). It is unclear what is responsible for these outliers, but the distinct yellowhammer and pine bunting genetic clusters remained intact in all the PCAs considered.

Fig. 3: PCA of genetic variation between allopatric yellowhammers (yellow; n = 53) and allopatric pine buntings (brown; n = 42), based on 374,780 genome-wide SNPs.

PC1 and PC2 explain 3.6% and 2.9%, respectively, of the variation among individuals.

Differentiation across the genome

Relative differentiation between allopatric yellowhammers and pine buntings was highly heterogeneous across the nuclear genome with peaks in FST seen on most of the larger chromosomes (Figs. 4, 5, 6A; Supplementary Figs. S4, S5). Chromosome Z in particular showed a large peak in FST with several SNPs possessing values close to one. In fact, FST for the Z chromosome was 0.1246—more than five times larger than the genome-wide FST.

Fig. 4: Relative differentiation (FST) of 349,807 genome-wide SNPs identified among allopatric yellowhammers (n = 53) and allopatric pine buntings (n = 42), with chromosomes represented with alternating black and gray.

Narrow regions of elevated differentiation can be seen on many autosomes, and there are broad regions of high differentiation on the Z chromosome.

Patterns of between-group nucleotide differentiation (πB) and within-group nucleotide variation (πW) were also heterogeneous across the genome and comparable to each other in magnitude: genome-wide πB = 0.0041; genome-wide πW for both taxa = 0.0040 (Fig. 5; Supplementary Fig. S4). Because between-group differentiation and within-group variation are so intimately related in their evolution and calculation, windowed averages of these two statistics are expected to show a strongly positive relationship. Indeed, most windowed πB and πW averages fell near a 1:1 association line (Spearman’s rank correlation: 0.9943, p < 2.2 × 10⁻¹⁶; Fig. 6B), which corresponds to little or no differentiation. However, some genomic windows showed much reduced πW compared to πB; these were the windows with high FST. In addition, we detected a weak but highly significant negative correlation between the windowed averages of FST and πB (Spearman’s rank correlation: −0.1196, p < 2.2 × 10⁻¹⁶; Fig. 6C), as is expected if peaks in relative differentiation are products of repeated selective events (Cruickshank and Hahn 2014; Irwin et al. 2018).
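The link between these three statistics can be made concrete with a short sketch. It assumes a Hudson-style relationship, FST = (πB − πW) / πB, which may differ in detail from the estimator used in this study; the function name and the "island" window values are hypothetical. With the rounded genome-wide values reported above, the result lands close to the genome-wide FST of 0.0232, and a window with strongly reduced πW relative to πB yields a high FST.

```python
def fst_from_pi(pi_between: float, pi_within: float) -> float:
    """Relative differentiation under the assumed relationship
    FST = (pi_B - pi_W) / pi_B (an assumption, not the paper's stated estimator)."""
    if pi_between <= 0:
        raise ValueError("pi_between must be positive")
    return (pi_between - pi_within) / pi_between


# Rounded genome-wide values reported in the text.
genome_wide = fst_from_pi(0.0041, 0.0040)

# Hypothetical window: within-group variation is much lower than
# between-group differentiation, as in an "island of differentiation".
island = fst_from_pi(0.0030, 0.0006)

print(f"genome-wide FST ~ {genome_wide:.4f}; hypothetical island FST = {island:.2f}")
```

Windows on the 1:1 line (πW equal to πB) give FST = 0 under this relationship, which is why points near that line in Fig. 6B correspond to little or no differentiation.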

Fig. 5: Patterns of genetic variation comparing allopatric yellowhammers (n = 53) and allopatric pine buntings (n = 42) across chromosomes 2, 5 and Z shown as examples of general genomic patterns in this system (similar plots for all chromosomes can be found in Supplementary Fig. S4).

Relative nucleotide differentiation (FST), absolute between-population nucleotide differentiation (πB), absolute within-population nucleotide variation (πW) and Tajima’s D (TajD) are shown as 2000 bp windowed averages across each chromosome. FST and πB are shown as purple lines to indicate that values were calculated as a comparison between allopatric yellowhammers and pine buntings. πW and TajD are shown as two separate lines (yellow = yellowhammers, brown = pine buntings) to indicate that values were calculated separately for each population.

Fig. 6: A summary of genetic statistics calculated for allopatric populations of yellowhammers and pine buntings.

(A) A histogram of average relative differentiation (FST) values calculated for windows of 2000 sequenced nucleotides across the nuclear genome when comparing allopatric yellowhammers (n = 53) with allopatric pine buntings (n = 42). A truncated version of this plot that shows the high FST tail more clearly can be seen in Supplementary Fig. S5. (B) Mean absolute within-group nucleotide variation (πW) of allopatric yellowhammers and allopatric pine buntings plotted against absolute between-group nucleotide differentiation (πB). Each dot represents the average value taken from a window of 2000 sequenced bases across the nuclear genome. The black line indicates where mean within-group nucleotide variation equals between-group nucleotide differentiation. Increasing values of relative differentiation (FST) calculated for each window are shown in darker shades of blue. (C) Association between relative differentiation (FST) and absolute between-group nucleotide differentiation (πB) of allopatric yellowhammers and allopatric pine buntings. Each black dot represents average values calculated from a 2000 bp window of sequenced data. A cubic spline fit between the variables is shown as a purple line.

Finally, we found that Tajima’s D varied across the genome but was mostly negative (Fig. 5; Supplementary Fig. S4), consistent with a history of population growth and/or selective sweeps. The average genome-wide Tajima’s D was similar between populations: −1.377 for yellowhammers and −1.335 for pine buntings.
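For readers less familiar with the statistic, the sketch below computes Tajima's D from its standard definition, using a sample size, a count of segregating sites and a mean number of pairwise differences. The input values are hypothetical and are not taken from this dataset; the point is only that an excess of rare variants (pairwise diversity lower than expected from the number of segregating sites) gives a negative D.

```python
import math


def tajimas_d(n: int, seg_sites: int, pi: float) -> float:
    """Tajima's D for n sampled sequences, seg_sites segregating sites and
    pi = mean number of pairwise differences (per region, not per site)."""
    if n < 4:
        raise ValueError("n must be at least 4 for a non-zero variance term")
    if seg_sites < 1:
        raise ValueError("at least one segregating site is required")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    # Numerator: pairwise diversity minus Watterson's estimate of theta.
    return (pi - seg_sites / a1) / math.sqrt(variance)


# Hypothetical window: 20 sequences, 30 segregating sites.
d_rare_excess = tajimas_d(20, 30, pi=5.0)   # pi below S/a1, so D is negative
d_mid_excess = tajimas_d(20, 30, pi=12.0)   # pi above S/a1, so D is positive
print(d_rare_excess, d_mid_excess)
```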

Signals of mitonuclear co-introgression

Of the 7187 genomic windows identified across the nuclear genome, we classified 244 (3.4%) as BPPSWs within yellowhammers and 222 (3.1%) as BPPSWs within pine buntings. Average values of πB and Tajima’s D were 0.0016 and −2.3751, respectively, in yellowhammer BPPSWs and 0.0019 and −2.3369, respectively, in pine bunting BPPSWs. In non-BPPSWs, the average values of πB and Tajima’s D were 0.0042 and −1.3416 in yellowhammers and 0.0042 and −1.3031 in pine buntings. Of the BPPSWs identified in yellowhammer and pine bunting populations, 71 were shared between the taxa. Some sharing of BPPSWs is expected given that the contribution of πB to window selection was identical for both taxa (in contrast, Tajima’s D was calculated separately for yellowhammers and pine buntings).
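Why some sharing is expected can be seen in a toy example. The sketch below assumes, purely for illustration, that a window is flagged for a taxon when its πB and that taxon's Tajima's D both fall below fixed cutoffs. The cutoffs, window values and names are hypothetical and are not the selection criteria used in this study. Because πB is a single value per window, any window that fails the πB filter is excluded for both taxa, and any window that passes it needs only a low Tajima's D in each taxon to be shared.

```python
from typing import NamedTuple


class Window(NamedTuple):
    name: str
    pi_b: float      # between-group differentiation, one value shared by both taxa
    taj_d_yh: float  # Tajima's D in yellowhammers
    taj_d_pb: float  # Tajima's D in pine buntings


# Hypothetical cutoffs, chosen only for illustration.
PI_B_MAX = 0.0025
TAJ_D_MAX = -2.0


def flagged(windows, taxon):
    """Names of windows with low pi_B and strongly negative Tajima's D in `taxon`."""
    attr = {"yellowhammer": "taj_d_yh", "pine_bunting": "taj_d_pb"}[taxon]
    return {w.name for w in windows
            if w.pi_b < PI_B_MAX and getattr(w, attr) < TAJ_D_MAX}


# Made-up windows, not real data.
toy = [
    Window("w1", 0.0016, -2.4, -1.2),  # sweep signal in yellowhammers only
    Window("w2", 0.0019, -1.1, -2.3),  # sweep signal in pine buntings only
    Window("w3", 0.0015, -2.5, -2.2),  # sweep signal in both taxa
    Window("w4", 0.0042, -2.6, -2.4),  # high pi_B: excluded for both taxa
    Window("w5", 0.0041, -1.3, -1.3),  # typical background window
]

yh = flagged(toy, "yellowhammer")
pb = flagged(toy, "pine_bunting")
shared = yh & pb
print(sorted(yh), sorted(pb), sorted(shared))
```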

Our examination of the gene content within yellowhammer BPPSWs revealed that they contained a higher percentage (7.4%) of mitonuclear genes (12 of the 162 genes considered) than of non-mitonuclear genes (4.1%; 574 of the 13,846 genes considered). This difference was statistically significant (Fisher’s Exact test: p = 0.04714), providing evidence for mitonuclear genes preferentially appearing within yellowhammer BPPSWs. Pine bunting BPPSWs contained 4.3% of the mitonuclear genes (7 of the 162 genes considered) and 3.3% of the non-mitonuclear genes (455 of the 13,846 genes considered), a difference that was not statistically significant (Fisher’s Exact test: p = 0.3806).
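The comparison above can be laid out as a 2 × 2 table (mitonuclear versus non-mitonuclear genes, inside versus outside BPPSWs) using the counts reported in the text. The sketch below is a minimal, standard-library implementation of a two-sided Fisher's exact test that sums the probabilities of all tables no more probable than the observed one. Whether it reproduces the reported p-values exactly depends on the test settings used in the study, which are an assumption here; the function name is illustrative.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell given the margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


# Rows: mitonuclear, non-mitonuclear. Columns: in BPPSWs, not in BPPSWs.
p_yellowhammer = fisher_exact_two_sided(12, 162 - 12, 574, 13846 - 574)
p_pine_bunting = fisher_exact_two_sided(7, 162 - 7, 455, 13846 - 455)
print(f"yellowhammer p = {p_yellowhammer:.4f}; pine bunting p = {p_pine_bunting:.4f}")
```

The exact integer arithmetic in `math.comb` avoids the overflow problems that factorial-based formulas run into with counts of this size.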

The twelve mitonuclear genes that appeared within yellowhammer BPPSWs were: APOPT1, COX5A, COX17, LARS2, MRPL1, MRPL27, MRPL32, MRPS7, MRPS25, NDUFC1, SSBP1 and UQCR11 (Table 2). Five of these genes encode protein subunits of the mitoribosome, three encode structural subunits of the ETC, two encode assembly factors of the ETC, one encodes a mitochondrial aminoacyl-tRNA synthetase and one encodes a single-stranded DNA-binding protein involved in mtDNA replication. Two genes each appear on chromosomes 2, 4 and 18 while the rest appear on separate chromosomes. Interestingly, three of the five putatively introgressed genes associated with the ETC are specifically associated with complex IV.

Table 2 Identities, chromosomal locations, windowed Tajima’s D values and functions of mitonuclear genes that appeared within 244 yellowhammer BPPSWs and within 222 pine bunting BPPSWs.

The seven mitonuclear genes that appeared within pine bunting BPPSWs were: ATP5H, COX5A, MRPL2, MRPL58, MRPS7, MRPS14 and NDUFB4 (Table 2). Four of these genes encode protein subunits of the mitoribosome and three encode structural subunits of the ETC. Three genes appear on chromosome 18—with two genes sharing the same genomic window—while the rest of the genes appear on separate chromosomes. The COX5A and MRPS7 genes were found in both yellowhammer and pine bunting BPPSWs.

In comparisons of genomic windows that did versus did not contain mitonuclear genes, we found that average FST, πB, πW and Tajima’s D did not differ significantly between these two categories (Supplementary Table S3). Only yellowhammer πW approached significance (p = 0.05106) with average πW being lower within genomic windows containing mitonuclear genes (0.00373) compared to those not containing mitonuclear genes (0.00400). This finding is consistent with our results showing that mitonuclear genes appear preferentially in yellowhammer BPPSWs, as selective sweeps of mitonuclear genes into a population would lower nucleotide diversity in the associated genomic windows.

Discussion

Yellowhammers and pine buntings show negligible mtDNA differentiation (Irwin et al. 2009) but are well differentiated phenotypically (Panov et al. 2003; Rubtsov and Tarasov 2017) and moderately differentiated in AFLP nuclear markers (Irwin et al. 2009). In light of this discordance in differentiation, Irwin et al. (2009) proposed that mtDNA adaptively introgressed between taxa following a period of allopatric isolation; however, due to the limited information provided by AFLP analyses, the extent of discordance and of nuclear differentiation between yellowhammers and pine buntings remained unknown, lending some uncertainty to this hypothesis. In the present study, analysis of genetic variation identified heterogeneous nuclear differentiation between allopatric populations with strong differentiation peaks that separated taxa into distinct genetic clusters. This result supports yellowhammers and pine buntings experiencing a period of separate evolution followed by hybridization within their current contact zone in western and central Siberia (Panov et al. 2003; 2007; Rubtsov 2007; Rubtsov and Tarasov 2017). Our phylogenetic analysis showing a longer branch length between yellowhammers and pine buntings based on nuclear markers, when compared to the very short branch length in a phylogeny based on mtDNA, also corroborates a hypothesis of recent mtDNA introgression and mitochondrial haplotype replacement in this system, likely driven by selection (Irwin et al. 2009). In addition, our finding that mitonuclear genes are over-represented in BPPSWs in yellowhammers lends support for mitonuclear co-introgression from pine buntings into yellowhammers.

Though genetically distinct, the genome-wide FST between allopatric yellowhammers and pine buntings (0.0232) was comparable to or sometimes lower than the averages seen between avian subspecies (e.g., subspecies of barn swallow: 0.017–0.026 (Scordato et al. 2017); myrtle warbler and Audubon’s warbler: 0.077–0.106 (Irwin et al. 2018); yellow- and red-shafted northern flickers: 0.098 (Manthey et al. 2017)). This low genome-wide FST contrasts with the moderate FST averages reported from an analysis of AFLP markers performed on the same populations: 0.078 based on allele frequencies and 0.140 based on band frequencies (Irwin et al. 2009). However, the present study also revealed that relative differentiation was highly heterogeneous across the nuclear genome with FST peaks on various chromosomes. It is possible that the previous AFLP analysis captured a disproportionate number of loci within these differentiation peaks, thereby inflating FST estimates. This comparison highlights the caution that should be taken when interpreting genome-wide averages, because highly variable genetic differentiation landscapes can cause large variability in FST estimates when they are based on a limited and non-random sample of loci.

The FST peaks seen between yellowhammers and pine buntings on larger autosomes and most prominently on the Z chromosome are consistent with the “islands of differentiation” often noted in comparisons of closely related taxa (Harr 2006; Nadeau et al. 2012; Irwin et al. 2018). In contrast to these islands, large regions of close similarity in πB and πW suggest high gene flow between taxa across much of the nuclear genome. This scenario is consistent with the observed extensive hybridization between these taxa (Panov et al. 2003, 2007; Rubtsov 2007; Rubtsov and Tarasov 2017). Nevertheless, the high FST islands, those with much reduced πW compared to πB, can be explained by divergent selection causing low gene flow in these regions. It is unlikely that this pattern is the result of genetic drift over an extended period of geographic separation, as this would result in most genomic regions deviating slightly from πB = πW congruence rather than the observed pattern of extreme heterogeneity. Instead, this trend suggests that selection acted in a way that lowered πW relative to πB within “islands of differentiation”. Considering that high FST regions were associated with relatively low values of πB, we propose that differentiation islands in this system are most consistent with a model invoking repeated bouts of selection that lower nucleotide diversity (Cruickshank and Hahn 2014; Irwin et al. 2018). A sweep-before-differentiation model (Irwin et al. 2018), where FST peaks are produced by adaptive selective sweeps between populations followed by adaptive selection at the same regions in local populations, is particularly in line with the extensive hybridization presently observed between yellowhammers and pine buntings (Panov et al. 2003, 2007; Rubtsov 2007; Rubtsov and Tarasov 2017).

Of the “islands of differentiation” identified between taxa, the tallest and widest was found on the Z chromosome. Greater differentiation on the Z chromosome compared to autosomes is a common observation when comparing closely related species (Borge et al. 2005; Ruegg et al. 2014; Sackton et al. 2014) and is consistent with stronger positive selection and with less efficient purifying selection on this chromosome (Mank et al. 2010; reviewed in Meisel and Connallon 2013; reviewed in Irwin 2018). However, the large regions of the Z chromosome that have FST values near zero suggest that additional factors are involved in producing this island of differentiation.

One possible explanation for the large differentiation island on chromosome Z could be that it corresponds with an area of low recombination—a region of connected loci that tend to be inherited together, leading to linked selection of nearby loci. Strong divergent selection acting on one SNP within this region would act similarly on loci that are linked to it such that a wide, highly divergent genomic block would become fixed and appear as an “island” between taxa (reviewed in Cutter and Payseur 2013). Areas of low recombination and linkage are often associated with inversion polymorphisms (reviewed in Smukowski and Noor 2011) as different orientations of an inversion experience little successful recombination (reviewed in Kirkpatrick 2010). Further research is warranted to characterize the nature of this differentiated region as well as whether it houses an inversion polymorphism.

While numerous “islands of differentiation” were observed between yellowhammers and pine buntings, implying moderate genetic divergence, mtDNA introgression has the potential to homogenize the nuclear genomes of these taxa at mitonuclear genes by selecting for co-introgression of compatible alleles (Beck et al. 2015; Sloan et al. 2017; Morales et al. 2018). Because our genomic dataset lacked the resolution to directly test for introgression between taxa, we identified genomic windows with low between-population differentiation and strong signals of selective sweeps in each population (BPPSWs) as a proxy and tested for an association between these regions and mitonuclear genes predicted to coevolve with mtDNA. We found statistical support for mitonuclear genes appearing preferentially in yellowhammer BPPSWs, but no such statistical support in pine bunting BPPSWs. Specifically, the proportion of mitonuclear genes within yellowhammer BPPSWs (7.4%) was 1.7 times the proportion within pine bunting BPPSWs (4.3%). In other words, mitonuclear genes showing highly similar nucleotide sequences when comparing allopatric populations were more likely to have selectively swept through the yellowhammer population than the pine bunting population. This could suggest that mitonuclear gene introgression, if it occurred, proceeded in the direction of pine buntings into yellowhammers.

We noted additional support for mitonuclear gene introgression in this system in the fact that mitonuclear genes appearing within yellowhammer and pine bunting BPPSWs encode proteins with interrelated mitochondrial functions. Three of the mitonuclear genes within yellowhammer BPPSWs and three within pine bunting BPPSWs encode structural subunits of the ETC. Four of the five ETC complexes are made up of subunits encoded by both the nuclear and mitochondrial genome (Hill 2019b); correct fit between differentially encoded subunits is essential for the flow of electrons and protons across the ETC during oxidative phosphorylation. To put this in perspective, changing even a single amino acid in one subunit can significantly disrupt its ability to interact with other subunits within an ETC complex (e.g., Gershoni et al. 2014). Because of the tight interactions within complexes and the consequences of subunit incompatibility, introgression of mtDNA is expected to select for co-introgression of mitonuclear genes encoding ETC structural subunits. Such co-introgression has been detected between differentially adapted populations of eastern yellow robin where mtDNA introgression between populations was followed by similar introgression of mitonuclear genes encoding subunits of complex I (Morales et al. 2018) and between different species of Drosophila where introgression and replacement of the mtDNA of one species during hybridization selected for co-introgression of genes that encode subunits of complex IV (Beck et al. 2015).

Of the ETC complexes, complex IV showed the strongest signal of potential co-introgression in the yellowhammer and pine bunting system. Three of the genes within yellowhammer BPPSWs and one gene within pine bunting BPPSWs were associated with this complex. As well, the gene COX5A, which encodes a structural subunit of complex IV, appeared in both sets of BPPSWs. It is unlikely and inconsistent with mitonuclear theory that this gene introgressed in both directions in the yellowhammer and pine bunting system; however, it is possible that COX5A adaptively swept in both populations which, due to the nature of BPPSW identification, would cause it to appear in both sets of BPPSWs. In such a situation, we can predict that a particularly adaptive mitonuclear allele may have appeared in one species and swept to high frequency before co-introgressing and sweeping across the other species following mtDNA introgression. Interestingly, COX5A was one of the genes that co-introgressed in the Drosophila example discussed above (Beck et al. 2015), lending some support to its particular importance to mitonuclear coevolution. More generally, complex IV is often used as a model for studying mitonuclear interactions due to its distinctive structure where a core of mitochondrial-encoded subunits is surrounded by nuclear-encoded subunits (Saraste 1999). With so many mitonuclear interactions, incompatibility involving complex IV has been detected in several systems including within primate xenomitochondrial cybrids (Barrientos et al. 2000) and between different species of Drosophila (Sackton et al. 2003). Furthermore, work by Osada and Akashi (2012) has provided strong evidence for compensatory coevolution between mitonuclear genes related to complex IV (including COX5A) and mtDNA among primates at interacting amino acids of differentially encoded subunits. Altogether, these results suggest a crucial role for complex IV in mitonuclear coevolution and co-introgression between pine buntings and yellowhammers.

Another group of mitonuclear genes that appeared consistently within the yellowhammer and pine bunting BPPSWs were those encoding subunits of the mitoribosome (five in yellowhammer BPPSWs and four in pine bunting BPPSWs). MRPS7, like COX5A, appeared in both yellowhammer and pine bunting BPPSWs suggesting that this gene may have adaptively swept through both taxa. Unlike the protein-protein interactions occurring within ETC complexes, mitonuclear interactions in the mitoribosome are between nuclear-encoded proteins and mitochondrial-encoded RNA (Hill 2019b). Protein subunits associate closely with rRNA during the formation of a mitoribosome, but also interact with mRNA and tRNA during the synthesis of mitochondrial proteins (Greber and Ban 2016). Currently, research is limited on the extent and importance of interactions between mitoribosomal subunits and mitochondrial RNA (but see: Sloan et al. 2014; Barreto et al. 2018). However, the fact that interactions between components are extensive and necessary for the synthesis of mitochondrial proteins suggests close coevolution between mtDNA and genes encoding mitoribosomal subunits that could strongly select for mitonuclear co-introgression following mtDNA introgression.

Our mitonuclear analysis provides support for mitonuclear gene introgression in the direction of pine buntings into yellowhammers; nevertheless, we acknowledge that there are shortcomings in this methodology and that our results must be interpreted with caution. Most importantly, our BPPSWs are an imperfect measure of introgression between pine buntings and yellowhammers that utilize diversity and differentiation statistics as a proxy for this process. As such, it is possible that the mitonuclear genes tend to occur in genomic regions that have other characteristics that are associated with BPPSWs, such that the association between mitonuclear genes and BPPSWs is not directly causal. Knowing the direction of historical mtDNA introgression in this system would allow us to determine whether the direction of mitonuclear gene introgression proposed here matches the direction of mtDNA introgression (Gershoni et al. 2009; Burton and Barreto 2012; Hill 2019b). Unfortunately, introgression has apparently eliminated the original mitochondrial haplotype in the receiving population (Irwin et al. 2009) such that determining the direction of mtDNA introgression between yellowhammers and pine buntings is not possible using available data and methodology. Despite these weaknesses in our analysis, we found what we believe is compelling support for mitonuclear gene introgression from pine buntings into yellowhammers which warrants further investigation.

To summarize, yellowhammers and pine buntings are sister taxa that are divergent in appearance, song, and ecology (Panov et al. 2003; Rubtsov and Tarasov 2017) yet vary greatly in their genomic differentiation from virtually none (at the mitochondrial genome) to nearly fixed (the differentiation peak on the Z chromosome). These patterns are best explained by a period of differentiation while geographically separated, followed by hybridization and mtDNA introgression. We found tentative support for mitonuclear gene introgression (compared to introgression of other genes) from pine buntings into yellowhammers, as well as a tendency for mitonuclear genes encoding structural components of the ETC and the mitoribosome to appear within the BPPSWs of both taxa. One intriguing possibility is that mitonuclear co-introgression has resulted in reduced mitonuclear incompatibilities between yellowhammers and pine buntings (Gershoni et al. 2009; Burton and Barreto 2012; Hill 2019b), thereby contributing to their current extensive hybridization within central Siberia (Panov et al. 2003, 2007; Rubtsov 2007; Rubtsov and Tarasov 2017). This idea leads to the question, which can be addressed through a close examination of genomic variation within the hybrid zone, of whether the observed islands of differentiation on the Z chromosome and autosomes are sufficient to stabilize yellowhammers and pine buntings as separate entities where they hybridize, or whether the two taxa are gradually merging into a single species.