Genetic variation, either from mutation or standing variation, is crucial for adapting to novel environments; however, adaptation from standing genetic variation is likely to lead to faster evolution compared to waiting for de novo mutations (Barrett and Schluter 2008; Hermisson and Pennings 2005). Such standing variation can originate from within the same population, from other populations, or even different species, through adaptive introgression (i.e., standing introgression variation; SI, sensu (Jagoda et al. 2017). In animals, numerous examples have highlighted the importance of adaptive introgression in providing alleles important for a wide variety of environmental pressures, including resistance to insecticides (Clarkson et al. 2014; Song et al. 2011), rodenticides (Liu et al. 2015; Norris et al. 2015), pollution (Oziolor et al. 2019), high-altitude adaptation (Huerta-Sánchez et al. 2014), seasonal camouflage (Jones et al. 2018), aposematic mimicry (Pardo-Diaz et al. 2012), and beak size (Grant and Grant 2016). In plants, this phenomenon is even more widespread (Morjan and Rieseberg 2004; Rieseberg 2011; Suarez-Gonzalez et al. 2018; Whitney et al. 2015).

Hemoglobin (Hb) has been a flagship molecular marker in studies of adaptation to high altitude (Storz 2019). The globin-gene families provide a source of variation from which adaptation has occurred both through standing variation and new mutations (Galen et al, 2015; Hoffmann et al. 2012; Natarajan et al, 2016; Projecto-Garcia et al. 2013; Storz 2016a). At high altitudes, the low partial pressure of oxygen (PO2) results in a reduction in O2 saturation of arterial blood (Beall 2001). Waterfowl and other bird species (Natarajan et al. 2015; Projecto-Garcia et al. 2013) have featured prominently in studies of Hb function and evolution in response to colonization of high-altitude regions. Specific amino-acid substitutions have been identified that increase O2-binding affinity of the HbA (major) isoform by affecting α/β-subunit interactions and sensitivity to allosteric cofactors (Perutz 1983; Weber et al. 1993). In most birds, there is a consistent trend toward increased Hb-O2 affinity in the high-altitude lineages (Storz 2016b), a pattern that has been experimentally validated using both naturally purified Hb and recombinant Hb (Natarajan et al. 2015; Projecto-Garcia et al. 2013). In addition, a high degree of parallel amino-acid substitution is also observed in the α- and β-subunit genes in many avian lineages, suggestive of a strong role for selection on standing variation in addition to the role of de novo mutation (Natarajan et al. 2015). Finally, recent work has implicated genes in the hypoxia-inducible factor (HIF) pathway (Fig. 1) as being important for high-altitude adaptation in some of the same species and other high-altitude species (Cho et al. 2013; Pamenter et al. 2020; Qiu et al. 2012; Qu et al. 2013; Scheinfeldt and Tishkoff 2010; Wang et al. 2014). In waterfowl, for example, variants in the same exon of EPAS1 (HIF-2α) show a striking pattern of parallel evolution qualitatively similar to that seen in Hb (Graham and McCracken 2019). However, it is unknown to what degree any of these variants were due to parallel evolution involving de novo mutations or due to adaptive introgression.

Fig. 1: A simplified schematic showing how HIF-pathway-induced expression of erythropoietin (EPO) in the kidney stimulates expression of red blood cells and hemoglobin (Hb).
figure 1

The HBA and HBB genes are expressed in the bone marrow and other tissues during proerythroblast development (Gassmann and Muckenthaler 2015).

Here we use statistical analyses of genomic DNA sequence data to evaluate evidence for adaptive introgression from standing interspecific variation in the β-globin and HIF-pathway genes (31 genes in all) of two species of Andean ducks, the speckled teal (Anas flavirostris) and yellow-billed pintail (A. georgica), which are widespread across southern South America and each have resident populations at high altitude. Ultimately, we show that linked nonsynonymous genetic variations in two β-globin genes (HBE, HBB) have likely introgressed between high-altitude populations of these two species. We furthermore show that this introgression likely occurred into the high-altitude yellow-billed pintail population from the speckled teal population following a possible range shift into a novel environment by the former. This potentially enabled the yellow-billed pintail to become established in the Andes mountains after the speckled teal population had already occupied high altitudes for some time prior and for evolution to proceed faster that it would have had yellow-billed pintails had to wait for advantageous de novo mutations. By contrast and despite strong evidence of parallel evolution, introgression of EPAS1 and EGLN1 or other loci in the HIF pathway was not supported by our analysis. Studies of adaptive introgression between these two species and the consequences thereof deserve study at much wider genomic scale and fine-scale chromosome mapping to elucidate the genomic architecture of these past hybridization event(s).


Study organisms

Our study focuses on the speckled teal and the yellow-billed pintail, the two most common waterfowl (family Anatidae) species in South America. These species are sympatric throughout their range and are superficially similar, each with brown plumage that is similar between the sexes (i.e., monochromatic), and each with a similar yellow bill, yet the species are significantly divergent in body size (e.g., averaging 412 vs. 776 g for male teal versus pintail respectively; (Kear and Hulme 2005). However, these two species are not sister taxa and belong to different clades that are not particularly closely related within the genus Anas. Uncorrected mtDNA divergence between species is 5.2% (Johnson and Sorenson 1999), corresponding to ~1 million or more years ago divergence time based on based on 4.8 × 10−8 substitutions/site/year reported by Peters et al. (2005) for the avian mtDNA control region. Furthermore, there is a substantially deeper divergence between mtDNA lineages of low- and high-altitude populations of speckled teal compared to yellow-billed pintail (~3 vs. 0%; Supplementary Fig. 1), suggesting that speckled teal (for which the low- and high-altitude populations are classified as different subspecies) have occupied high-altitude habitats for a much longer period of time (Graham et al. 2018). In contrast, pintail populations do not show significantly different mtDNA haplotype frequencies between low- and high-altitude populations, are classified as the same subspecies, and therefore probably colonized high-altitude regions more recently and/or have maintained much higher levels of gene flow between low and high-altitude sites (McCracken et al. 2009c).

Specimen collection and DNA extraction

A total of 40 individual vouchered specimens were used for this study from these two Andean duck species. We used 20 individuals of each species, the same used by Graham and McCracken (2019) in a previous study of the HIF pathway. For the speckled teal, individuals from low-altitude populations are the A. f. flavirostris subspecies (n = 10; elevation range 77–860 m) and from high-altitude are the A. f. oxyptera subspecies (n = 10; elevation range 3211–4405 m). For the yellow-billed pintail, individuals from both populations are classified taxonomically as A. g. spinicauda. A total of 20 yellow-billed pintails were collected from low- (n = 10; elevation range 292–914 m) and high-altitude sites (n = 10; elevation range 3332–4070 m). Maps depicting the provenance of these specimens and accompanying georeferenced locality data were published previously by McCracken et al. (2009a, 2009b, 2009c) and Graham et al. (2018). Genomic DNA was extracted from tissue using the DNeasy Tissue Kit (Qiagen, Valencia, California, USA) following manufacturer’s protocols. Tissues are deposited in the Genomic Resources Collection at the University of Alaska Museum, and vouchered specimens reside at the University of Alaska Museum and Colección Boliviana de Fauna in La Paz, Bolivia.

Target-enrichment sequencing, sequence assembly, and annotation

We used in-solution target capture prior to next-generation sequencing to selectively enrich libraries for the complete α- and β-globin cluster genes in combination with capture of 26 genes from the HIF pathway (Gnirke et al. 2009). For details on the genes chosen and the methods in subsequent assembly for the HIF-pathway genes see Graham and McCracken (2019). Briefly, the HIF-pathway genes utilized in this analysis were chosen for being either (1) genes of interest in previous studies of humans using whole-genome scans looking to identify genetic adaptations to high-altitude or hypoxic environments, (2) part of the canonical HIF pathway, either as a known downstream target, or a part of the repression machinery of the pathway, and/or (3) part of similar transcription factor families. Rationale for sequencing the complete α- and β-globin clusters, and linked embryonic genes in particular, followed Natarajan et al. (2015)’s experimental characterization of the genetic basis of increased Hb-O2 affinity in these same species of Andean ducks and the preliminary finding that identical by descent β-globin alleles were present across this linkage group.

For the Hb, these sequence capture targets comprised linkage groups spanning 17,851 bp (α) and 42,936 bp (β) respectively, including all of the known genes and pseudogenes in the α- and β-globin clusters, expressed for both adult and embryonic Hb isoforms. The intronic and exonic Hb regions for each of the α subunits (HBZ = LOC101800713; HBA1 = LOC101800520; HBA2 = LOC101800332) and β subunits (HBE = LOC101798290; HBB = LOC101798111) were then annotated using genomic features available through the mallard (Anas platyrhynchos) genome on NCBI (BGI_duck_1.0, GCA_000355885.1). In the case of the β-globin, our 43 kb sequence also included two olfactory receptor genes (OR51G2, OR51L1) flanking the 5′ and 3′ ends of the 20 kb β-globin cluster.

All steps of the sequence capture process were performed by MYcroarray (now Arbor Biosciences; Ann Arbor, MI). Custom MYbaits® biotinylated ssRNA target capture baitsets were designed. Specifically, a custom set of 210 (α-globin) and 579 (β-globin) 120mer probes at 2X tiling density was designed from the mallard genome. The DNA samples were sonicated, size-selected, and converted to Illumina libraries using dual indexes (P5, P7) during library amplification. These libraries were then enriched using the MYbaits® v3.0 procedure (Gnirke et al. 2009), quantified using qPCR, and then pooled for sequencing. Sequencing was performed on an Illumina Hi-Seq v4 platform paired end (100 bp) with a 250–300 bp insert size. Sequences were received demultiplexed by individual, with adapters trimmed and quality filtered (>Q30). Additionally, sequences were trimmed of adapter sequences (fastq-clipper), then filtered by length and quality using the FASTX-Toolkit v. 0.0.13 package (Gordon and Hannon 2010).

The globin complexes were assembled, and variants described using two different methods—one producing contiguous sequences, and another producing a variant file (vcf). First, a reference-guided assembly was performed with Bowtie2 (Langmead and Salzberg 2012) against the α-globin (NW_004678373, scaffold 2065) and β-globin (NW_004682656, scaffold 6035) sequences of the mallard. Thus, the annotations were pulled from NCBI’s most recent version of this assembly at the time (July 2015). Consensus sequences were then extracted for each individual and aligned separately by species using MAFFT (Katoh et al. 2002). The intronic and exonic Hb regions for each of the α subunit (απ = LOC101800713; αD = LOC101800520; αA = LOC101800332) and β subunit (βε = LOC101798290; βΑ = LOC101798111) clusters were then annotated using genomic features available through the mallard genome on NCBI. Second, a custom pipeline ( was created to remove orphan sequences and assemble sequences against the α-globin and β-globin sequences of the mallard reference, using BWA v0.7.15 (Li and Durbin 2009). The Samtools package v1.3.1, including BCFtools v.1.3.1 (Li et al. 2009) was then used to create a VCF file and to provide assembly statistics (i.e., bp-by-bp coverage). These programs used in the pipeline called SNP variants against the mallard reference, including indels (insertion/deletion); however, indels were excluded in the final dataset as were sites with missing data.

For the HIF-pathway genes and Hb clusters, variant call format (VCF) files were created for downstream analyses. These files included all four populations comprising speckled teal (low and high) and yellow-billed pintail (low and high). All sites containing indels and missing data were removed, generating a total of 35,515 variable positions for the 26 HIF-pathway genes, 636 variable positions for the 17.9 kb-long α-globin cluster, and 2931 variable positions for the 42.9 kb-long β-globin cluster including adjacent olfactory receptor genes.

RAD-Seq reference data

Previously generated RAD-Seq (Restriction Site Associated DNA Sequencing) data for the same samples were used to function as a “backdrop”, i.e., genomic background, for analyses for the α- and β-globin clusters and HIF-pathway enriched SNP data. The single-digest RAD-Seq data were generated via FLORAGENEX ( using methods described in Natarajan et al. (2015) and Graham et al. (2018). Ultimately, the SNP output comprising 47,731 SNPs for speckled teal and 49,670 SNPs for yellow-billed pintail was then incorporated into the final Hb and HIF-pathway SNP dataset and used for downstream population genetic analyses (see next section).

F ST for identifying outlier and introgressed loci

To identify outlier loci we calculated FST, which measures population genetic differentiation due to genetic structure and the corresponding reduction in heterozygosity caused by population subdivision and/or directional selection. Whereas the latter, directional selection, is expected to have locus-specific effects related to individual gene function, the former affects genome-wide patterns because population subdivision is a demographic process (Peters et al. 2012). We used Weir and Cockerham (1984)’s FST calculated locus-by-locus (SNP-by-SNP) in VCFtools (Danecek et al. 2011). FST was first calculated between low- and high-altitude populations within species to identify conventional outliers potentially evolving by directional selection in high-altitude populations. This was done for the complete α- and β- globin clusters new to this study and also the 26 HIF-pathway genes previously surveyed by Graham and McCracken (2019; e.g., see Figs. 2, 3). The RAD-Seq data were used as a genome-wide backdrop to assess significance of outliers in the case of the α- and β- globin clusters. This analysis had already been performed previously for the HIF-pathway genes using the log-posterior odds of directional selection (Graham and McCracken 2019), whereas the present analysis of the globin clusters has utilized a quantile approach to identifying extreme outliers.

Fig. 2: Coalescent model illustrating how a more recently established population of yellow-billed pintails at high altitude likely acquired beneficial β-globin allele(s) from an older, more established population of speckled teal at high altitude.
figure 2

This example of adaptive introgression presumably involved a founder event by yellow-billed pintails followed by hybridization and backcrossing to the parental population and a selective sweep on newly-acquired beneficial β-globin alleles, which appear identical by descent (IBD) across the entire β-globin cluster. A selective sweep likely occurred earlier the original speckled teal β-globin prior to yellow-billed pintail becoming established at high altitude.

Fig. 3: Function propoerties of the observed β-globin amino acid substitutions in the HbA isoform of speckled teal and yellow-billed pintail.
figure 3

(A) Three-dimensional homology model of HBA showing the position of AlaβA116Ser and (B) independent and joint effects of AlaβA116Ser and LeuβA116Met on intrinsic hemoglobin P50 (y-axis) in the absence of allosteric effectors (figures modified and reprinted from Natarajan et al. 2015 under the Creative Commons Attribution (CC BY) 4.0 License applied by PLoS journals). The largest increase in Hb-O2 affinity (i.e., lower P50) occurs in the presence of AlaβA116Ser and absence of LeuβA116Met, whereas a moderate increase in Hb-O2 affinity is caused by substitution of the two amino acids combined. The derived non-ancestral amino acids present in high-altitude populations of speckled teal and yellow-billed pintail are underlined in bold, whereas ancestral amino acids present in the low-altitude populations are not underlined in plain text. Numbers refer to position in the βA-chain (HBB) polypeptide subunit.

Next, we calculated FST unconventionally, comparing not within species, but between species to demarcate genomic regions appearing identical by descent. This was done by calculating FST between the high-altitude population of speckled teal and the high-altitude population of yellow-billed pintail, and between the low-altitude population of speckled teal and the low-altitude population of yellow-billed pintail. Corresponding Manhattan plots were produced for each pairwise interspecific comparison, including the complete α- and β-globin clusters and the 26 HIF-pathway genes. The rationale for this analysis is that pairwise FST comparisons between species as genetically divergent as speckled teal and yellow-billed pintail (i.e., 5.2% in mtDNA) should reveal a relatively uniform distribution of high FST values reflecting many fixed or nearly fixed differences. However, the expectation for recently introgressed loci is that these regions will show a lower FST closer to zero, reflecting identity by descent (IBD) and therefore qualitatively different from regions of the genome representing non-introgressed loci. If especially little time has elapsed for new fixed differences to accumulate since introgression and a strong selective sweep occurred, such loci should also exhibit low nucleotide diversity (π) around the selected region with high linkage disequilibrium (LD) and slower LD decay.

In such examples, IBD thus will be reflected by “valleys” of low FST as opposed to “peaks” sensu Feder et al. (2012). We predicted that these “valleys” should be revealed between comparisons of the introgressed high-altitude populations only, and isolated to only those targeted loci that have actually introgressed, whereas non-introgressed regions of the genome should exhibit uniformly high FST and patterns of π and LD predicted above. Finally, we also predicted there should be correspondence between regions identified by this method of analysis and regions identified as introgressed in the ABBA–BABA analysis (see below). These analyses were performed for separate partitions of the data, including the α- and β-globin clusters and 26 HIF-pathway genes and contrasted with the genome-wide backdrop provided by the RAD-Seq data (Graham and McCracken 2019). VCF files were created separately to be fed into these analyses.

Tests for introgression (ABBA–BABA)

In conjunction with the FST-based approaches, we used the ABBA–BABA method to obtain Patterson’s D-statistic to test if a pattern of shared variation between groups can be better explained by hybridization than incomplete lineage sorting (ILS; Durand et al. 2011). Specifically, this method entails calculating the ratio of ABBA- to BABA-like four-taxa tree topologies, where the true tree topology is reflected by AABB, and ABBA and BABA reflect alternative topologies produced by ILS or introgression (Supplementary Fig. 2). In cases where introgression prevails, the ratio of ABBA:BABA ≠ 1, whereas in cases of ILS the expectation is that ABBA:BABA = 1 because ILS is a stochastic process subject to the influence of genetic drift resulting in relatively evenly distributed alternative tree topologies. Finally, a positive D-statistic signifies an excess ABBA sites, whereas a negative D-statistic signifies an excess of BABA sites or vice versa. In these analyses, a significant excess of ABBA sites corresponding to a positive D-statistic was used to infer introgression between high-altitude specked teal and high-altitude yellow-billed pintail populations.

We performed the four-taxa D test in the software package Dsuite (Malinsky et al. 2021) using standard ABBA–BABA gene trees comprising four clades corresponding to each low- and high-altitude population (Supplementary Fig. 2). While the ABBA–BABA test is usually depicted in the context of a fully pectinate topology (((YP-H,YP-L),ST-H),ST-L), the correct topology for these taxa comprises two reciprocally monophyletic sister clades ((YP-H,YP-L),(ST-H,ST-L)) (Fig. 2). However, there is no reason to assume either topology to perform the introgression tests because the frequencies of ABBA vs. BABA sites are topology independent (i.e., calculated from the SNPs themselves, not reconstructed on internal branches), and a priori knowledge of the true topology is not required (Malinsky et al. 2021). Thus, here we use the ABBA–BABA test despite not having a pectinate topology since the differences in ABBA vs. BABA patterns are informative of introgression events under a topology with two reciprocally monophyletic sister clades. Using the Dtrios subprogram in Dsuite, we performed 20 jacknife replicates to calculate the D-statistic and record the number of ABBA and BABA allocated SNPs for each locus; n.b., averaging Dtrios output over the 20 jacknife replicates resulted in non-integer ABBA and BABA values. The z-scores were used to determine significant introgression using a conservative cutoff of z = 3.3 (P < 0.001) as recommend by Durand et al. (2011) and Ottenburghs et al. (2017), followed by adjusting α for multiple tests using the false discovery rate of Benjamini and Hochberg (1995). Because the D-statistic can have a large variance when applied to small genomic regions and can give inflated values in regions of low diversity, we also calculated global fD (Martin et al. 2015) using Dinvestigate for each locus, and also along a sliding window for the β-globin cluster. This was performed for (a) the full α- and β-globin clusters, (b) all 26 genes from the HIF pathway (Graham and McCracken 2019), and finally (c) the composite RAD-Seq comprising the ~2 million bp of RAD-clusters shared between speckled teal and yellow-billed pintail.

Inferring the direction of gene flow using the coalescent

To determine directionality of interspecific gene flow for the putatively introgressed β-globin locus, we performed Isolation-with-Migration coalescent analyses in the software IMa2 (Hey 2011). We used the 4-population model (Fig. 2) defined by two reciprocally monophyletic populations of speckled teal (ST-Low, ST-High) and two populations of yellow-billed pintail (YP-Low, YP-High). The following parameters were used for population size (ΘST-H, ΘST-L, ΘYP-H, ΘYP-L, ΘST-ANC, ΘYP-ANC, Θ ANC), bidirectional migration rates (MST-L>H, MST-H>L, MYP-L>H, MYP-H>L, MST-L>YP-L, MYP-L>ST-L, MST-H>YP-H, and MYP-H>ST-H, etc.), and time since divergence (t0, t1, and t2 respectively, where t0 is divergence between yellow-billed pintail populations, t1 is time since divergence between speckled teal populations, and t3 is divergence between species).

This analysis was executed on a 291 bp fully-phased capillary-sequenced segment of the βA-globin (HBB) gene near the 3′ end of the coding region incorporating 28 SNPs (21 parsimony informative) and the amino-acid substitutions Alaβ116Ser and Leuβ133Met previously characterized experimentally by Natarajan et al. (2015). The joint and independent effects of these amino-acid substitutions on intrinsic Hb-O2 affinity are shown in Fig. 3 as illustrated by Natarajan et al. (2015). Phasing of this HBB sequence was performed in the software PHASE (Stephens et al. 2001), in conjunction with the 4-gamete test, as implemented in IMgc (Woerner et al. 2007) to identify a minimally non-recombining block of sequence comprised of ≤5 possible recombination events and haplotypes from each of the four low- and high-altitude populations (McCracken et al. 2009a). To illustrate these haplotype relationships, we constructed an allelic network for the 3′ end of the βA-globin using the median-joining algorithm in NETWORK 4.6 (Bandelt et al. 1999). Due to the short length of the RAD-Seqs, and small number of SNPs per locus in this RAD-Seq dataset, coalescent analysis was not performed for the RAD-Seq data. Instead, we used the corresponding sequences located on different chromosomes from the coalescent analyses published by McCracken et al. (2009a), which included fully-phased allele sequences from 5-intronic loci with no detectable recombination (see McCracken et al. (2009a) for the corresponding GenBank accession numbers and capillary sequencing methods).

IMa2 was run first with wide uninformative priors. Analyses were then conducted with more narrow uniform priors that encompassed the posterior distributions from the preliminary runs (e.g., θ = 10–50, M = 100, and t = 2–5). The Markov chain Monte Carlo was run five times independently for as many as 10 million steps in the case of intronic DNA sequences, sampling the posterior distribution every 150 steps, with a burn-in of 500,000 steps. All runs included 20 chains with a geometric heating scheme. Thus, we were able to obtain estimates of bidirectional migration rates between all pairs of low- and high-altitude populations, the effective population size parameter (θ = 4Neμ) for each extant and ancestral population, and time since divergence using these different data sets.

Signature of selective sweep and LD on introgressed loci

To further examine evidence for a selective sweep on introgressed alleles in the high-altitude population, we calculated three additional population genetic parameters, including π, the allele-frequency spectrum for derived alleles (dAFS), and decay of LD. Following a selective sweep on introgressed adaptive variation in the recipient high-altitude population, we predicted that π would be lower, the dAFS should be more skewed toward low- and high-frequency polymorphisms, and LD should be higher whereas LD decay should be slower. For the β-globin cluster, π and LD decay were calculated, whereas π, dAFS, and LD decay were calculated for EPAS1 and EGLN1. The dAFS was also calculated for the RAD-Seq for a genome-wide comparison outside the selected regions. VCFtools (Danecek et al. 2011) was used to calculate π and the dAFS. PopLDdecay (Zhang et al. 2019) was used to plot LD decay.


α- and β-globin assembly statistics

Coverage was calculated for both the Bowtie2 (contiguous sequence assembly) and BWA (VCF pipeline) assemblies. For Bowtie2, the average coverage for the α-globin cluster was 1,022× with an average of 110,779 reads per individual assembly, containing an average 49.6% GC content. The average coverage for the β-globin cluster was 275× with an average of 120,586 reads per individual assembly, containing an average 49.11% GC content. For BWA, across the two species, the per base-pair coverage was 279× covering 92.5% of the β-globin cluster, and 525× covering 97.5% of the α-globin cluster. Assembly statistics for the 26-HIF pathway genes were published previously (Graham and McCracken 2019).

Introgression of the β globin

To test for introgression, D-statistics were calculated for (a) the α- and β-globin clusters, (b) the 26 genes from the HIF-pathway, two of which had shown prior evidence for convergent evolution in these same species (Graham and McCracken 2019), and (c) the combined RAD-Seq clusters comprising 81,957 SNPs. We found strong evidence supporting adaptive introgression in the β-globin genes (D = 0.78, fD = 0.68, 43.7 ABBA vs. 28.9 BABA sites; z = 4.586, P < 0.000002) after correcting α for multiple tests using the false discovery rate (Benjamini and Hochberg 1995), but not in any of the α-globin genes (D = −0.04, fD = −0.005, 3.4 ABBA vs. 3.3 BABA sites; z = 0.057, P < 0.477; Table 1). Of the 26 genes in the HIF Pathway, D-statistics were not significant at the z = 3.3 (uncorrected P < 0.001) level for any locus (Table 1). EGLN1 (D = 0.05, fD = 0.005, 12.7 ABBA vs. 12.1 BABA sites) and EPAS1 (D = 0.06, fD = 0.006, 19.1 ABBA vs. 16.8 BABA sites), which are on the same chromosome, had z-scores < 0.86, corresponding to uncorrected P values > 0.19, thus providing no evidence that high-altitude sequences in these two species are introgressed (Table 1). While some other HIF-pathway loci had elevated D-statistics (ARNTL, D = 0.37; HSP90AA1, D = 0.48), no loci had uncorrected z-scores > 2.58 (P > 0.01), and fD was low or very low for all HIF-pathway loci. Finally, there was no significant evidence of genome-wide introgression between the two species in the pooled RAD-Seq clusters (D = 0.002, fD = 0.001, 581.3 ABBA vs. 574.0 BABA sites; Table 1).

Table 1 Tests for introgression across the α- and β-globin gene clusters, 26 HIF-pathway genes, and genome-wide RAD-Seq analysis of 81,957 SNPs.

High F ST outlier loci

In the embryonic HBE (ε) gene of yellow-billed pintail and speckled teal there are four novel amino-acid changes that have evolved convergently in high-altitude populations of speckled teal and yellow-billed pintail. These convergent changes include Serε14Gly, Ileε15Leu, Serε17Gly, and Alaε126Thr (Table 2). Compared to the background FST between lowland and highland populations in these two species (FST = 0.06494 and 0.01660, respectively), FST for these ε-globin polymorphisms (FST = 0.733–1.000) falls far outside the 99% quantiles for each species (Fig. 4), ranking them among the most extreme outliers for these populations, a pattern qualitatively similar to the HBBA) subunit amino acid changes experimentally shown to increase Hb-O2 affinity (Fig. 3).

Table 2 Putatively-convergent amino-acid replacements in embryonic HBE (ε) globin gene of high-altitude speckled teal and yellow-billed pintail populations.
Fig. 4: Amino-acid polymorphisms in the adult βA and embryonic ɛ-globin genes (HBB and HBE, shown in red) are extreme outliers in genomic scans of allele-frequency differentiation between low- and high-altitude populations of speckled teal and yellow-billed pintail.
figure 4

The histograms and box plots depict genome-wide distributions of SNP-specific FST and 99% upper quantiles based on the RAD-Seq survey of genomic polymorphism (speckled teal = 47,731 SNPs, yellow-billed pintails = 49,670 SNPs). Arrows indicate extreme outliers resulting in the labeled amino-acid replacements.

In the embryonic HBE (ε) globin subunit, three of the four nonsynonymous changes resulted in a difference of polarity in their respective side-chains: two involve a change from a polar, hydrophilic amino acid (Ser) to an aliphatic amino acid (Gly), whereas the other is from an aliphatic, hydrophobic amino acid (Ala) to a polar, hydrophilic amino acid (Thr). For speckled teal, all four sites are fixed for different amino acids between low- and high-altitude populations (FST = 1.000), whereas allele frequencies are highly divergent but not fixed in yellow-billed pintail (FST = 0.733–0.778) (Table 2; Fig. 4). The first three of these at positions 14, 15, and 17 occur in the first helical turn of the first α subunit, and the last one at position 126 in the last helical turn. These amino-acid substitutions were located on the human embryonic protein structure (Protein Data Bank PDBe › 1a9w). No bird embryonic structure is currently available to study these on an avian background. Among the 26 genes in the HIF pathway, only EPAS1 and EGLN1 exhibit unusually high FST (FST = 0.715–0.910), qualitatively dissimilar to any other HIF-pathway locus or the genome-wide background FST, and the pattern for these two loci is similar, involving outliers in the same EPAS1 exon (exon 12), for both speckled teal and yellow-billed pintail (Fig. 5; see Figs. 2, 3 in Graham and McCracken 2019). The two genes are both located on the same chromosome of the duck genome (Chr3: 37,562,797–37,679,832 and Chr3: 42,539,456–42,573,466 respectively), about 5 Mb apart and with the undifferentiated gene VEGFA (Chr3: 33,313,820–33,336,832) also analyzed in this study ~4 Mb to the other side of EPAS1 relative to EGLN1.

Fig. 5: Manhattan Plots showing locus-by-locus FST calculated pairwise for all SNPs for the 26 HIF-pathway genes between low- vs. high-altitude populations of speckled teal (within species comparison) and low- vs. high-altitude populations of yellow-billed pintail (within species comparison).
figure 5

The HIF-pathway outliers EPAS1 and EGLN1 identified by Graham and McCracken (2019) are shown with arrows. Individual gene identities are labeled with corresponding colors below the plots.

Low F ST outlier loci

Our between-species FST scan for low-value valleys revealed that the β-globin cluster shows a pronounced low-FST valley between the high-altitude populations of each species corresponding to the same region of the sequence with elevated fD, but not between the low-altitude populations of each species (Fig. 6). This pattern spans all adult and embryonic β-globin genes in the cluster and is clearly demarcated by high FST regions of sequence in the two olfactory receptor genes (OR51G2 and OR51L1) flanking the 5′ and 3′ ends of the β-globin cluster. By comparison the α-globin cluster, which does not exhibit D-statistics characteristic of introgression, shows no such pattern (Fig. 6). In the α-globin cluster, high FST polymorphisms reflecting many fixed differences are uniformly distributed across the entire α cluster for the comparisons of high- vs. high-altitude populations and low- vs. low-altitude populations. The β-globin therefore appears identical by descent because it is introgressed between high-altitude populations of the two species, whereas the α-globin cluster does not.

Fig. 6: Manhattan Plots showing locus-by-locus FST calculated pairwise for all SNPs for the α- and β-globin clusters between high-vs.-high (between species comparisons) and low-vs.-low (between-species comparisons) populations of speckled teal vs. yellow-billed pintail (bottom and top, respectively).
figure 6

The high vs. high comparison of the β-globin cluster differs qualitatively from the low vs. low comparison, as it also does from both α-globin comparisons. Specifically, the high-vs.-high β-globin shows a “valley” of low FST values centered on the βA- and ε-globin genes. The fD-statistic calculated in Dsuite (Malinsky et al. 2021) and shown in red furthermore shows a broad peak spanning most of the β-globin cluster and corresponding to the region of introgressed sequence. The two graphs at the bottom for the α-globin cluster do not differ qualitatively from each other, as each display long tracts of high FST values across the linkage group. Finally, the olfactory receptor genes OR51G2 and OR51L1 flanking the 5′ and 3′ ends of the β-globin cluster exhibit high FST compared to the intervening low FST β-globin genes in the high-vs.-high comparison, but this pattern is not evident in the low-vs.-low comparison. The smoothed lines illustrating average fD and FST were calculated using sliding windows and default parameters in JMP v14. Positions of α- and β-globin genes on the scaffolds are shown at the top of each set of panels.

By contrast, we found no patterns of low FST similar to the β-globin cluster for either EPAS1 or EGLN1 or the other HIF-pathway loci. For both the high- vs. high-altitude and low- vs. low-altitude population comparisons, many fixed differences were apparent across all 26 loci, including EPAS1 and EGLN1 (Supplementary Fig. 3A, B). Therefore, there is no qualitative pattern consistent with IBD, as confirmed by the ABBA–BABA D-statistics. However, the 3′ half of EPAS1 and all of EGLN1 exhibit disproportionately fewer loci with intermediate FST values (0.25 < FST < 0.75) than other HIF-pathway loci (Supplementary Fig. 3C; see also Supplementary Fig. 4). This pattern is apparent in the high- vs. high-altitude pairwise FST comparison of these two loci but not the low- vs. low-altitude pairwise FST comparison.

Identity by descent (IBD)

The allelic network we constructed for the β-globin cluster represents 291 bp from the 3′ end of the HBBA) globin gene (Fig. 7). It includes part of intron 2 and all of exon 3 including the AlaβA116Ser and LeuβA116Met amino-acid substitutions determined by Natarajan et al. (2015) to increase intrinsic Hb-O2 affinity (Fig. 3). It clearly illustrates a pattern of IBD, reflecting a selective sweep on high-altitude populations sharing the same haplotypes at approximately equal frequencies. One βA116Ser-βA116Met haplotype constituting 99.3% of high-altitude speckled teal and 89.2% of high-altitude yellow-billed pintail predominates in the highlands. Six other βA116Ser-βA116Met haplotypes occur, but they exist at low frequencies (4 are singletons in our dataset) in the highlands and differ by a single nonsynonymous substitution, radiating from the central common haplotype in a star-like pattern (Fig. 7). Interestingly, five of these rare A116Ser-βA116Met haplotypes are present in speckled teal only, whereas only one is present in yellow-billed pintail. This ratio of singletons between species is consistent with speckled teal having older ancestry in the highlands than the yellow-billed pintail.

Fig. 7: Haplotype network for the β-globin cluster showing the 3′ coding region of capillary-sequenced HBBA), including 291 bp spanning part of intron 2 and all of exon 3 including the AlaβA116Ser and LeuβA116Met amino-acid substitutions previously characterized experimentally by Natarajan et al. (2015).
figure 7

Bold edged circles represent haplotypes with putatively adaptive amino-acid substitutions appearing identical by descent (IBD) between high-altitude populations of the two duck species. Six βA116Ser-βA116Met haplotypes were identified. One haplotype (at the center) is common, and the other five are singletons or occur at low frequency and radiate from the central common haplotype in a star-like pattern. Five of these rare βA116Ser-βA116Met haplotypes are present in speckled teal, whereas only one is present in yellow-billed pintail, consistent with the coalescent model data indicating that speckled teal has an older ancestry in the highlands.

Direction of the β-globin introgression

In the allelic network for the 291 bp from the 3′ end of the HBBA) globin gene, many yellow-billed pintails at high altitude share haplotypes with yellow-billed pintails at low altitude, whereas few speckled teal at high altitude share haplotypes with speckled teal at low altitude (Fig. 7). This pattern suggests that yellow-billed pintails exhibit higher gene flow between low and high altitude than speckled teal, a pattern which was corroborated previously and is consistent with the highly volant characteristics of this species (see McCracken et al. 2009c). We also know from depth of mtDNA divergence (~3 vs. 0%; see also Supplementary Fig. 1) that the high-altitude specked teal population (a separate subspecies, and phenotypically distinct from the low-altitude subspecies) probably was established at high altitude earlier and for much longer than the yellow-billed pintail population (not a separate subspecies, and not phenotypically distinct).

Using the same 291 bp capillary-sequenced alleles from the 3′ end of the HBB (βA) globin shown in Fig. 7, we conducted 4-population coalescent analyses in IMa2 (Hey 2011) for the 3′ HBB (βA) globin. For comparison to putatively “non-introgressed” loci reflecting the genomic background, we used fully-phased sequence data from five intronic loci sampled from different chromosomes (McCracken et al. 2009a). Our analysis reveals that while there was no evidence of interspecific gene flow in the intronic loci analyzed by McCracken et al. (2009c), the 3’-end of the HBB (βA) globin introgressed into the high-altitude yellow-billed pintail population from high-altitude speckled teal, but not the reverse (Supplementary Table 1). As expected, no nonzero estimates of gene flow were found in any other direction for all possible pairwise comparisons of migration rates between species. In sum, our finding strongly confirms that yellow-billed pintail acquired its present-day βA116Ser and LeuβA116Met-containing globin variants through adaptive introgression. Furthermore, it was the more genetically and morphologically divergent species that probably colonized high altitude earlier in history (speckled teal) that transferred putatively adaptive genetic variation to the species that probably colonized high altitude more recently (yellow-billed pintail).

Signature of selective sweep and LD on introgressed loci

Contrary to our prediction that nucleotide diversity (π) would be lower in high-altitude yellow-billed pintail than high-altitude speckled teal, we did not find this to be the case for the β-globin cluster for which π was similar between species and actually slightly higher in the high-altitude yellow-billed pintail population (Supplementary Fig. 5). However, nucleotide diversity was lower in high-altitude yellow-billed pintail for the two HIF-pathway EPAS1 and EGLN1. As predicted, the dAFS of EPAS1 and EGLN1 were more skewed in high-altitude yellow-billed than high-altitude speckled teal (Supplementary Fig. 6). Whereas yellow-billed pintail dAFS was qualitatively different from the RAD-Seq dAFS, the speckled teal dAFS was not, suggesting a more recent selective sweep in yellow-billed pintail. Finally, β-globin LD was higher and LD decay was slower for the high-altitude yellow-billed pintail population than for the high-altitude speckled teal population (Fig. 8). A similar LD pattern also occurred in EPAS1, but not in EGLN1 (Supplementary Fig. 7).

Fig. 8: Decay of linkage disequilibrium (LD) plotted for the high-altitude populations of speckled teal and yellow-billed pintail, respectively, across the β-globin cluster.
figure 8

Following a selective sweep in both species at different times in history, linkage disequilibrium has persisted to the present in yellow-billed pintail and and falls off slower compared to the older high-altitude speckled teal population.


In this study, we present a comprehensive analysis of the two Hb gene clusters and 26 HIF-pathway genes of two Andean duck species. This includes previously unexamined embryonic Hb genes (HBE, HBZ), and the EPAS1 and EGLN1 genes, the latter which have been a prominent focus of human genetic studies. Our results reveal previously undescribed adaptive introgression of the β-globin cluster between the two high-altitude populations of speckled teal and yellow-billed pintail.

Adaptive introgression has been identified across plants and animals as an important source of genetic variation allowing organisms to quickly adapt to environmental pressures (Goulet et al. 2017; Hedrick 2013; Racimo et al. 2015). One prominent example is in Tibetan populations, where adaptation to hypoxia at high-altitude in the HIF-pathway was acquired through interbreeding of ancient human populations with Denisovans (Huerta-Sánchez et al. 2014; Jeong et al. 2014). Similar patterns were seen in Tibetan mastiffs and cattle, which showed adaptive introgression of the HIF pathway (Li et al. 2014; Miao et al. 2017; Wu et al. 2018). There is also now evidence that adaptive introgression involving the β globin and increased Hb-O2 binding affinity in the Tibetan mastiff resulted from a legacy of selective interbreeding and hybridization with the Tibetan wolf (Canis lupus laniger) (Signore et al. 2019). The HIF-pathway genes (EPAS1, EGLN) and Hb are notable because of their key role in mediating the core physiological response to changes in O2 availability (Storz and Cheviron 2016), and are frequent targets for selection in high-altitude living (Hanaoka et al. 2012; Li et al. 2014; Qiu et al. 2012; Qu et al. 2013; Scheinfeldt and Tishkoff 2010; Simonson et al. 2012; Wang et al. 2014). Often, genetic variation in EPAS1 and EGLN1 are associated with changes in Hb concentration, among other physiological traits (Ai et al. 2014; Beall et al. 2010; Lorenzo et al. 2014; Peng et al. 2017; Xiang et al. 2013; Yi et al. 2010). Our study shows exciting new evidence of adaptive introgression in this key cluster of genes influencing Hb-O2 affinity and therefore likely arterial-O2 saturation (e.g., Tate et al. 2017), and it is only the second documented example of adaptive introgression involving high-altitude hemoglobin variants (see also Signore et al, 2019)

Our results also suggest additional potential mechanisms of adaptations to high altitude through embryonic Hb in speckled teal and yellow-billed pintail. Based on currently published structure of embryonic human Hb, the locations of amino acid changes found in this study are peripheral to the locations of metal/heme binding. However, this does not preclude the possibility of those amino-acid replacements having functional effects on Hb-O2 binding affinity, as even slight changes resulting in small structural differences such as the elimination of a single H-bond can cause profound changes in intrinsic Hb-O2 affinity (Natarajan et al. 2015). Embryonic Hb thus may be under even stronger selection for increased O2-binding at altitude because of egg-shell impermeability, which in numerous species has evolved to maximize O2 availability to the developing embryo at the expense of water loss (Carey et al. 1994; Carey et al. 1989). Further surveys of embryonic Hb are thus warranted but potentially more difficult to undertake than for adult-expressed Hbs, as nests, eggs, and young offspring of some high-altitude species can be difficult to locate and must be sampled at known developmental intervals to purify the embryonic Hb isoforms.

Qualitative differences between the β-globin cluster and HIF-pathway

Our analysis revealed strong qualitative differences between the β-globin cluster and HIF-pathway genes. The ABBA–BABA analysis strongly supported introgression of the β-globin cluster but not EPAS1 or EGLN1, which nonetheless are under strong directional selection in these taxa (Table 1). When we calculated pairwise FST between high-altitude populations of both species (Supplementary Figs. 3, 4), a pronounced FST “valley” reflecting IBD was apparent for the β-globin cluster in the high-altitude yellow-billed pintail population, but no such FST “valley” was observed for EPAS1 and EGLN1. In the case of the β-globin cluster, detection of this FST “valley” was revealed by sequencing well outside the β-globin cluster into adjacent flanking olfactory receptor genes that do not exhibit the same signature of introgression (i.e., not IBD). Our sliding window ABBA–BABA analysis further revealed that fD, which is more robust to regions of low genomic diversity, is inversely related to FST in this region (Fig. 6).

Nucleotide diversity (Supplementary Fig. 5) was lower and the dAFS (Supplementary Fig. 6) was more skewed in EPAS1 and EGLN1 of the high-altitude population of yellow-billed pintail compared to the high-altitude population of speckled teal, despite the two species having similar census population sizes and therefore possibly subject to similar magnitudes of genetic drift (yellow-billed pintail is probably more influenced by gene flow between the lowlands and highlands). Therefore, these findings may be consistent with a more recent selective sweep on high-altitude yellow-billed pintail (as opposed to high-altitude speckled teal), followed by the fixation of rare variants. Like the β-globin cluster, this conclusion is also reinforced by higher LD and slower LD decay in the EPAS1 of high-altitude yellow-billed pintail (Supplementary Fig. 7). Most evidence for the β-globin cluster and at least some evidence for EPAS1 and EGLN1 thus point towards a more recently diverged yellow-billed pintail population, and in the case of the β-globin cluster, the recipient of adaptive genetic variation from the older diverged speckled teal population (Fig. 2).

In sum, the strong qualitative differences we observed between the β-globin cluster and HIF-pathway genes suggest introgression of the β-globin cluster followed by a selective sweep on mutations increasingly Hb-O2 affinity (Fig. 3) may have occurred recently in the history of this pair of species. This is consistent with the remarkable IBD between β-globin alleles from the two high-altitude populations (Fig. 7), as well as pronounced LD present in yellow-billed pintails (Fig. 8). For the β-globin cluster, mutation and recombination have so far done little to erase the signature of a selective sweep on this locus in high-altitude yellow-billed pintails.

Selection in the face of countervailing gene flow

One important confounding factor in our analysis is gene flow between low- and high-altitude populations of each species. It is a general rule that local adaptation occurs in the face of countervailing gene flow when the selection coefficient s > m the migration rate (Slatkin 1987). This is because selection is locus specific, whereas population subdivision and migration affect polymorphisms genome wide (Peters et al. 2012; Rieseberg and Burke 2001; Wu 2001). In a previous study, it was found that gene flow into the high-altitude population of yellow-billed pintails from low-altitude yellow-billed pintails was an order of magnitude greater than gene flow into the high-altitude population of speckled teal from low-altitude speckled teal (McCracken et al. 2009a). This gene flow has predictable effects, principally contributing to augmented Ne. This might be one factor explaining why we found similar π between high-altitude yellow-billed pintails and high-altitude speckled teal for the β-globin cluster (Supplementary Fig. 5). Many yellow-billed pintails at high altitude share β-globin haplotypes with yellow-billed pintails at low altitude, whereas few speckled teal at high altitude share haplotypes with speckled teal at low altitude (Fig. 7). Numerous yellow-billed pintails were heterozygous for βA116Ser and βA116Met, and most were sampled at the periphery of the geographic range of the high-altitude yellow-billed pintail population (McCracken et al. 2009c). No individuals homozygous for the low-altitude ancestral alleles βA116Ala and βA116Leu were found at high-altitude sites in any of these extensive surveys (McCracken et al. 2009a; McCracken et al. 2009b; McCracken et al. 2009c), suggesting that fitness of β-globin heterozygotes may be relatively high at altitude due to codominant expression of Hb, and that such heterozygous individuals may not be adversely affected at low altitude either.

Propensity to hybridize

Interspecific hybridization is common in plants and animals, especially in waterfowl (Anatidae), which hybridize extensively (Johnsgard 1960; Randler 2008; Tubaro and Lijtmaer 2002). A principal factor contributing to hybridization in waterfowl occurs when one species is rare and the other is common in sympatry (Randler 2002)—otherwise known as the Hubbs Principle, or “Desperation Hypothesis” (Hubbs 1955), which predicts that with skewed mating opportunities the rarer species is more likely to mate with heterospecifics. Here we report adaptive introgression resulting from interspecific hybridization and backcrossing between two waterfowl species that presently coexist in broad sympatry and mixed flocks throughout the Andean highlands of South America.

This is not the first documented hybridization between these species. In previously documented examples of hybridization between these two species on remote oceanic islands, one species was abundant and the other was extremely rare (McCracken and Wilson 2011; McCracken et al. 2013). Our finding of adaptive introgression in the Andean highlands raises the question about what a hybridization event leading to adaptive introgression might have looked like. If in the past, speckled teal were previously already established and abundant in the Andean highlands, whereas yellow-billed pintails were occasional sojourners to high altitude, scarcity of conspecific mating opportunities could possibly have contributed to hybridization and backcrossing between these two species following the range shift into a novel environment. Subsequent to adaptive introgression, yellow-billed pintails, unlike speckled teal, would not have had to wait for adaptive de novo mutations to enable successful colonization of high-altitude environments in the wake of such a founder event, but rather acquired beneficial alleles from the standing variation of another species that experienced a selective sweep at the same locus much earlier in its history, leading to faster evolution. This study of two Andean waterfowl thus illustrates an example of how adaptive introgression may be a frequently overlooked mechanism that species use to enhance their evolvability and adapt to a changing environment while still maintaining species integrity.


Introgression through hybridization is an important mechanism for genetic adaptation in plants and animals. In vertebrates, adaptation to hypoxic high-altitude conditions involves coordination of multiple molecular and cellular mechanisms, including selection on the HIF pathway and the blood-O2 transport protein Hb. Genetic variation within these genes has frequently been found to be a target of selection, in multiple organisms, and recent evidence indicates adaptive introgression played a role in Tibetan people, which exhibit high heritability for both %O2 saturation and Hb concentration. In this study, striking DNA sequence similarity and strong LD reflecting IBD in two Andean duck species is present across the entire β-globin cluster including both embryonic (HBE) and adult (HBB) paralogs. However, this was not the case for EPAS1 and or EGLN1. In the case of the β-globin, these similarities likely resulted from historical interbreeding between high-altitude populations of two different distantly-related species, and possibly at more than one time in history under similar selective influences coinciding with past founder events. We furthermore examined the direction of introgression and discovered that in the β globin, the speckled teal, the species with a deeper mtDNA divergence that first colonized high altitude earlier in history transferred adaptive variation to the yellow-billed pintail, the species with a shallower divergence that likely colonized high altitude more recently (not the reverse). Thus, the yellow-billed pintail received these variants through hybridization and may not have waited for de novo mutations to adapt to the high-altitude environment, but rather acquired beneficial alleles from the standing variation, albeit via another species with a similar genetic background, leading to faster evolution.