Introduction

The complex interplay between environmental factors and local adaptation plays an essential role in the generation and maintenance of biodiversity1,2,3,4. Typically, genetic studies on adaptation have focused on DNA sequences (single nucleotide polymorphisms; SNPs), given that point mutations were thought to be the predominant source of selectable variation5,6; however, genomes can also vary in their physical structure across species, populations, and even individuals7,8,9,10,11,12. These physical variations, collectively known as structural variants, include insertions or deletions of single or large numbers of nucleotides, duplications of genes or entire regions of the genome, inversions or changes in polarity of chromosomes, and translocations both within and among chromosomes. Structural variants can be significant factors in shaping species divergence and local adaptation10,12,13,14,15,16. Early comparative genomics research determined that chromosomal inversions were linked to speciation in Drosophila8. More recent work is focused on the eco-evolutionary impact of structural variation within natural populations12,16. For instance, Arostegui et al.17 found that a chromosomal inversion is likely responsible for ecotype differentiation in rainbow trout. Likewise, Cayuela et al.13 found significant genotype-environment associations (GEA) among sympatric capelin lineages in the North Atlantic Ocean linked to chromosomal rearrangements and hypothesized that they were associated with adaptation to environmental conditions at spawning sites. These examples and others demonstrate how investigations of structural variants can provide important and novel insights into local adaptation.

Copy number variants (CNVs) span different classes of structural variants (e.g., insertions/deletion, duplications, transposable elements) that vary in the number of copies among individuals18. Previous studies have revealed that signal at CNVs can differ relative to SNPs, providing important insights into population differentiation and evolutionary history, including the genetic basis of adaptation (reviewed in Mérot et al.10). Importantly, recent studies have effectively detected and analyzed CNVs from reduced representation genome sequencing (e.g., restriction-site associated DNA sequencing; RADseq19), providing a cost-effective approach for investigating the role of structural variation in a range of non-model organisms, including American lobster20, capelin21, and Columbia spotted frog22. Moreover, these analytical approaches can now be retrospectively applied to the wealth of publicly available RADseq data to investigate the potential role of CNVs underlying patterns of ecological and evolutionary relevance23.

The American pika (Ochotona princeps) is a small lagomorph distributed across a large latitudinal gradient in western North America, from New Mexico (USA) to central British Columbia (Canada)24,25. Pikas are alpine specialists typically found at elevations > 2000 m, although their full distribution spans an elevational gradient from 0 to 4000 m25,26,27. These environmental gradients serve as an excellent system for investigating local adaptation, as variations in genomic architecture can be directly correlated to differences in environmental variables including temperature, precipitation, and solar radiation1,28. Furthermore, American pikas can also be separated into six geographically isolated evolutionary lineages29, allowing for investigations of potentially independent and parallel histories of adaptation within the same system. While previous studies have found genetic evidence for local adaptation in the American pika on regional scales30,31,32 and at the whole genome level33,34, range-wide adaptation has not been explored. Moreover, these previous studies focused solely on sequence variation; investigation of structural variation may provide a novel, yet complementary, perspective on local adaptation in this species.

To complement past and on-going studies of sequence variation, we investigated the role of CNVs underlying patterns of local adaptation in the American pika using RADseq data previously collected for 36 sites across the species range29. We used a combination of partial redundancy analysis and linear mixed-effect modelling to identify loci putatively associated with temperature, precipitation, and solar radiation within and across all six major American pika lineages. We subsequently examined the spatial distribution of putatively adaptive variation and population differentiation across range-wide latitudinal and elevational gradients. Finally, we annotated putatively adaptive variants and identified target genes with potential functional impacts on local adaptation.

Methods

Data and study area

We used previously generated RADseq data29 collected from 348 individuals sampled from 36 localities across the entire American pika distribution spanning all six major lineages: (NRM) Northern Rocky Mountains; (CRM) Central Rocky Mountains; (SRM) Southern Rocky Mountains; (CSC) Cascades; (SN) Sierra Nevada; and (CU) Central Utah (Fig. 1; Table S1); a subset (n = 173) of these samples were used in Galbreath et al.35. We removed site 18 from the NRM lineage as this site displayed significant admixture between the NRM and CRM lineages29.

Figure 1
figure 1

Sampling sites and lineage delineations (thick black lines) for the American pika (Ochotona princeps). Shaded regions indicate the approximate American pika distribution as modified from Galbreath et al.35.

Initial SNP calling and filtering

Raw sequencing data from all 11 RADseq libraries were de-multiplexed using the process_radtags module of Stacks v2.5736; during this stage, reads were trimmed to 94 bp to remove low-quality bases at the end of the read. De-multiplexed sequences were then aligned to the American pika reference genome assembly (OchPri4.0; GenBank accession ID: GCA_014633375.133) using BWA-mem v2.2.137 under default parameters. The aligned sequence data was then processed using the gstacks module of Stacks v2.5736 to generate a catalog of RADtags for downstream analyses.

Due to low range-wide heterozygosity38,39 that interfered with downstream copy number variant (CNV) detection (see Discussion for further explanation), we called SNPs independently within each lineage from the aligned sequence data using Stacks v2.5736 retaining polymorphic, autosomal SNPs present in ≥ 70% of all individuals within that lineage. This design also allowed us to assess patterns of parallel evolution by considering each lineage as a separate pseudoreplicate. We further filtered loci to only retain genotypes with a genotyping quality ≥ 20 using VCFtools v0.1.1540 and only retained loci present in ≥ 2 individuals within each sample site. Following Dorant et al.20, we performed a final filtering step to retain loci which genotyped in ≥ 70% of individuals within a sample site (allowing for 10% of populations to fail this threshold; maximum of 2 populations per lineage) and had a minimum minor allele sample (i.e., the minimum number of samples possessing the minor/rare allele) of 5% of individuals within a lineage (rounded up to the nearest individual) across all individuals within a lineage using the 05_filter_vcf_fast.py script downloaded from https://github.com/enormandeau/stacks_workflow (see Table S1 for a summary of filtering values). Also during this step, individual genotypes at a locus with a total read depth < 4 (i.e., minimum allele depth) were reclassified as missing data following Dorant et al.20.

Putative CNV identification

We used the HDplot method to detect duplicated SNPs representing putative CNVs from the above dataset following methods initially described by McKinney et al.41 and modified by Dorant et al.20. Putatively duplicated loci were identified using a combination of four parameters calculated for each locus: proportion of heterozygotes; inbreeding coefficient; median allele ratio for heterozygotes; and proportion of rare homozygotes. Each parameter was calculated using the 08_extract_snp_duplication_info.py script downloaded from https://github.com/enormandeau/stacks_workflow. We plotted the four parameters against each other and used graphical cut-offs to categorize loci based on their position in the distribution using a modified 09_classify_snps.R script downloaded from https://github.com/enormandeau/stacks_workflow. Putative CNVs were those loci categorized as duplicated, highly diverged, or had a high depth of coverage following Dorant et al.20 and Cayuela et al.21 (see Table S2 for summary of graphical cut-offs). We then extracted read depth for each putative CNV to use as a proxy for copy number20,41 using VCFtools v0.1.1540. To account for differences in sequencing effort across individuals, we normalized read counts using a trimmed mean of M-values method using the edgeR R-package42 following Dorant et al.20. Missing genotypes for an individual were imputed as the mean read depth for that locus across individuals within the same sample site.

Genotype-environment association analyses

We performed several genotype-environment association (GEA) analyses to identify CNVs with putative links to climate adaptation. For this, we downloaded climate data for 27 variables from ClimateNA43 (Table S3) and separated them into three climate variable categories: temperature (n = 16); precipitation (n = 6); and solar radiation (n = 5). We separated variables into categories to assess the relative impact of temperature, precipitation, and solar radiation independently and to minimize redundancy across categories due to multicollinearity. To further reduce multicollinearity, we calculated correlation coefficients between each pair of climate variables within each variable category; for each pair with |r|> 0.70, we removed the variable with largest mean absolute correlation using the findCorrelation function as part of the caret R-package44.

We first used partial redundancy analysis (pRDA) to detect CNVs with climate associations separately for each climate variable category using the vegan R-package45. Climate variables were assigned as predictors with normalized read depth as the response. Due to sequencing batch effects detected in the normalized read depth matrices (Figure S1), we included the sequencing library ID for each individual as a covariate. To validate the absence of multicollinearity, we calculated the variance inflation factor (VIF) for each predictor variable and removed those with VIF > 1046. We assessed model significance using global and marginal analyses of variance (ANOVAs) with 1000 permutations and retained all significant (p ≤ 0.05) axes for downstream analysis. In the event of no significant axes, only the first RDA axis was retained for outlier detection. Climate-associated CNVs (i.e., outlier loci) were then classified as those with a loading > 2.25 standard deviations from the mean loading along each retained RDA axis (p ≤ 0.01)47.

We also used linear mixed-effect models (LMM) to detect climate-CNV associations for each climate variable category using the lme4 R-package48. We used log-transformed normalized read depth as the response variable and the same climate variables as used for the pRDA as fixed effects; we included the sequencing library ID as a random intercept to account for batch effects. We assessed significance of each climate variable using the likelihood ratio test (LRT) as implemented by the drop1 function in the lmerTest R-package49. To control for false positives, we corrected p-values using the Benjamini–Hochberg false discovery rate method with a corrected significance threshold of α = 0.05. As a final step to reduce false positives, we retained only those loci detected as outliers in both the LMM and pRDA and classified these as “robust outliers”. Finally, we re-ran the pRDAs using the methods above with only the robust outliers to estimate the amount of variation explained by the climate variables for these loci.

Genetic differentiation and adaptive divergence

We estimated pairwise genetic differentiation using the variant fixation index VST50 via a custom R function, which is analogous to the estimator of population differentiation θ51, and is commonly used to identify differentiated CNVs between populations20,50. Using the robust outliers for each climate category, VST estimates were calculated on a per locus basis and averaged across each pair of sites to obtain mean pairwise estimates. We assessed significance using a bootstrap resampling procedure implemented in the boot R-package52 for mean pairwise VST estimates across 10,000 replicates.

We also examined patterns of adaptive divergence within lineages using a hierarchal clustering approach. Using the robust outliers for each climate category, we generated a matrix of pairwise genetic distance by calculating Bray–Curtis distances for each pair of individuals using the ecodist R-package53 then calculated the mean distance for each pair of sites. Using this distance matrix as an input, we performed a hierarchal clustering analysis employing the Ward’s minimum variance method54 using the hclust R-function. The resulting dendrograms were bootstrapped with 10,000 replicates to assess robustness using boot.phylo function in the ape R-package55.

To assess the impact of geography on adaptive divergence, we performed linear regressions on each robust locus with either elevation or latitude as the explanatory variable and normalized read depth as the response. Linear regressions were performed in R using the base packages56.

Annotation of putatively adaptive variants

To assess the putative functional implications of climate-associated CNVs, we annotated all robust outliers using the Ensembl Variant Effect Predictor v103.157 and identified CNVs found within protein-coding genes. We then performed a literature search to explore the function of genes that had linked CNVs from across multiple lineages, were linked to multiple CNVs, or were linked to CNVs with significant associations to multiple climate variable categories.

Results

SNP identification and CNV detection

Initial SNP calling resulted in 359,569–1,170,051 loci genotyped per lineage (mean = 784,092 loci) with 60,383–150,978 loci remaining post-filtering (mean = 94,428 loci; Table S1). From the filtered datasets, we identified between 2208 and 9585 putative CNVs per lineage with a mean sequencing depth of 11.0x ± 3.62 SD (ranging from 4.3x  to 118.7x) for downstream analysis (Table S1). Mean missing data were 4% ± 4.2 SD per locus (range: 0–30%).

GEA analyses and robust outliers

After removing colinear variables, we retained two to three variables within each climate variable category within each lineage (Table S4). For the pRDAs, all models were significant (p < 0.05) for all climate variable categories and all lineages except for the solar radiation model in the CU lineage (Table S4). Each RDA had at least one significant axis (p < 0.05) except for the CRM precipitation model (pRDA1 = 0.072), the SRM solar radiation model (pRDA1 = 0.120) and the CU solar radiation model (pRDA1 = 0.233); for these models, we still retained the first pRDA axis for outlier detection. The temperature model explained the most variation within each lineage (adjusted r2 = 0.018–0.064) except for SRM (adjusted r2 = 0.015), which had the precipitation model explaining the most variation (adjusted r2 = 0.017). We detected between 35 and 468 climate-associated CNVs for each of the pRDAs after removing duplicate loci (Table S4; Figs. S2–S7).

For the LMMs, we detected between 46 and 216 unique loci associated with at least one climate variable within each lineage (Tables S5–S10; Figs. S2–S7). We found the most significant associations with temperature for the NRM (79 unique loci), CSC (139 unique loci), and CU (116 unique loci) lineages, while precipitation had the most for the SRM (29 unique loci) and SN (112 unique loci) lineages. Solar radiation had the largest number of outlier loci for the CRM lineage (28 unique loci; Tables S5–S10; Fig. S2–S7).

We found from 37 to 193 total robust outliers detected by both methods within each lineage (Figs. S2–S7). Repeating the pRDAs using only the robust outliers for each climate variable category resulted in substantially greater variation explained by climate (between 19.8 and 59.1%; Table 1; Figs. 2, 3, 4). All models were highly significant across all climate variable categories and lineages (p < 0.001; Table 1; Figs. 2, 3, 4).

Table 1 Summary values for redundancy analysis (RDA) and ANOVA performed on robust outlier loci for six American pika (Ochotona princeps) lineages.
Figure 2
figure 2

Adaptive divergence of robust temperature outliers in the American pika (Ochotona princeps). Partial redundancy analysis (left) was run using the vegan R-package45; sample sites are indicated by colour and correspond to the sample site numbers on the dendrograms (middle) and heatmaps (right). Sample site numbers correspond with those found on Fig. 1. Dendrograms (middle) were created by hierarchal clustering of Bray–Curtis distances with bootstrap values shown on the nodes; for clarity, only values > 80 are shown. Heatmaps (right) display pairwise population differentiation (VST); shaded tiles indicate non-significant results from a bootstrapping resampling procedure with 10,000 replicates (p > 0.05). NRM Northern Rocky Mountains, CRM Central Rocky Mountains, SRM Southern Rocky Mountains, CSC Cascades, SN Sierra Nevada, CU Central Utah.

Figure 3
figure 3

Adaptive divergence of robust precipitation outliers in the American pika (Ochotona princeps). Partial redundancy analysis (left) was run using the vegan R-package45; sample sites are indicated by colour and correspond to the sample site numbers on the dendrograms (middle) and heatmaps (right). Sample site numbers correspond with those found on Fig. 1. Dendrograms (middle) were created by hierarchal clustering of Bray–Curtis distances with bootstrap values shown on the nodes; for clarity, only values > 80 are shown. Heatmaps (right) display pairwise population differentiation (VST); shaded tiles indicate non-significant results from a bootstrapping resampling procedure with 10,000 replicates (p > 0.05). NRM Northern Rocky Mountains, CRM Central Rocky Mountains, SRM Southern Rocky Mountains, CSC Cascades, SN Sierra Nevada, CU Central Utah.

Figure 4
figure 4

Adaptive divergence of robust solar radiation outliers in the American pika (Ochotona princeps). Partial redundancy analysis (left) was run using the vegan R-package45; sample sites are indicated by colour and correspond to the sample site numbers on the dendrograms (middle) and heatmaps (right). Sample site numbers correspond with those found on Fig. 1. Dendrograms (middle) were created by hierarchal clustering of Bray–Curtis distances with bootstrap values shown on the nodes; for clarity, only values > 80 are shown. Heatmaps (right) display pairwise population differentiation (VST); shaded tiles indicate non-significant results from a bootstrapping resampling procedure with 10,000 replicates (p > 0.05). NRM Northern Rocky Mountains, CRM Central Rocky Mountains, SRM Southern Rocky Mountains, CSC Cascades, SN Sierra Nevada, CU Central Utah.

Genetic differentiation and adaptive divergence

During preliminary analyses, we saw minimal to no evidence of population structure (Figure S1) and only weak population differentiation (all pairwise VST < 0.07) within each lineage using all putative CNVs. Using the robust outliers, we found consistent elevational and latitudinal patterns of population structure and population differentiation across analyses, though with slight differences between lineages and climate variable categories (Figs. 2, 3, 4, 5). Within the NRM lineage, we detected two main population clusters that largely followed latitudinal gradients (Figures S8S10), though there were also weak clinal trends between normalized read depth of the top correlated outlier loci and elevation (Figures S11S13). Population structure was less apparent within the CRM lineage, especially for the temperature (Fig. 2) and solar radiation (Fig. 4) outliers, though there were noticeable elevational patterns among precipitation outliers (Fig. S12). For the SRM lineage, we saw consistent patterns in population structure between temperature (Fig. 2) and precipitation (Fig. 3), with slightly different clustering for solar radiation (Fig. 4); however, we did see a distinct pattern between elevation (Figs. S11–S13) and latitude (Figures S8S10) and normalized read depth. Temperature outliers seemed to be strongly correlated with elevation and latitude in the CSC lineage, while precipitation outliers seemed to be primarily associated with elevation, and radiation outliers followed latitudinal gradients (similar to temperature; Fig. 5). Structure within the SN lineage followed both elevational and latitudinal gradients for temperature and solar radiation outliers (Figs. 2, 4, 5); there was still distinct clustering by sample site using the precipitation outliers (Fig. 3), though there was no consistent pattern between either latitude or elevation and read depth (Figs. S9, S12). Population structure and differentiation most clearly followed elevational and latitudinal gradients in the CU lineage for all variable categories (Figs. 2, 3, 4, 5).

Figure 5
figure 5

Boxplots showing the distribution of r2 values from linear regressions among robust climate-associated outlier loci in the American pika (Ochotona princeps). We performed independent linear regressions on each locus with either elevation or latitude as the explanatory variable and normalized read depth as the response. NRM Northern Rocky Mountains, CRM Central Rocky Mountains, SRM Southern Rocky Mountains, CSC Cascades, SN Sierra Nevada, CU Central Utah.

Annotation of adaptive variants

We found that 207 of 508 unique robust outliers were located within introns, exons, or untranslated regions of protein-coding genes, with hits to 158 unique genes (Table S11). Temperature outliers had the most gene hits (n = 85 unique), followed by precipitation (n = 65 unique) and solar radiation (n = 55). Of these genes, 31 had associations across climate variables, 25 had associations across multiple CNVs, and five had associations across multiple lineages (Table S11). Additionally, one gene (CHCHD3) that had two associated CNVs was identified independently in both the NRM and CU lineages (Table 2, Table S11). Following our literature review, we identified 12 genes that had putative implications for local adaptation including those with functions regarding: mitochondrial structure and function (i.e., CHCHD3, FBXL3, MCU); immune response (i.e., DOCK1, HPSE2, ONECUT1); transcription (i.e., MED12L); hemoglobin structure and function (i.e., HPSE2); response to hypoxia (i.e., LOC101529014, FBXL3); olfaction (i.e., LOC101517538, LOC101532007); and DNA repair (i.e., LOC101529014, HUS1, MACROD1; Table 2).

Table 2 Subset of gene annotations for robust climate outliers detected using GEA analyses on six American pika (Ochotona princeps) lineages.

Discussion

Structural variation and local adaptation

Various environmental factors can serve as drivers of local adaptation. For example, Muir et al.58 found that temperature significantly impacted larval period and growth rate in the common frog (Rana temporaria) distributed over an elevational gradient. Precipitation has been shown to be one of the most significant drivers of adaptation and natural selection on both continental and global scales59,60. High levels of solar (UV) radiation have led to rapid, convergent evolution of genes related to DNA repair in species residing on the Qinghai-Tibetan Plateau, the highest plateau on the planet28. Studying both the genetic and phenotypic impacts of environmental factors within and among species can improve understanding of biodiversity, speciation, and adaptation to heterogenous and changing landscapes.

There is a growing body of evidence that changes in genome structure may be an even more significant source of evolutionary potential than other, more well-studied markers such as SNPs10,12,61,62,63,64. Here, we found that structural variation in the form of copy number variation may be associated with local adaptation in the American pika. Specifically, we observed that putatively adaptive variation in this system largely followed elevational and latitudinal gradients. In the southwestern lineages (SN, CU), putatively adaptive variation was most strongly associated with elevational gradients, particularly with temperature (Figs. 2, 3, 4, 5). Populations of pikas in these regions are often limited to higher elevations, likely due to higher temperatures25,26; in fact, both recent and historical population declines have been documented in southwestern portions of the American pika range, primarily at lower elevations, and have been linked to warmer climates65,66,67,68. In the northern lineages (NRM, CSC), putatively adaptive variation closely followed latitudinal gradients, with elevational patterns also evident in the CSC lineage (Figs. 2, 3, 4, 5). Due to cooler temperatures at northern latitudes, pikas in these lineages can occupy a greater range of elevations and can be found as low as sea-level69,70. Additionally, the strength of latitudinal temperature gradients tends to increase moving northward from the tropics71 and could explain the difference in the effect of latitude between the northern and southern lineages. Furthermore, the northern lineages span a larger latitudinal gradient than the southern lineages, suggesting latitude has a greater potential to correlate with genetic variation in these regions. On the other hand, several studies have found that elevation plays a significant role in shaping putatively adaptive variation in northern populations of the American pika within the CSC lineage31,32,72, indicating that both spatial factors likely influence local adaptation in this system.

In contrast to the other southern lineages, we saw strong, latitudinal trends among outlier loci in the SRM lineage, though solar radiation in this lineage also strongly correlated with elevation (Fig. 5, Figs. S8–S10). Samples in this lineage were collected at high elevations (all > 3150 m) over a relatively small elevational gradient (~ 500 m difference between lowest and highest sites), which could explain the minimal impact of elevation in this lineage. The CRM lineage did not display any clear patterns and had significantly less structure when compared to the other lineages (Figs. 2, 3, 4, 5), possibly due to a narrow and intermediate latitudinal distribution that could limit differences in climate among sample sites. Alternatively, the genetic variation resulting from the relatively disjunct distribution between sample sites in this lineage, particularly in the case of site 28, may be masking signals of climate adaptation. This lineage is also the most recently diverged of the six phylogroups29 and may have had insufficient time post-divergence for selection to leave a significant signature of adaptive variation. Nevertheless, these patterns highlight the importance in sampling over appropriate environmental gradients to detect local adaptation73.

Potential relationships between CNVs, gene expression, and phenotype

Copy number variation can have significant phenotypic and adaptive consequences22,74,75,76. CNVs are the most abundant form of structural variation, accounting for up to 10% of the total length of the human genome. Moreover, they even occur more frequently than SNPs50,77,78,79 and can serve as a significant source of selectable material, particularly when genes are located within a CNV10,12,14,15. CNVs can directly alter gene expression through increases/decreases in copy number, cause the inactivation of genes via duplication, and lead to amino acid changes and/or reading frame shifts when genes are only partially covered by a variant. In fact, CNV-linked genes more often have functions related to environmental response compared to basic cellular processes74. Copy number variation can also lead to high elevation adaptation. A study involving five species of domestic livestock found that mtDNA copy number decreased among high elevation populations compared to low elevation populations, which was hypothesized to be the result of chronic hypoxia80. Copy number variation has also been linked to high elevation adaptation in a number of other species, including yak81, ground tit82, and humans83. Further investigations of CNVs and high elevation environments may lead to a more thorough understanding of adaptations to hypoxia, cold stress, and UV radiation.

We found numerous CNVs within genes with putative links to climate and high elevation adaptation related to mitochondrial function, response to hypoxia, and DNA repair. CHCHD3 had the greatest support for being under selection of all the annotated genes, as it was detected across all variable categories, several lineages (NRM, CSC, and CU), and multiple CNVs (n = 6; Table 2). This gene is critical in the formation of mitochondrial cristae. In fact, in vitro knockdown of CHCHD3 results in significantly reduced oxygen consumption and glycolytic rates84,85, indicating that this gene could have potential consequences for cold tolerance and adaptation to hypoxia. FBXL7, another gene identified in our study as a potential target of selection, is also an important regulator of mitochondrial function. Expression levels of FBXL7 were down-regulated under hypoxic conditions in the marine medaka86, and constitute an important predictor of the severity of asthma symptoms in humans87. We also found one CNV annotated to the mitochondrial calcium uniport (MCU), which is an integral component of the mitochondrial inner membrane88. These results are consistent with previous work that identified functional enrichment and positively selected genes associated with mitochondrial structure and function in the American pika reference genome33,34, providing further evidence for adaptation to high elevation environments in this species.

In addition, we found several genes associated with hypoxia response to be potentially under selection. The gene HPSE2 encodes for the enzyme heparanase-2, which plays a role in extracellular matrix remodelling as well as embryo implantation. HPSE2 has also been linked to hemoglobin-related traits, including fetal hemoglobin in North African human populations, which could have potential implications for adaptation to hypoxia89,90. We found that a predicted TRRAP ortholog (LOC101529014) also may be under selection. This gene is part of the INO80 family of chromatin remodelers, which appear to have putative links to the response to hypoxia by interacting with hypoxia inducible factor-191,92.

We further found that several genes related to DNA repair may be under selection. For example, TRRAP (discussed above) is also involved in DNA repair by binding with the MRN-complex to detect and repair double-strand breaks (DSBs)93,94. Knockout and knockdown of TRRAP results in the reduced efficiency and precision of end-joining following DSBs in mice and HeLa cells, suggesting this gene plays an important role in DSB signalling and repair94. The gene HUS1 is part of the Rad9-Hus1-Rad1 complex, an important component of the DNA repair pathway. This complex loads onto damaged chromatin (for example, from UV exposure), promoting DSB repair95. Again, these results are consistent with previous studies showing putative adaptation to increased UV radiation at high elevations in American pika33,34.

Other putatively adaptive genes

We found several other genes with putative links to local adaptation in the American pika. For example, the gene MED12L—a transcriptional coactivator of RNA poly II-dependent genes—was significantly associated with temperature in the NRM, CSC, and SN lineages (Table 2), and has been linked to elevational gradients in North American deer mice96 as well as mean annual temperature in Mediterranean cattle breeds97. We also found two genes, LOC101532007 (olfactory receptor 147-like) and LOC101517538 (vomeronasal type-2 receptor 116-like), which encode for olfactory receptors that could have an impact on foraging. American pikas do not hibernate; rather, they remain active throughout the winter and cache food into “hay piles” to ensure adequate food supplies98. These hay piles often consist of the highest quality vegetation available, likely detected and assessed via olfaction99; additionally, many of these cached foods contain high levels of secondary compounds that can help preserve biomass and nutrient availability into the winter months100. Differences in vegetation quality could also be linked to variation in precipitation, though the CNVs detected here were significantly associated with temperature. Evidence for putative adaptations related to olfaction has been previously shown in the American pika genome34.

We found further evidence for adaptations related to immune response. DOCK1 is a gene required for phagocytosis of apoptotic cells and has been linked to immune response and climate adaptation in Middle Eastern sheep101 and disease resistance in dolphins102. HPSE2 (discussed above) is also involved in the immune response, and expression levels of this gene were significantly associated with white blood cell count in pigs103. Similarly, ONECUT1 is involved in B cell differentiation and has been linked to local adaptation in Ethiopian cattle104 and three-spined stickleback105. American pikas experience relatively high levels of parasitism and may be experiencing a spillover of parasites from other small mammals, particularly rodents, which could result in immune response adaptations106,107,108. Additionally, some populations of American pikas may be physiologically stressed due to harsh environmental conditions109,110,111,112,113; enhanced immune response may confer a greater ability for this species to survive and thrive.

Limitations and future directions

Although we identified putatively adaptative variation in CNVs within the American pika, there were limitations to the approach employed in this study. We used climate data downloaded from ClimateNA which comes with several considerations43. First, this dataset uses climate data collected from 4891 weather stations distributed throughout North America to interpolate climate variables for any given set of coordinates, adjusting for elevation43. While this method allows for the estimation of climate data anywhere on the continent without requiring direct sampling, it also introduces potential error to climate variables. Wang et al.43 found that temperature estimates using this method generally correlated with direct measurements, whereas precipitation varied considerably more; additionally, elevation affected the accuracy of climate variables, with mountainous regions having lower accuracy in point estimates compared to flat regions. Second, this dataset measures total solar radiation of which harmful UV radiation only accounts for a small percentage (~ 5%)114,115. The relative contribution of UV radiation to total solar radiation also varies with elevation. However, both total solar radiation and UV radiation increase with elevation; as such, total solar radiation could operate as a crude proxy for UV radiation in lieu of a better estimate. These limitations could explain why both precipitation and solar radiation were relatively less correlated with copy number variation than temperature in this study. Lastly, these data only include ambient temperatures and do not account for the microclimates found underneath the talus that American pikas use for behavioural thermoregulation to prevent over-heating116. Recent findings suggest that subsurface microclimates important for pika thermoregulation may have changed at a faster rate than ambient temperatures over the past few decades in the Southern Rocky Mountains117. Consequently, the sole reliance on ambient temperature estimates may mask potential associations between CNVs and microclimate variation, which represents a potentially interesting avenue for future inquiry.

Our initial study design included plans to detect CNVs using a range-wide dataset. To take advantage of existing sequencing data, we used the HDplot method to detect CNVs13,20,41. This method visually detects putative paralogous loci by plotting several heterozygosity and genetic diversity metrics, and qualitatively identifies those that deviate from a central distribution. However, American pikas generally have low heterozygosity as a species at a range-wide level38,39, meaning standard SNP filtering procedures remove many variants. We found that the variants that remained showed very low levels of polymorphism with minimal deviations from the central distribution, greatly inhibiting our ability to confidently and accurately call putative CNVs using the HDplot method. To improve upon the ability to detect CNVs on a range-wide rather than within lineage scale, future work could employ whole genome resequencing that would provide a greater breadth of coverage and offer the added capability of detecting additional classes of structural variants, such as inversions and translocations118 that may be more directly associated with climate adaptation in the American pika.

Conclusions

Here, we present a novel analysis of local adaptation in the American pika based on copy number variation. We found that CNVs were significantly associated with temperature, precipitation, and solar radiation within each lineage, and trends in putatively adaptive variation largely followed elevational and latitudinal gradients. Additionally, we identified several genes related to putative high elevation and climate adaptation that could serve as important targets in future studies, including those explicitly involving gene expression. Overall, our work adds to a growing body of literature revealing the novel insights that may be obtained by explicitly examining structural variation in the genome for investigating species-level responses to changing environments20,119. With climate change significantly altering habitats worldwide, a fuller understanding of how organisms may respond, including sentinel species such as the American pika65,112, will be critical for maintaining global biodiversity moving forward.