Abstract
Ecogeographic rules denote spatial patterns in phenotype and environment that may reflect local adaptation as well as a species’ capacity to adapt to change. To identify genes underlying Bergmann’s Rule, which posits that spatial correlations of body mass and temperature reflect natural selection and local adaptation in endotherms, we compare 79 genomes from nine song sparrow (Melospiza melodia) subspecies that vary ~300% in body mass (17–50 g). Comparing large- and smaller-bodied subspecies revealed 9 candidate genes in three genomic regions associated with body mass. Further comparisons to the five smallest subspecies endemic to California revealed eight SNPs within four of the candidate genes (GARNL3, RALGPS1, ANGPTL2, and COL15A1) associated with body mass and varying as predicted by Bergmann’s Rule. Our results support the hypothesis that co-variation in environment, body mass and genotype reflects the influence of natural selection on local adaptation and a capacity for contemporary evolution in this diverse species.
Introduction
Explicating correlations among phenotypes and environmental variables is a historic focus of theoretical and empirical research on natural selection, local adaptation, and speciation1,2,3,4. More recently, questions about local adaptation and the adaptive capacity of species to persist given environmental change have reinvigorated efforts to understand the micro-evolutionary processes involved. For example, correlations between ambient temperature, phenotype, and life-history traits affecting fitness and subject to natural selection are widely reported in vertebrates2,5,6,7,8 and summarized as ecogeographical rules9. Of particular relevance to climate adaptation, Bergmann’s rule posits an inverse correlation between ambient temperature and body size in endotherms10 and has garnered substantial empirical support (birds:11,12,13; mammals:14,15). If patterns that follow Bergmann’s rule reflect local adaptation in traits affecting fitness, such as thermal tolerance, we expect genes underlying such trait variation to show signs of historical or ongoing selection and to vary predictably with phenotype.
Species with large ranges that encompass steep environmental gradients are excellent candidates in which to test for a genomic basis of local adaptation16. We focused on song sparrows (Melospiza melodia), which are among the world’s most variable species17,18 as indexed by the number of recognized subspecies (25), and which accordingly exhibit a stunning range of co-variation in phenotype, life history, and environment (e.g.,11,19,20). Song sparrows inhabit highly heterogeneous environments, spanning about 36 degrees of latitude (26 °N to 62 °N) and 15 °C in mean annual temperatures (ca. 1 °C to 16 °C). Because many phenotypic traits of song sparrows are known to have an additive genetic basis, respond to selection, and affect individual fitness (e.g.,21,22,23), we hypothesized that the phenotypic clines in body mass exemplified in song sparrows at least partly reflect local adaptation to environment. If so, we further suggest that identifying the genomic mechanisms underlying such patterns can inform us about the adaptive capacity of other species displaying local adaptation to heterogeneous environments.
Here, we test this idea by comparing 40 whole genomes of two large- (M. m. maxima, M. m. sanaka; mean mass [range]: 45.9 [41.9–50.0 g]) and two smaller-bodied subspecies (M. m. merrilli, M. m. rufina; mean mass [range]: 26.7 [21.9–30.9 g]) that breed in northern British Columbia and Alaska in order to evaluate the genomic landscape of divergence and identify candidate genes associated with body mass. Maxima and sanaka reside year-round from the Alaskan Peninsula to the Aleutian Islands, experience cold winters and are nearly twice the mass of rufina and merrilli, which experience warmer winters24 (Fig. 1). With candidate genes for body mass identified, we then use those candidates to predict genotype in 39 birds that represent the five smallest subspecies of song sparrows, all endemic to California and residing in or adjacent to San Francisco Bay (mean mass [range]: 19.2 [16.9–23.3 g]). By doing so, we demonstrate the wider utility of identifying candidate genes that are targets of selection and which contribute to local adaptation in phenotypically variable species that occupy a diverse range of environmental conditions and geographic scales.
Results
Phenotypic correlations and genomic divergence
Body mass (g) and mean winter and summer temperatures (°C) were strongly negatively related in the subspecies we studied (Fig. 1), consistent with Bergmann’s rule. At a genome-wide scale, we observed marked differentiation and unambiguous clustering between all four northern subspecies, clearly distinguished as large- (maxima and sanaka) and smaller-bodied (merrilli and rufina) subspecies (11,223,039 SNPs; PC1: 19.93% and PC2: 6.76% of genomic variation; Supplementary Fig. 1). This pattern holds when the southern small-bodied subspecies from San Francisco Bay are included, with a PCA delineating body size on PC axis 1, which accounted for 12.48% of genomic variation (PC2: 3.48%; Fig. 1b). Population structure at K = 2 had the lowest cross-validation error, corresponding to grouping of the large- versus smaller-bodied subspecies; but structuring was apparent at K = 4 (ADMIXTURE; Supplementary Fig. 2). Genome-wide estimates of FST ( ± SD) indicated substantial divergence, with pairwise genome-wide values from 0.035 ± 0.026 in the control comparisons (merrilli-rufina) to 0.247 ± 0.112 (maxima-merrilli; Supplementary Table 1). Similarly, the genome-wide absolute genetic divergence (Dxy) was lowest between the control comparisons (mean ± SD; maxima-sanaka: 0.292 ± 0.087) and highest between maxima-merrilli (0.425 ± 0.046). Overall, the strongest signals of genetic differentiation were best explained by body size.
Identification of candidate genes
We identified a total of 25 elevated windows of FST that were shared by at least two pairwise comparisons of large- and smaller-bodied subspecies (i.e., windows exhibiting FST estimates in the 99.9th percentile of the genome-wide mean; Fig. 2). Several of these shared elevated 50 kb windows contained annotated genes (13 genes: sanaka and merrilli; 10: sanaka and rufina; 18: maxima and rufina; 19: maxima and merrilli). Overall, the mean FST values in the 50 kb outlier windows were elevated for each of the four pairwise comparisons (Mean FST: sanaka-merrilli = 0.671; sanaka-rufina = 0.625; maxima-rufina = 0.645; maxima-merrilli = 0.828), relative to low genome-wide FST averages (sanaka-merrilli = 0.159; sanaka-rufina = 0.176; maxima-rufina = 0.205; maxima-merrilli = 0.247; Fig. 2).
To specifically identify putative genes under selection for body size, we focused on genes that were shared among all four large- vs smaller-bodied comparisons (i.e., “candidate genes”). Six genes were shared by two subspecies comparisons (AIDA, TPPP, USP16, LTN1, C3ORF52, and unknown gene ENSTGUP00000002762), and two were shared by three comparisons (TRIP13 and BRD9; Supplementary Table 2). Notably, across the shared 50 kb windows, 9 candidate genes were identified in all large- and smaller-bodied pairwise comparisons (FBXW2, GARNL3, RALGPS1, ZBTB34, ANGPTL2, ZBTB43, COL15A1, TGFBR1, TAF1A), with six candidate genes occurring on the same scaffold (contig 391/chr 17; Supplementary Table 2). None of these candidate genes were shared in control comparisons between the two larger (maxima and sanaka) and smaller (merrilli and rufina) subspecies pairs, as expected if these differentiated regions arose via divergent selection on body size (Fig. 3; Supplementary Table 2).
Signatures of selection
To search for evidence of local adaptation in body size, we calculated the composite likelihood ratio (CLR) test statistic to test for the presence of selective sweeps on the 3 contigs containing the 9 candidate genes (Fig. 3). Selective sweeps were evident on all contigs (>99th percentile of contig-wide mean; Fig. 3). We also measured nucleotide diversity (π) and Tajima’s D across 50 kb windows in all four northern subspecies to characterize genetic diversity and detect evidence of selection. Commensurate with our hypothesis of natural selection, nucleotide diversity was reduced in the regions of the 9 candidate genes shared among all four comparisons (maxima = 0.0001, sanaka = 0.0002, merrilli = 0.0002, rufina = 0.0005), whereas the genome-wide average was an order of magnitude higher (maxima = 0.0016, sanaka = 0.0022, merrilli = 0.0032, rufina = 0.0026) (Supplementary Figs. 3–5). Tajima’s D was also reduced in candidate regions, especially in maxima (−0.332), sanaka (0.229), and merrilli (−0.789), as compared to genome-wide means (0.851, 0.892, and 0.889, respectively), but not in rufina (DTaj candidate = 1.038; DTaj mean = 1.219). Dxy estimated in candidate regions was also elevated in size-related pairwise comparisons (maxima-merrilli = 0.899, maxima-rufina = 0.710, sanaka-merrilli = 0.844, sanaka-rufina = 0.771) compared to controls of within-size comparisons (maxima-sanaka = 0.032, merrilli-rufina = 0.233; Supplementary Fig. 6). Taken together, these results provide evidence for the presence of selective sweeps and imply that the strength and direction of selection may vary among subspecies.
Validation of candidate genes
Given the identification of 9 genes associated with body size and shared by the 4 northern subspecies, we next tested if these candidate genes could be used to predict allele frequency commensurate with the smaller-bodied phenotype in the 5 smallest subspecies of song sparrows endemic to California (mean mass [range]: 19.2 [16.9–23.3 g]), for which we had prior genomic data25. Assessing the frequency of the non-reference allele across the 467 SNPs located within the 9 candidate genes noted above revealed 8 SNPs that were highly correlated with mass (r > 0.90; p < 0.001; Fig. 4). Four of these SNPs were located within RALGPS1 (one of which is also in ANGPTL2), 1 SNP in GARNL3, and 3 SNPs in COL15A1 (Fig. 4). One SNP on RALGPS1 (position 57980) falls within the coding region of the gene, while the remaining SNPs are located within non-coding regions of genes. Notably, non-reference allele frequencies of the 8 candidate SNPs were also highly negatively correlated with average winter and summer temperature (range r = −0.79 to −0.85; p < 0.05; Fig. 4).
We observed consistent positive regressions of mass on non-reference allele frequency at each of the 8 focal SNPs (range: F(1,7) = 31.96–142.66; range R2: 0.820–0.953; p < 0.001; Supplementary Table 3). In contrast, regressing temperature on allele frequency revealed negative relationships at each of the focal SNPs (range: F(1,7) = 11.38–18.15; range R2: 0.619–0.722; p < 0.05; Supplementary Table 4). We used a partial Mantel test to further explore whether these relationships reflect a history of selection versus neutral processes using the 3 contigs containing the 9 candidate genes. Pairwise genetic distance and phenotype were significantly correlated across all 9 subspecies after controlling for geographic distance (partial Mantel test: r = 0.183; p = 0.004).
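The reported F statistics and R2 values can be cross-checked against one another. The sketch below assumes simple linear regressions fitted to n = 9 subspecies-level points, which is what the F(1,7) degrees of freedom suggest; the arithmetic uses only the rounded values quoted above, so small discrepancies from the reported F values are expected.

```latex
% For a simple linear regression with n points, F and R^2 are related by
F_{1,\,n-2} \;=\; \frac{R^{2}}{1-R^{2}}\,(n-2).

% Mass on allele frequency (n = 9):
\frac{0.820}{0.180}\times 7 \approx 31.9, \qquad
\frac{0.953}{0.047}\times 7 \approx 141.9
\quad (\text{reported: } 31.96\text{--}142.66).

% Temperature on allele frequency (n = 9):
\frac{0.619}{0.381}\times 7 \approx 11.4, \qquad
\frac{0.722}{0.278}\times 7 \approx 18.2
\quad (\text{reported: } 11.38\text{--}18.15).
```

The agreement to within rounding indicates that the two sets of summary statistics describe the same fitted models.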
Per-site FST estimates for the 8 focal SNPs ranged from 0.63 to 0.96 among all large- and smaller-bodied subspecies (mean: 0.713–0.763), while control comparisons ranged from 0 to 0.272 (mean: 0.003–0.081; Supplementary Table 5). Across the 9 candidate genes, the percentage of fixed or nearly fixed SNPs between the large- and smaller-bodied subspecies ranged from 3.0 to 33.8%, whereas 0% of control comparisons contained fixed SNPs (Supplementary Table 5).
A phylogenetic tree reconstructed from individual genotypes of the 8 focal SNPs and 1–3 additional SNPs immediately up and downstream (54 total SNPs; Supplementary Fig. 7; see Methods) was consistent with the strong correlation between body size and non-reference allele frequency across subspecies. Across the nine subspecies, three clusters formed largely reflecting body size and geography, comprising two subspecies of intermediate mass (rufina, merrilli), two large Aleutian endemics (maxima, sanaka), and five small California endemics (gouldii, heermanni, maxillaris, pusillula, samuelis). Whilst these findings suggest compelling support for our hypothesis that historical and/or ongoing natural selection has contributed to genetic differentiation of body size between subspecies, lower coverage for the San Francisco Bay populations may limit our ability to fully resolve heterozygotes and should be validated using higher coverage, phased sequence data.
Discussion
We characterized 9 candidate genes associated with body mass in 4 subspecies of large- and smaller-bodied song sparrows that breed at the northern extent of the species’ range, and then used those candidates to predict genotype in 39 individuals from 5 small-bodied subspecies endemic to California that are ~50–70% smaller than subspecies that reside year-round in Alaska. Bergmann10 explained the tendency for endotherms to be larger in colder environments as an adaptation to minimize heat loss, though mechanisms remain uncertain13,26. Our observations of fixed and shared genomic differences, their co-variation with climate and body mass, and evident signatures of selection support Bergmann’s Rule as influential in song sparrows and reflect a capacity for and history of local adaptation to environment in this species. Moreover, because many other traits exhibit substantial additive genetic variation23,27, respond to selection21,28, co-vary with environment11,25, and affect individual fitness in this species23, we interpret these cumulative results as suggesting that song sparrows exhibit substantial capacity for contemporary evolution via context-dependent selection on individual phenotype and life history20,29.
We observed multiple regions of elevated divergence in the genomes of large- vs smaller-bodied song sparrows against a relatively homogeneous background (Fig. 2; Supplementary Fig. 6), extending prior demonstrations of local adaptation at micro- to macrogeographic scales20,25,30. Consistent clustering of genetic groups in the large- and smaller-bodied populations was evident in comparisons based on mass and subspecies identity (Fig. 1; Supplementary Fig. 1) but absent in shared regions of divergence in control comparisons among subspecies of similar size (Fig. 2; Supplementary Fig. 6). These contrasts also support our hypothesis that local adaptation in song sparrows is underlain, at least in part, by genes that play a causal role in the patterns Bergmann sought to explain.
Having identified candidate genes capable of differentiating large- and smaller-bodied subspecies in Alaska, we then validated these patterns by predicting the genotypes of 39 birds from the five smallest subspecies endemic to California. Of the 9 candidate genes in regions shared by all large- and smaller-bodied pairwise comparisons, four contained SNPs (r > 0.90) that predicted allele frequency of small-bodied subspecies in California (RALGPS1, COL15A1, GARNL3, and ANGPTL2; Fig. 4), and included genes linked to body mass index (BMI), height, and fat distribution in humans (Supplementary Table 2; see Methods). RALGPS1 (mean FST: 0.797, range: 0.071–0.974) and COL15A1 (mean FST: 0.514, range: 0.010–0.945) similarly associate with percent and distribution of body fat, size and BMI in humans31,32. ANGPTL2 contained several highly differentiated SNPs correlated with body mass and temperature (mean FST: 0.917, range: 0.845–0.959). ANGPTL2 codes for angiopoietin-like protein 2, which has important roles in lipid, glucose and energy metabolism across taxa, and has also been found to be associated with human height and obesity33,34. In contrast, GARNL3 (mean FST: 0.791, range: 0.055–0.973) was informative in differentiating subspecies by body size and predicting the genotype of smaller-bodied song sparrows, but it has not yet been linked to phenotype in other species. Notably, there were high proportions of fixed or nearly fixed SNPs (FST > 0.95) within RALGPS1, GARNL3, and ANGPTL2 (18/151; 7/59; 6/14 SNPs, respectively). These results further support our supposition that some or all of the genes we identified are directly or indirectly associated with thermoregulatory capacity and influential in their contributions to the roughly 300% increase in the body mass of song sparrows observed from California to Alaska.
Although five of the candidate genes did not include SNPs highly correlated with body mass in small-bodied subspecies endemic to California, four of these genes have previously been associated with BMI, body fat mass, height, or Marfan syndrome in humans (e.g., FBXW2, ZBTB43, TGFBR1, TAF1A;35,36,37,38,39,40; Supplementary Table 2), whereas ZBTB34 appears to reflect a novel association to mass. Because complex phenotypes often arise via many genes of small effect41, it is likely that such smaller effects would be challenging to discern given our statistical approach, stringent identification of candidates, and modest sample sizes.
Despite strong associations between body mass and the candidate genes noted above, genetic drift, variation in recombination rate and genetic architecture, and selection on correlated traits are also expected to influence population-level variation in the phenotype of song sparrows across their North American range42. Thus, whilst genome scans clearly offer valuable insights into the origins of adaptive evolution43,44,45, more detailed genomic and field studies are needed to validate gene expression, the effects of natural selection on the adaptive capacity of populations, phenotype, and fitness, and to understand the consequences of immigration on genetic architecture. Methods such as QTL mapping of hybrid individuals in populations with precise pedigrees will be needed to elucidate the additional effects of chromosomal rearrangements, insertions, deletions, or duplications, on phenotype, plasticity, and the process of speciation23,29. Similarly, a more precise understanding of the influence of plasticity versus direct or indirect selection on traits underlying Bergmann’s or other ecogeographic rules is needed to estimate the capacity of populations to maintain fitness given environmental change13,26,46,47.
Song sparrows exhibit a stunning range of variation in external phenotype and life history that correlates closely with local environmental conditions11,18,19,20,25. Because many phenotypic traits that affect individual fitness have an additive genetic basis in this species21,23,28, it is plausible that spatial and temporal variation in natural selection have contributed substantially to local adaptation and divergence across its range11,18,19,20,25. Our results reveal strong evidence of natural selection acting on genes which appear to contribute to phenotypic variation in body mass among nine subspecies distributed over a dramatic environmental cline, and augment our prior work demonstrating local adaptation in the genomes of individuals across macro- to micro-geographic scales25,48. Furthermore, our novel approach to validating gene function strongly suggests that the same genes have experienced historic and/or ongoing selection in multiple populations across the species’ range. Our findings thus support and extend our prior results at local to landscape scales23,25,30, implying a substantial capacity for eco-evolutionary adaptation to environmental change in the song sparrow, and reinforcing its value as a model species for understanding the genomic underpinnings of adaptive evolution.
Methods
Study system and sampling
To identify candidate genes underlying Bergmann’s rule, we intentionally sampled large- and smaller-bodied song sparrows from the northern extent of the range where these populations in close geographic proximity vary substantially in mass. In total, we sampled 40 male song sparrows representing two large-bodied subspecies, M. m. maxima (n = 12; mean ± SD = 46.9 ± 2.5 g) and M. m. sanaka (n = 8; 44.4 ± 1.1 g), and two smaller-bodied subspecies, M. m. merrilli (n = 8; 23.2 ± 1.2 g) and M. m. rufina (n = 12; 29.0 ± 1.1 g). Tissue samples for these individuals were provided by the University of Alaska Museum. All birds were collected between 1997 and 2000 (Supplementary Data 1). We then used these candidate genes to predict the genotypes of small-bodied subspecies from California: M. m. gouldii (n = 10; mean ± SD = 18.44 ± 0.84 g); M. m. heermanni (n = 8; 21.22 ± 0.95 g); M. m. samuelis (n = 6; 17.95 ± 0.70 g); M. m. pusillula (n = 9; 18.57 ± 0.95 g); and M. m. maxillaris (n = 6; 20.25 ± 1.08 g). Because extreme cold and heat are both expected to influence body size, we used mean winter (Dec, Jan, Feb) and summer (Jun, Jul, Aug) temperature data, acquired from ClimateNA v.6.00 (ref. 49) for each of the sample locations, to visualize the relationship between body mass and temperature (Fig. 1). All subspecies are year-round residents across their range except rufina and merrilli, which are considered to be partial migrants. At our sampling locations, however, rufina are known residents and merrilli are known migrants. We therefore estimated the mean winter temperature for merrilli based on Patten and Pruett18.
Whole-genome sequencing and variant discovery
Genomic DNA was extracted using the DNeasy Blood and Tissue Kit (Qiagen, CA, USA). DNA concentrations were quantified using the Qubit BR dsDNA Assay Kit (Life Technologies). Using 150 ng of DNA from each sample, we prepared individually barcoded libraries with a 300 bp insert size following the protocol for the NEBNext Ultra II FS DNA Library Prep Kit (Illumina, CA, USA). Libraries for M. m. sanaka and M. m. merrilli were sequenced on a single Illumina NextSeq lane at the Cornell Institute for Biotechnology core facility, while those of M. m. maxima and M. m. rufina were sequenced on a single Illumina NovaSeq lane at Novogene (The University of California at Davis campus). Whole-genome sequencing yielded a mean of 48,428,940 reads per individual in our British Columbia and Alaska populations.
We assessed library quality using FastQC v.0.11.8 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc). We used AdapterRemoval v.2.1.1 for sequence trimming, adapter removal, and quality filtering (requiring a minimum Phred quality score of 30), and we merged overlapping paired-end reads. We aligned filtered reads to the song sparrow reference genome based on a small-bodied subspecies (M. m. morphna) from Southern British Columbia50 using the default settings in Bowtie2 v.2.4.2 (ref. 51) and obtained alignment statistics from Qualimap v.2.2.1 (ref. 52). The mean alignment rate across the four subspecies was 97.98%. Mean coverage was 7X (range: 4X–16X) and mean mapping quality was 18.62 (Table S3). We used SAMtools v.1.9 (ref. 53) to convert all resulting SAM files to BAM files and to sort and index files. We used Picard Tools v.2.8.2 (https://broadinstitute.github.io/picard/) to add index groups and mark duplicates. We used the mpileup module in Bcftools v.1.12 for SNP variant discovery and genotyping for all song sparrows and filtered out variants that were not biallelic, had minor allele frequencies less than 5%, had mean coverage less than 2X or more than 50X, or had more than 20% missing data. This resulted in a total of 13,089,663 SNPs across the four subspecies. The mean missing data across individuals was 16.9% (Supplementary Data 1). We identified three samples that exhibited 50% or greater relatedness to another individual. Because three (rufina, sanaka, and maxima) of our subspecies were sampled from islands and are likely to have higher levels of inbreeding54, this signal is likely biologically relevant. To assess whether retaining related individuals had any impact on population structure, exploratory PCA plots were run both with and without these individuals. This comparison produced similar results (Fig. 1, Supplementary Fig. 1); therefore, all downstream analyses were conducted with the full dataset, including related individuals.
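The site-level filters described above can be pictured with a minimal Python sketch. This is not the Bcftools command line used in the study; the `Site` record and its field names are hypothetical stand-ins for values that would be parsed from a VCF, and only the thresholds (biallelic, minor allele frequency of at least 5%, mean coverage between 2X and 50X, at most 20% missing data) come from the text.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """Hypothetical per-site summary; field names are illustrative only."""
    n_alleles: int       # distinct alleles observed (reference + alternates)
    alt_freq: float      # frequency of the non-reference allele
    mean_depth: float    # mean read depth across individuals
    missing_frac: float  # fraction of individuals without a genotype call


def passes_filters(site, min_maf=0.05, min_depth=2.0,
                   max_depth=50.0, max_missing=0.20):
    """Return True if a site survives the thresholds stated in the text."""
    # Minor allele frequency is the rarer of the two allele frequencies.
    maf = min(site.alt_freq, 1.0 - site.alt_freq)
    return (
        site.n_alleles == 2
        and maf >= min_maf
        and min_depth <= site.mean_depth <= max_depth
        and site.missing_frac <= max_missing
    )


# Toy records (not data from the study).
toy_sites = [
    Site(n_alleles=2, alt_freq=0.30, mean_depth=7.0, missing_frac=0.10),   # kept
    Site(n_alleles=3, alt_freq=0.30, mean_depth=7.0, missing_frac=0.10),   # multiallelic
    Site(n_alleles=2, alt_freq=0.98, mean_depth=7.0, missing_frac=0.10),   # MAF = 0.02
    Site(n_alleles=2, alt_freq=0.30, mean_depth=60.0, missing_frac=0.10),  # too deep
    Site(n_alleles=2, alt_freq=0.30, mean_depth=7.0, missing_frac=0.25),   # too sparse
]
kept = [s for s in toy_sites if passes_filters(s)]
```

Using the minor rather than the non-reference allele frequency matters here, because a non-reference allele that is nearly fixed is just as uninformative as one that is nearly absent.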
Population genomic analyses
We visualized genetic clustering in the SNP dataset by performing a PCA using the snpgdsPCA function in the SNPRelate package55 in R v.4.0.5 (ref. 56). To quantify the level of genome-wide differentiation between subspecies, we calculated FST57 between populations using VCFtools v.0.1.14 (ref. 58) across 50 kb windows. Manhattan plots were generated using the manhattan function in the qqman package in R59. Before plotting windowed FST estimates, we filtered out all scaffolds with fewer than four windows and fewer than 10 SNPs. We also used VCFtools to calculate inbreeding coefficients (FIS) for all individuals. We then calculated Tajima’s D, nucleotide diversity (π), and absolute genetic divergence (Dxy) using the popgenWindows.py script (S. Martin; https://github.com/simonhmartin/genomics_general) to provide further insight into the evolutionary processes that have shaped population-level genetic variation. Putative chromosomal locations of different scaffolds were obtained by aligning them to the zebra finch (Taeniopygia guttata) assembly (bTaeGut1_v1.p; https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_003957565.1/; SAMN02981239) using the pseudo-chromosome scaffolding command in SatsumaSynteny60. To analyze patterns of genetic structure among the four subspecies, Admixture v.1.23 (ref. 61) analyses were run using a filtered dataset (1,989,848 SNPs) that contained no missing data and was pruned to avoid linkage using the script ldPruning.sh (https://github.com/speciationgenomics/scripts/blob/master/ldPruning.sh). We investigated one to four population clusters with 200 bootstrap resampling iterations.
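To make the windowed diversity statistics concrete, the following Python sketch computes nucleotide diversity (π) and Tajima’s D for a single window from per-site non-reference allele counts. It is a textbook implementation (Tajima 1989), not the popgenWindows.py code used in the study, and it assumes every site was genotyped in the same number of chromosomes (no missing data); the function name and inputs are illustrative.

```python
import math


def diversity_and_tajimas_d(alt_counts, n_chrom, window_bp):
    """Return (pi per site, Tajima's D) for one window.

    alt_counts: non-reference allele count at each variant site in the window
    n_chrom:    number of sampled chromosomes (2 x diploid individuals)
    window_bp:  window length in base pairs (e.g. 50_000)
    """
    # Keep only sites that are polymorphic within this sample.
    seg = [k for k in alt_counts if 0 < k < n_chrom]
    s = len(seg)
    # Mean number of pairwise differences, summed over sites.
    pairs = n_chrom * (n_chrom - 1)
    pi_sum = sum(2.0 * k * (n_chrom - k) / pairs for k in seg)
    pi_per_site = pi_sum / window_bp
    if s == 0:
        return pi_per_site, float("nan")  # D is undefined without variation

    a1 = sum(1.0 / i for i in range(1, n_chrom))
    a2 = sum(1.0 / i ** 2 for i in range(1, n_chrom))
    b1 = (n_chrom + 1) / (3.0 * (n_chrom - 1))
    b2 = 2.0 * (n_chrom ** 2 + n_chrom + 3) / (9.0 * n_chrom * (n_chrom - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n_chrom + 2) / (a1 * n_chrom) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # D contrasts pi with Watterson's estimator (s / a1), scaled by its SD.
    d = (pi_sum - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))
    return pi_per_site, d


# Toy windows for 5 diploid birds (10 chromosomes); not data from the study.
pi_rare, d_rare = diversity_and_tajimas_d([1, 1, 1, 1, 1], 10, 50_000)
pi_mid, d_mid = diversity_and_tajimas_d([5, 5, 5, 5, 5], 10, 50_000)
```

An excess of rare alleles, as expected after a recent sweep, pushes D below zero, whereas an excess of intermediate-frequency alleles pushes it above zero.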
Genome-wide divergence
We identified regions of elevated divergence among the four pairwise (between-size) comparisons of large- and smaller-bodied northern populations using the sliding window FST estimates. FST values averaged over non-overlapping 50 kb windows were considered elevated if they exceeded the 99.9th percentile of the genome-wide mean value (i.e., >0.498 when comparing sanaka and merrilli, >0.493 for sanaka and rufina, >0.507 for maxima and rufina, and >0.518 for maxima and merrilli; Fig. 2). We compiled a list of genes within these outlier windows using Geneious v.11.1.5 (ref. 62). To characterize putative candidate genes, we used ontology information from the zebra finch Ensembl database63, the annotated song sparrow reference genome50, and functional information from the Uniprot database64 (Supplementary Table 2). We additionally compared the identified list of genes to known genes involved in recent analyses of body size or BMI in other species (see Discussion; NHGRI-EBI GWAS catalog; https://www.ebi.ac.uk/gwas/).
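The outlier rule above amounts to thresholding windowed FST at an upper percentile of its genome-wide distribution. A minimal Python sketch follows; it assumes a linear-interpolation percentile (the study does not state which percentile definition was used), and the scaffold name and FST values are toy placeholders rather than results from the paper.

```python
def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty list."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def outlier_windows(window_fst, q=99.9):
    """Return (threshold, sorted window keys whose FST exceeds the threshold).

    window_fst: dict mapping (scaffold, window_start) -> mean FST of the window
    """
    threshold = percentile(list(window_fst.values()), q)
    outliers = sorted(w for w, fst in window_fst.items() if fst > threshold)
    return threshold, outliers


# Toy genome: 1,000 non-overlapping 50 kb windows with FST = 0.000 ... 0.999.
toy_fst = {("toy_scaffold", i * 50_000): i / 1000 for i in range(1000)}
threshold, outliers = outlier_windows(toy_fst)
```

Because the threshold is computed separately for each pairwise comparison, the cut-off differs among comparisons, as the four values quoted in the text show.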
To control for phenotypic differences between subspecies in traits other than body size (e.g., plumage differences, migratory behavior), we compared the two smaller-bodied populations, M. m. merrilli and M. m. rufina, which also differ in migratory behavior, to help ensure that the candidate genes are correlated with our trait of interest. We also compared the two large-bodied populations, M. m. sanaka and M. m. maxima, as an additional control for our smaller- vs large-bodied comparisons. There were no overlaps in elevated peaks between our large-small bodied (between size) comparisons and those identified within our smaller-bodied or large-bodied (within size) control comparisons, increasing our confidence that we have identified genes related to body size variation as opposed to migratory behavior or other traits differing between subspecies.
Among our large- and smaller-bodied (between size) comparisons, we classified outlier regions as shared if two or more paired comparisons identified the same gene within 50 kb of an elevated window (Supplementary Table 2). We narrowed the candidate gene set by keeping only genes identified in all four comparisons between large- and smaller-bodied subspecies. Using this approach, we identified 9 genes that are shared in all of the between body size comparisons. Within these 9 genes, there are 467 variants across the four subspecies of interest.
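The narrowing step is, in effect, a count of how many between-size comparisons flag each gene. The Python sketch below illustrates that logic under the assumption that each comparison has already been reduced to a set of gene names; the gene labels are placeholders, not the genes reported in Supplementary Table 2.

```python
from collections import Counter


def shared_genes(genes_by_comparison, min_comparisons):
    """Return genes flagged in at least `min_comparisons` pairwise comparisons."""
    counts = Counter(
        gene
        for genes in genes_by_comparison.values()
        for gene in set(genes)  # count each gene once per comparison
    )
    return sorted(g for g, c in counts.items() if c >= min_comparisons)


# Placeholder gene sets for the four between-size comparisons (toy data).
toy = {
    "sanaka-merrilli": {"geneA", "geneB", "geneC"},
    "sanaka-rufina": {"geneA", "geneC", "geneE"},
    "maxima-rufina": {"geneA", "geneB", "geneD"},
    "maxima-merrilli": {"geneA", "geneD"},
}
shared = shared_genes(toy, min_comparisons=2)             # "shared" outlier genes
candidates = shared_genes(toy, min_comparisons=len(toy))  # in all four comparisons
```

Requiring a gene to appear in every comparison is deliberately conservative: it trades sensitivity to genes of small or subspecies-specific effect for a short list that is less likely to reflect idiosyncratic divergence in a single pair.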
Signatures of selection
We used the candidate gene set, containing the 467 SNPs, to test for evidence of selective sweeps, identify SNPs highly correlated with body mass, and calculate the percentage of fixed or nearly fixed SNPs (FST > 0.95; Supplementary Table 5). To identify regions under selection, we calculated the composite likelihood ratio (CLR) test statistic using the program SweeD v3.3.2 (ref. 65). We determined the genotypes of all individuals by phasing and imputing missing data using Beagle v.3.3.2 (ref. 66). We ran SweeD separately for each population and on individual contigs of interest using the default parameters except for using a window size of 200 bp. The window within each contig with the highest CLR statistic is considered the likely location of a selective sweep67.
Hypothesis testing for validation
To test the hypothesis that our narrowed set of 9 candidate genes is related to body size, we predicted the genotypes of more distantly related small-bodied subspecies (M. m. gouldii, M. m. heermanni, M. m. samuelis, M. m. pusillula, and M. m. maxillaris) from a different geographic region, the San Francisco Bay area of California. Whole-genome sequences for the 39 San Francisco Bay individuals were generated in a separate study25 using library preparation and sequencing methods as detailed above. For the whole-genome sequencing summary statistics of the 39 San Francisco Bay individuals see Mikles and colleagues25.
We first performed a partial Mantel test on the 3 contigs containing the 9 candidate genes across all 9 subspecies to assess whether patterns of divergence were due to neutral processes, including isolation by distance, or to selective processes. Using the vegan68 R package, we tested for correlations between mass and genetic distance, while accounting for geographic (Euclidean) distance.
To see how strongly allele frequency correlated with body size across all 79 individuals, we produced a matrix of the frequency of the non-reference allele using vcftools (--freq2 recode option) for the 467 SNPs. We used the Pearson correlation coefficient (r) to quantify the relationship between the non-reference allele frequency and body mass (g), and between non-reference allele frequency and average summer and winter temperature (°C) at each of the SNPs. We selected the top 8 SNPs with r > 0.90 between non-reference allele frequency and mass: contig 391 (chr 17) positions: 19152, 57980, 61835, 86656, 122228; contig 3361 (chr 2) positions: 255216, 255218, 262569. We also performed linear regression analyses to further explore the relationships between allele frequency and mass and temperature at each of the focal SNPs. For each analysis, allele frequency served as the independent variable, while mass and temperature were the dependent variables. Model parameters, including regression coefficients, intercepts, confidence intervals, effect sizes, R-squared values, p-values, and leverage and influence statistics can be found in Supplementary Tables 3 and 4. Relationships were then visualized using ggplot2 (ref. 69) (Fig. 4).
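The per-SNP analysis can be sketched in a few lines of Python. The function below returns the Pearson correlation together with the ordinary least-squares slope and intercept for one SNP, with allele frequency as the predictor; the function name is illustrative, and the input vectors are arbitrary toy numbers rather than allele frequencies or masses from the study.

```python
import math


def pearson_and_regression(x, y):
    """Pearson r and least-squares fit of y on x.

    x: predictor (e.g. subspecies-level non-reference allele frequency)
    y: response  (e.g. subspecies mean body mass in g, or temperature in deg C)
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    r = sxy / math.sqrt(sxx * syy)
    return {
        "r": r,
        "r_squared": r * r,  # equals the regression R^2 with one predictor
        "slope": slope,
        "intercept": my - slope * mx,
    }


# Arbitrary toy vectors (not study data).
fit = pearson_and_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
```

With a single predictor the squared Pearson correlation and the regression R-squared are the same quantity, so the correlation screen (r > 0.90) and the regression tables describe the same relationships from two angles.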
In addition to looking at correlation between subspecies-level allele frequencies and mass, we assessed correlation between phenotype and genotype for individuals (heterozygotes or homozygotes for the reference/alternate alleles at our focal SNPs). To do this, we used the R package vegan68 to calculate a distance matrix between the 8 SNPs identified above and included 1–3 additional SNPs immediately up and downstream of each focal SNP for visualization purposes (54 total SNPs). Fewer than three SNPs were used in cases where the SNP was located at the end of a contig or adjacent to another focal SNP. We then used the Ward algorithm as a grouping method to express the relationships between sites, which was implemented using the hclust function in R. We plotted the output using the plot.phylo function in the R package ape70 and visualized genotypes by constructing a modified heatmap in which the base pairs of each individual at a locus are represented in the two halves of a diagonally split pixel (Supplementary Fig. 7).
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
The raw sequencing data generated in this study have been deposited in the National Center for Biotechnology Information (NCBI) BioProject database under accession code PRJNA1013697. Raw sequencing data for the California song sparrow subspecies used in this study are available in the NCBI BioProject database under accession code PRJNA1018990. Source data are provided with this paper.
Code availability
Scripts and bioinformatic pipelines used in this study are available at https://github.com/kcarbeck/SOSP_body_size (https://doi.org/10.5281/zenodo.8365146).
References
Darwin, C. The Origin of Species (John Murray, 1859).
Mayr, E. Animal Species and Evolution. (Harvard Univ. Press, 1963).
Coyne, J. A. & Orr, H. A. Speciation: a catalogue and critique of species concepts. Philos. Biol. Anthol. 272–292 (2004).
Schluter, D. Evidence for ecological speciation and its alternative. Science 323, 737–741 (2009).
Koski, M. H. & Ashman, T.-L. Floral pigmentation patterns provide an example of Gloger’s rule in plants. Nat. Plants 1, 1–5 (2015).
McDowall, R. M. Jordan’s and other ecogeographical rules, and the vertebral number in fishes. J. Biogeogr. 35, 501–508 (2008).
VanderWerf, E. A. Ecogeographic patterns of morphological variation in Elepaios (Chasiempis spp.): Bergmann’s, Allen’s, and Gloger’s Rules in a Microcosm—Patrones Ecogeográficos de Variación Morfológica en Chasiempis spp.: Las Reglas de Bergmann, Allen y Gloger en un Microcosmos. Ornithol. Monogr. 73, 1–34 (2012).
Zheng, S., Hu, J., Ma, Z., Lindenmayer, D. & Liu, J. Increases in intraspecific body size variation are common among North American mammals and birds between 1880 and 2020. Nat. Ecol. Evol. 7, 347–354 (2023).
Gaston, K. J. The Structure and Dynamics of Geographic Ranges (Oxford University Press, 2003).
Bergmann, C. Über die Verhältnisse der Wärmeökonomie der Thiere zu ihrer Größe (1847).
Aldrich, J. W. Ecogeographical variation in size and proportions of song sparrows (Melospiza melodia). Ornithol. Monogr. iii–134 (1984). https://doi.org/10.2307/40166779.
Johnston, R. F. & Selander, R. K. House sparrows: rapid evolution of races in North America. Science 144, 548–550 (1964).
Teplitsky, C., Mills, J. A., Alho, J. S., Yarrall, J. W. & Merilä, J. Bergmann’s rule and climate change revisited: Disentangling environmental and genetic responses in a wild bird population. Proc. Natl Acad. Sci. 105, 13492–13496 (2008).
Millien, V. Relative effects of climate change, isolation and competition on body-size evolution in the Japanese field mouse, Apodemus argenteus. J. Biogeogr. 31, 1267–1276 (2004).
Yom-Tov, Y. & Yom-Tov, S. Climatic change and body size in two species of Japanese rodents. Biol. J. Linn. Soc. 82, 263–267 (2004).
MacColl, A. D. C. The ecological causes of evolution. Trends Ecol. Evol. 26, 514–522 (2011).
Miller, A. H. Ecologic factors that accelerate formation of races and species of terrestrial vertebrates. Evolution 10, 262–277 (1956).
Patten, M. A. & Pruett, C. L. The Song Sparrow, Melospiza melodia, as a ring species: patterns of geographic variation, a revision of subspecies, and implications for speciation. Syst. Biodivers. 7, 33–62 (2009).
Arcese, P., Sogge, M. K., Marr, A. B. & Patten, M. A. Song Sparrow (Melospiza melodia). The Birds of North America (The Birds of North America, Inc., 2002). https://doi.org/10.2173/bna.704.
Carbeck, K., Wang, T., Reid, J. M. & Arcese, P. Adaptation to climate change through seasonal migration revealed by climatic versus demographic niche models. Glob. Change Biol. 28, 4260–4275 (2022).
Schluter, D. & Smith, J. N. M. Natural selection on beak and body size in the song sparrow. Evolution 40, 221–231 (1986).
Wolak, M. E. & Reid, J. M. Is pairing with a relative heritable? Estimating female and male genetic contributions to the degree of biparental inbreeding in song sparrows (Melospiza melodia). Am. Nat. https://doi.org/10.1086/686198 (2016).
Reid, J. M. et al. Immigration counter‐acts local micro‐evolution of a major fitness component: Migration‐selection balance in free‐living song sparrows. Evol. Lett. 5, 48–60 (2021).
Pruett, C. L. & Winker, K. Chapter 13: Alaska Song Sparrows (Melospiza melodia) demonstrate that genetic marker and method of analysis matter in subspecies assessments. Ornithol. Monogr. 67, 162–171 (2010).
Mikles, C. S. et al. Genomic differentiation and local adaptation on a microgeographic scale in a resident songbird. Mol. Ecol. 29, 4295–4307 (2020).
Gaston, K. J., Chown, S. L. & Evans, K. L. Ecogeographical rules: elements of a synthesis. J. Biogeogr. 35, 483–500 (2008).
Smith, J. N. M. & Zach, R. Heritability of some morphological characters in a song sparrow population. Evolution 33, 460–467 (1979).
Schluter, D. & Smith, J. N. M. Genetic and phenotypic correlations in a natural population of song sparrows. Biol. J. Linn. Soc. 29, 23–36 (1986).
Bonnet, T. et al. Genetic variance in fitness indicates rapid contemporary adaptive evolution in wild animals. Science 376, 1012–1016 (2022).
Clark, J. D., Benham, P. M., Maldonado, J. E., Luther, D. A. & Lim, H. C. Maintenance of local adaptation despite gene flow in a coastal songbird. Evolution 76, 1481–1494 (2022).
Rask-Andersen, M., Karlsson, T., Ek, W. E. & Johansson, Å. Genome-wide association study of body fat distribution identifies adiposity loci and sex-specific genetic effects. Nat. Commun. 10, 339 (2019).
Zhu, Z. et al. Shared genetic and experimental links between obesity-related traits and asthma subtypes in UK Biobank. J. Allergy Clin. Immunol. 145, 537–549 (2020).
Barton, A. R., Sherman, M. A., Mukamel, R. E. & Loh, P.-R. Whole-exome imputation within UK Biobank powers rare coding variant association and fine-mapping analyses. Nat. Genet. 53, 1260–1269 (2021).
Hato, T., Tabata, M. & Oike, Y. The role of angiopoietin-like proteins in angiogenesis and metabolism. Trends Cardiovasc. Med. 18, 6–14 (2008).
Shungin, D. et al. New genetic loci link adipose and insulin biology to body fat distribution. Nature 518, 187–196 (2015).
Sakaue, S. et al. A cross-population atlas of genetic associations for 220 human phenotypes. Nat. Genet. 53, 1415–1424 (2021).
Speliotes, E. K. et al. Association analyses of 249,796 individuals reveal 18 new loci associated with body mass index. Nat. Genet. 42, 937–948 (2010).
Kichaev, G. et al. Leveraging polygenic functional enrichment to improve GWAS power. Am. J. Hum. Genet. 104, 65–75 (2019).
Adès, L. C. et al. FBN1, TGFBR1, and the Marfan-craniosynostosis/mental retardation disorders revisited. Am. J. Med. Genet. A 140A, 1047–1058 (2006).
Tachmazidou, I. et al. Whole-genome sequencing coupled to imputation discovers genetic signals for anthropometric traits. Am. J. Hum. Genet. 100, 865–884 (2017).
Lango Allen, H. et al. Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature 467, 832–838 (2010).
Hoban, S. et al. Finding the genomic basis of local adaptation: pitfalls, practical solutions, and future directions. Am. Nat. 188, 379–397 (2016).
Christe, C. et al. Adaptive evolution and segregating load contribute to the genomic landscape of divergence in two tree species connected by episodic gene flow. Mol. Ecol. 26, 59–76 (2017).
Lamichhaney, S. et al. Evolution of Darwin’s finches and their beaks revealed by genome sequencing. Nature 518, 371–375 (2015).
Pfeifer, S. P. et al. The evolutionary history of Nebraska deer mice: local adaptation in the face of strong gene flow. Mol. Biol. Evol. 35, 792–806 (2018).
Hendry, A. P., Schoen, D. J., Wolak, M. E. & Reid, J. M. The contemporary evolution of fitness. Annu. Rev. Ecol. Evol. Syst. https://doi.org/10.1146/annurev-ecolsys-110617-062358 (2018).
Husby, A., Visser, M. E. & Kruuk, L. E. B. Speeding up microevolution: the effects of increasing temperature on selection and genetic variance in a wild bird population. PLoS Biol. 9, e1000585 (2011).
Walsh, J. et al. Genomics of rapid ecological divergence and parallel adaptation in four tidal marsh sparrows. Evol. Lett. 3, 324–338 (2019).
Wang, T., Hamann, A., Spittlehouse, D. & Carroll, C. Locally downscaled and spatially customizable climate data for historical and future periods for North America. PLoS ONE 11 (2016).
Feng, S. et al. Dense sampling of bird diversity increases power of comparative genomics. Nature 587, 252–257 (2020).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Okonechnikov, K., Conesa, A. & García-Alcalde, F. Qualimap 2: advanced multi-sample quality control for high-throughput sequencing data. Bioinformatics 32, 292–294 (2016).
Li, H. et al. The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Pruett, C. L. & Winker, K. Northwestern song sparrow populations show genetic effects of sequential colonization. Mol. Ecol. 14, 1421–1434 (2005).
Zheng, Q. et al. Systematic identification of genes involved in divergent skeletal muscle growth rates of broiler and layer chickens. BMC Genomics 10, 87 (2009).
R Core Team. R: A Language and Environment for Statistical Computing (R Core Team, 2022).
Weir, B. S. & Cockerham, C. C. Estimating F-statistics for the analysis of population structure. Evolution 38, 1358–1370 (1984).
Danecek, P. et al. The variant call format and VCFtools. Bioinformatics 27, 2156–2158 (2011).
Turner, S. D. qqman: an R package for visualizing GWAS results using Q-Q and manhattan plots. bioRxiv 005165. https://doi.org/10.1101/005165 (2014).
Grabherr, M. G. et al. Genome-wide synteny through highly sensitive sequence alignment: Satsuma. Bioinformatics 26, 1145–1151 (2010).
Alexander, D. H., Novembre, J. & Lange, K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655–1664 (2009).
Kearse, M. et al. Geneious Basic: An integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics 28, 1647–1649 (2012).
Aken, B. L. et al. Ensembl 2017. Nucleic Acids Res. 45, D635–D642 (2017).
The UniProt Consortium. UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res. 47, D506–D515 (2019).
Pavlidis, P., Živković, D., Stamatakis, A. & Alachiotis, N. SweeD: likelihood-based detection of selective sweeps in thousands of genomes. Mol. Biol. Evol. 30, 2224–2234 (2013).
Browning, S. R. & Browning, B. L. Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Am. J. Hum. Genet. 81, 1084–1097 (2007).
Kim, Y. & Stephan, W. Detecting a local signature of genetic hitchhiking along a recombining chromosome. Genetics 160, 765–777 (2002).
Oksanen, J. et al. Vegan: Community Ecology Package R Package Version 2.6-4 (CRAN, 2022).
Wickham, H., Chang, W. & Wickham, M. H. Package ‘ggplot2’: Create Elegant Data Visualisations Using the Grammar of Graphics Version 2, 1–189 (ggplot2, 2016).
Paradis, E., Claude, J. & Strimmer, K. APE: analyses of phylogenetics and evolution in R language. Bioinformatics 20, 289–290 (2004).
Acknowledgements
The authors thank the Cornell Lab of Ornithology, University of Alaska Museum, Berkeley Museum of Vertebrate Zoology, and Yvonne Chan for samples. B. Butcher, L. Campagna, D. Toews, T. Wang, and M. Whitlock provided essential assistance, comments, and support. Project funding was provided by the Natural Sciences and Engineering Research Council of Canada Discovery Grant RGPIN-2020-05268 (P.A.), University of British Columbia, and the Hesse Fellowship and Research Award (K.C.). The material in this manuscript is also based upon work supported by the National Science Foundation Postdoctoral Research Fellowship in Biology under grant No. DBI 1523719 (J.W.) as well as funding from the Fuller Evolutionary Biology Lab at the Cornell Lab of Ornithology (I.L.).
Author information
Contributions
K.C., P.A., and J.W. conceived and designed the study with input from I.L., K.W., and C.P. C.P. and K.W. conducted field work and collected samples. J.W. conducted laboratory work and K.C. carried out all bioinformatic analyses. Data analysis and interpretation were conducted by K.C. and J.W. with input from all co-authors. K.C. wrote the manuscript with input from all co-authors.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks Haw Chuan Lim and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Carbeck, K., Arcese, P., Lovette, I. et al. Candidate genes under selection in song sparrows co-vary with climate and body mass in support of Bergmann’s Rule. Nat Commun 14, 6974 (2023). https://doi.org/10.1038/s41467-023-42786-2
DOI: https://doi.org/10.1038/s41467-023-42786-2