Candidate genes under selection in song sparrows co-vary with climate and body mass in support of Bergmann’s Rule

Carbeck, Katherine; Arcese, Peter; Lovette, Irby; Pruett, Christin; Winker, Kevin; Walsh, Jennifer

doi:10.1038/s41467-023-42786-2

Open access
Published: 07 November 2023

Nature Communications volume 14, Article number: 6974 (2023)

Abstract

Ecogeographic rules denote spatial patterns in phenotype and environment that may reflect local adaptation as well as a species’ capacity to adapt to change. To identify genes underlying Bergmann’s Rule, which posits that spatial correlations of body mass and temperature reflect natural selection and local adaptation in endotherms, we compare 79 genomes from nine song sparrow (Melospiza melodia) subspecies that vary ~300% in body mass (17–50 g). Comparing large- and smaller-bodied subspecies revealed 9 candidate genes in three genomic regions associated with body mass. Further comparisons to the five smallest subspecies endemic to California revealed eight SNPs within four of the candidate genes (GARNL3, RALGPS1, ANGPTL2, and COL15A1) associated with body mass and varying as predicted by Bergmann’s Rule. Our results support the hypothesis that co-variation in environment, body mass and genotype reflects the influence of natural selection on local adaptation and a capacity for contemporary evolution in this diverse species.

Introduction

Explicating correlations among phenotypes and environmental variables is a historic focus of theoretical and empirical research on natural selection, local adaptation, and speciation^1,2,3,4. More recently, questions about local adaptation and the adaptive capacity of species to persist given environmental change have reinvigorated efforts to understand the micro-evolutionary processes involved. For example, correlations between ambient temperature, phenotype, and life-history traits affecting fitness and subject to natural selection are widely reported in vertebrates^2,5,6,7,8 and summarized as ecogeographical rules⁹. Of particular relevance to climate adaptation, Bergmann’s rule posits an inverse correlation between ambient temperature and body size in endotherms¹⁰ and has garnered substantial empirical support (birds:^11,12,13; mammals:^14,15). If such patterns that follow Bergmann’s rule reflect local adaptation in traits affecting fitness, such as thermal tolerance, we expect genes underlying such trait variation to show signs of historical or ongoing selection and to vary predictably with phenotype.

Species with large ranges that encompass steep environmental gradients are excellent candidates in which to test for a genomic basis of local adaptation¹⁶. We focused on song sparrows (Melospiza melodia), which are among the world’s most variable species^17,18 as indexed by the number of recognized subspecies (25), and which accordingly exhibit a stunning range of co-variation in phenotype, life history, and environment (e.g.,^11,19,20). Song sparrows inhabit highly heterogeneous environments, spanning about 36 degrees of latitude (26 °N to 62 °N) and 15 °C in mean annual temperatures (ca. 1 °C to 16 °C). Because many phenotypic traits of song sparrows are known to have an additive genetic basis, respond to selection, and affect individual fitness (e.g.,^21,22,23), we hypothesized that the phenotypic clines in body mass exemplified in song sparrows at least partly reflect local adaptation to environment. If so, we further suggest that identifying the genomic mechanisms underlying such patterns can inform us about the adaptive capacity of other species displaying local adaptation to heterogeneous environments.

Here, we test this idea by comparing 40 whole genomes of two large- (M. m. maxima, M. m. sanaka; mean mass [range]: 45.9 [41.9–50.0 g]) and two smaller-bodied subspecies (M. m. merrilli, M. m. rufina; mean mass [range]: 26.7 [21.9–30.9 g]) that breed in northern British Columbia and Alaska in order to evaluate the genomic landscape of divergence and identify candidate genes associated with body mass. Maxima and sanaka reside year-round from the Alaskan Peninsula to the Aleutian Islands, experience cold winters and are nearly twice the mass of rufina and merrilli, which experience warmer winters²⁴ (Fig. 1). With candidate genes for body mass identified, we then use those candidates to predict genotype in 39 birds that represent the five smallest subspecies of song sparrows, all endemic to California and residing in or adjacent to San Francisco Bay (mean mass [range]: 19.2 [16.9–23.3 g]). By doing so, we demonstrate the wider utility of identifying candidate genes that are targets of selection and which contribute to local adaptation in phenotypically variable species that occupy a diverse range of environmental conditions and geographic scales.

**Fig. 1: Song sparrow subspecies distribution, genetic divergence, and body size variation.**

Results

Phenotypic correlations and genomic divergence

Body mass (g) and mean winter and summer temperatures (°C) were strongly negatively related in the subspecies we studied (Fig. 1), consistent with Bergmann’s rule. At a genome-wide scale, we observed marked differentiation and unambiguous clustering between all four northern subspecies, clearly distinguished as large- (maxima and sanaka) and smaller-bodied (merrilli and rufina) subspecies (11,223,039 SNPs; PC1: 19.93% and PC2: 6.76% of genomic variation; Supplementary Fig. 1). This pattern holds when the southern small-bodied subspecies from San Francisco Bay are included, with a PCA delineating body size on PC axis 1, which accounted for 12.48% of genomic variation (PC2: 3.48%; Fig. 1b). Population structure at K = 2 had the lowest cross-validation error, corresponding to grouping of the large- versus smaller-bodied subspecies; but structuring was apparent at K = 4 (ADMIXTURE; Supplementary Fig. 2). Genome-wide estimates of F_ST (± SD) indicated substantial divergence, with pairwise genome-wide values from 0.035 ± 0.026 in the control comparisons (merrilli-rufina) to 0.247 ± 0.112 (maxima-merrilli; Supplementary Table 1). Similarly, the genome-wide absolute genetic divergence (Dxy) was lowest between the control comparisons (mean ± SD; maxima-sanaka: 0.292 ± 0.087) and highest between maxima-merrilli (0.425 ± 0.046). Overall, the strongest signals of genetic differentiation were best explained by body size.

Identification of candidate genes

We identified a total of 25 elevated windows of F_ST that were shared by at least two pairwise comparisons of large- and smaller-bodied subspecies (i.e., windows exhibiting F_ST estimates in the 99.9th percentile of the genome-wide mean; Fig. 2). Several of these shared elevated 50 kb windows contained annotated genes (13 genes: sanaka and merrilli; 10: sanaka and rufina; 18: maxima and rufina; 19: maxima and merrilli). Overall, the mean F_ST values in the 50 kb outlier windows were elevated for each of the four pairwise comparisons (Mean F_ST: sanaka-merrilli = 0.671; sanaka-rufina = 0.625; maxima-rufina = 0.645; maxima-merrilli = 0.828), relative to low genome-wide F_ST averages (sanaka-merrilli = 0.159; sanaka-rufina = 0.176; maxima-rufina = 0.205; maxima-merrilli = 0.247; Fig. 2).

**Fig. 2: Genome-wide differentiation between northern subspecies.**

To specifically identify putative genes under selection for body size, we focused on genes that were shared between all four large- vs smaller-bodied comparisons (i.e., “candidate genes”). Six genes were shared by two subspecies comparisons (AIDA, TPPP, USP16, LTN1, C3ORF52, and unknown gene ENSTGUP00000002762), and two shared by three comparisons (TRIP13 and BRD9; Supplementary Table 2). Notably, across the shared 50 kb windows, 9 candidate genes were identified in all large- and smaller-bodied pairwise comparisons (FBXW2, GARNL3, RALGPS1, ZBTB34, ANGPTL2, ZBTB43, COL15A1, TGFBR1, TAF1A), with six candidate genes occurring on the same scaffold (contig 391/chr 17; Supplementary Table 2). None of these candidate genes were shared in control comparisons between the two larger (maxima and sanaka) and smaller (merrilli and rufina) subspecies pairs, as expected if these differentiated regions arose via divergent selection on body size (Fig. 3; Supplementary Table 2).

**Fig. 3: Genetic differentiation and selective sweeps within candidate genes.**

Signatures of selection

To search for evidence of local adaptation in body size, we calculated the composite likelihood ratio (CLR) test statistic to test for the presence of selective sweeps on the 3 contigs containing the 9 candidate genes (Fig. 3). Selective sweeps were evident on all contigs (>99th percentile of contig-wide mean; Fig. 3). We also measured nucleotide diversity (π) and Tajima’s D across 50 kb windows in all four northern subspecies to characterize genetic diversity and detect evidence of selection. Commensurate with our hypothesis of natural selection, nucleotide diversity was reduced in the regions of the 9 candidate genes shared among all four comparisons (maxima = 0.0001, sanaka = 0.0002, merrilli = 0.0002, rufina = 0.0005), whereas the genome-wide average was an order of magnitude higher (maxima = 0.0016, sanaka = 0.0022, merrilli = 0.0032, rufina = 0.0026) (Supplementary Figs. 3–5). Tajima’s D was also reduced in candidate regions, especially in maxima (−0.332), sanaka (0.229), and merrilli (−0.789), as compared to genome-wide means (0.851, 0.892, and 0.889, respectively), but not in rufina (D_{Taj candidate} = 1.038; D_{Taj mean} = 1.219). Dxy estimated in candidate regions was also elevated in size-related pairwise comparisons (maxima-merrilli = 0.899, maxima-rufina = 0.710, sanaka-merrilli = 0.844, sanaka-rufina = 0.771) compared to controls of within-size comparisons (maxima-sanaka = 0.032, merrilli-rufina = 0.233; Supplementary Fig. 6). Taken together, these results provide evidence for the presence of selective sweeps and imply that the strength and direction of selection may vary among subspecies.

Validation of candidate genes

Given the identification of 9 genes associated with body size and shared by the 4 northern subspecies, we next tested if these candidate genes could be used to predict allele frequency commensurate with the smaller-bodied phenotype in the 5 smallest subspecies of song sparrows endemic to California (mean mass [range]: 19.2 [16.9–23.3 g]), for which we had prior genomic data²⁵. Allele frequency of the non-reference allele, assessed across the 467 SNPs located within the 9 candidate genes noted above, revealed 8 SNPs that were highly correlated with mass (r > 0.90; p < 0.001; Fig. 4). Four of these SNPs were located within RALGPS1 (one of which is also in ANGPTL2), 1 SNP in GARNL3, and 3 SNPs in COL15A1 (Fig. 4). One SNP on RALGPS1 (position 57980) falls within the coding region of the gene, while the remaining SNPs are located within non-coding regions of genes. Notably, non-reference allele frequencies of the 8 candidate SNPs were also highly negatively correlated with average winter and summer temperature (range r = −0.79 to −0.85; p < 0.05; Fig. 4).

**Fig. 4: Relationship between non-reference allele frequency, body mass, and temperature for highly divergent SNPs.**

We observed consistent positive regressions of mass on non-reference allele frequency at each of the 8 focal SNPs (range: F(1,7) = 31.96–142.66; range R²: 0.820–0.953; p < 0.001; Supplementary Table 3). In contrast, regressing temperature on allele frequency revealed negative relationships at each of the focal SNPs (range: F(1,7) = 11.38–18.15; range R²: 0.619–0.722; p < 0.05; Supplementary Table 4). We used a partial Mantel test to further explore whether these relationships reflect a history of selection versus neutral processes using the 3 contigs containing the 9 candidate genes. Pairwise genetic distance and phenotype were significantly correlated across all 9 subspecies after controlling for geographic distance (partial Mantel test: r = 0.183; p = 0.004).
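The reported F statistics and R² values are linked by a standard identity for simple linear regression, F = R²/(1 − R²) × (n − 2). The short Python sketch below checks that relationship. It assumes, from the reported degrees of freedom F(1,7), that each regression was fitted to nine subspecies-level points; the function name is illustrative and not taken from the study's scripts.

```python
def f_from_r2(r2: float, n_points: int) -> float:
    """F statistic of a one-predictor linear regression, from R^2 and sample size."""
    df_residual = n_points - 2  # one slope and one intercept are estimated
    return r2 / (1.0 - r2) * df_residual


# R^2 bounds reported in the text for the mass and temperature regressions.
# n_points = 9 is an assumption inferred from the F(1,7) degrees of freedom.
for r2 in (0.820, 0.953, 0.619, 0.722):
    print(f"R^2 = {r2:.3f} -> F(1,7) ~ {f_from_r2(r2, 9):.2f}")
```

The computed values (about 31.9, 141.9, 11.4, and 18.2) agree with the reported F ranges up to rounding of R².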

Per-site F_ST estimates for the 8 focal SNPs noted ranged from 0.63 to 0.96 among all large- and smaller-bodied subspecies (mean: 0.713–0.763), while control comparisons ranged from 0 to 0.272 (mean: 0.003–0.081; Supplementary Table 5). Across the 9 candidate genes, the percentage of fixed or nearly fixed SNPs between the large- and smaller-bodied subspecies ranged from 3.0 to 33.8%, whereas 0% of control comparisons contained fixed SNPs (Supplementary Table 5).

A phylogenetic tree reconstructed from individual genotypes of the 8 focal SNPs and 1–3 additional SNPs immediately up and downstream (54 total SNPs; Supplementary Fig. 7; see Methods) was consistent with the strong correlation between body size and non-reference allele frequency across subspecies. Across the nine subspecies, three clusters formed largely reflecting body size and geography, comprising two subspecies of intermediate mass (rufina, merrilli), two large Aleutian endemics (maxima, sanaka), and five small California endemics (gouldii, heermanni, maxillaris, pusillula, samuelis). Whilst these findings suggest compelling support for our hypothesis that historical and/or ongoing natural selection has contributed to genetic differentiation of body size between subspecies, lower coverage for the San Francisco Bay populations may limit our ability to fully resolve heterozygotes and should be validated using higher coverage, phased sequence data.

Discussion

We characterized 9 candidate genes associated with body mass in 4 subspecies of large- and smaller-bodied song sparrows that breed at the northern extent of the species’ range, and then used those candidates to predict genotype in 39 individuals from 5 small-bodied subspecies endemic to California that are ~50–70% smaller than subspecies that reside year-round in Alaska. Bergmann¹⁰ explained the tendency for endotherms to be larger in colder environments as an adaptation to minimize heat loss, though mechanisms remain uncertain^13,26. Our observations of fixed and shared genomic differences, their co-variation with climate and body mass, and evident signatures of selection support Bergmann’s Rule as influential in song sparrows and reflect a capacity for and history of local adaptation to environment in this species. Moreover, because many other traits exhibit substantial additive genetic variation^23,27, respond to selection^21,28, co-vary with environment^11,25, and affect individual fitness in this species²³, we interpret these cumulative results as suggesting that song sparrows exhibit substantial capacity for contemporary evolution via context-dependent selection on individual phenotype and life history^20,29.

We observed multiple regions of elevated divergence in the genomes of large- vs smaller-bodied song sparrows versus a relatively homogenous background (Fig. 2; Supplementary Fig. 6), extending prior demonstrations of local adaptation at micro- to macrogeographic scales^20,25,30. Consistent clustering of genetic groups in the large- and smaller-bodied populations was evident in comparisons based on mass and subspecies identity (Fig. 1; Supplementary Fig. 1) but absent in shared regions of divergence in control comparisons among subspecies of similar size (Fig. 2; Supplementary Fig. 6). These contrasts also support our hypothesis that local adaptation in song sparrows is underlain, at least in part, by genes that play a causal role in the patterns Bergmann sought to explain.

Having identified candidate genes capable of differentiating large- and smaller-bodied subspecies in Alaska, we then validated these patterns by predicting the genotypes of 39 birds from the five smallest subspecies endemic to California. Of the 9 candidate genes in regions shared by all large- and smaller-bodied pairwise comparisons, four contained SNPs (r > 0.90) that predicted allele frequency of small-bodied subspecies in California (RALGPS1, COL15A1, GARNL3, and ANGPTL2; Fig. 4), and included genes linked to body mass index (BMI), height, and fat distribution in humans (Supplementary Table 2; see Methods). RALGPS1 (mean F_ST: 0.797, range: 0.071–0.974) and COL15A1 (mean F_ST: 0.514, range: 0.010–0.945) similarly associate with percent and distribution of body fat, size and BMI in humans^31,32. ANGPTL2 contained several highly differentiated SNPs correlated with body mass and temperature (mean F_ST: 0.917, range: 0.845–0.959). ANGPTL2 codes for angiopoietin-like protein 2, which has important roles in lipid, glucose and energy metabolism across taxa, and has also been found to be associated with human height and obesity^33,34. In contrast, GARNL3 (mean F_ST: 0.791, range: 0.055–0.973) was informative in differentiating subspecies by body size and predicting the genotype of smaller-bodied song sparrows, but it has not yet been linked to phenotype in other species. Notably, there were high proportions of fixed or nearly fixed SNPs (F_ST > 0.95) within RALGPS1, GARNL3, and ANGPTL2 (18/151; 7/59; 6/14 SNPs, respectively). These results further support our supposition that some or all of the genes we identified are directly or indirectly associated with thermoregulatory capacity and influential in their contributions to the roughly 300% increase in the body mass of song sparrows observed from California to Alaska.

Although five of the candidate genes did not include SNPs highly correlated with body mass in small-bodied subspecies endemic to California, four of these genes have previously been associated with BMI, body fat mass, height, or Marfan syndrome in humans (e.g., FBXW2, ZBTB43, TGFBR1, TAF1A;^{35,36,37,38,39,40}; Supplementary Table 2), whereas ZBTB34 appears to reflect a novel association to mass. Because complex phenotypes often arise via many genes of small effect⁴¹, it is likely that such smaller effects would be challenging to discern given our statistical approach, stringent identification of candidates, and modest sample sizes.

Despite strong associations between body mass and the candidate genes noted above, genetic drift, variation in recombination rate and genetic architecture, and selection on correlated traits are also expected to influence population-level variation in the phenotype of song sparrows across their North American range⁴². Thus, whilst genome scans clearly offer valuable insights into the origins of adaptive evolution^43,44,45, more detailed genomic and field studies are needed to validate gene expression, the effects of natural selection on the adaptive capacity of populations, phenotype, and fitness, and to understand the consequences of immigration on genetic architecture. Methods such as QTL mapping of hybrid individuals in populations with precise pedigrees will be needed to elucidate the additional effects of chromosomal rearrangements, insertions, deletions, or duplications, on phenotype, plasticity, and the process of speciation^23,29. Similarly, a more precise understanding of the influence of plasticity versus direct or indirect selection on traits underlying Bergmann’s or other ecogeographic rules is needed to estimate the capacity of populations to maintain fitness given environmental change^13,26,46,47.

Song sparrows exhibit a stunning range of variation in external phenotype and life history that correlates closely with local environmental conditions^{11,18,19,20,25}. Because many phenotypic traits that affect individual fitness have an additive genetic basis in this species^21,23,28, it is plausible that spatial and temporal variation in natural selection have contributed substantially to local adaptation and divergence across its range^{11,18,19,20,25}. Our results reveal strong evidence of natural selection acting on genes which appear to contribute to phenotypic variation in body mass among nine subspecies distributed over a dramatic environmental cline, and augment our prior work demonstrating local adaptation in the genomes of individuals across macro- to micro-geographic scales^25,48. Furthermore, our novel approach to validating gene function strongly suggests that the same genes have experienced historic and/or ongoing selection in multiple populations across the species’ range. Our findings thus support and extend our prior results at local to landscape scales^23,25,30, implying a substantial capacity for eco-evolutionary adaptation to environmental change in the song sparrow, and reinforcing its value as a model species for understanding the genomic underpinnings of adaptive evolution.

Methods

Study system and sampling

To identify candidate genes underlying Bergmann’s rule, we intentionally sampled large- and smaller-bodied song sparrows from the northern extent of the range where these populations in close geographic proximity vary substantially in mass. In total, we sampled 40 male song sparrows representing two large-bodied subspecies, M. m. maxima (n = 12; mean ± SD = 46.9 ± 2.5 g) and M. m. sanaka (n = 8; 44.4 ± 1.1 g), and two smaller-bodied subspecies, M. m. merrilli (n = 8; 23.2 ± 1.2 g) and M. m. rufina (n = 12; 29.0 ± 1.1 g). Tissue samples for these individuals were provided by the University of Alaska Museum. All birds were collected between 1997 and 2000 (Supplementary Data 1). We then used these candidate genes to predict the genotypes of small-bodied subspecies from California: M. m. gouldii (n = 10; mean ± SD = 18.44 ± 0.84 g); M. m. heermanni (n = 8; 21.22 ± 0.95 g); M. m. samuelis (n = 6; 17.95 ± 0.70 g); M. m. pusillula (n = 9; 18.57 ± 0.95 g); and M. m. maxillaris (n = 6; 20.25 ± 1.08 g). Because extreme cold and heat are both expected to influence body size, we used mean winter (Dec, Jan, Feb) and summer (Jun, Jul, Aug) temperature data, which were acquired from ClimateNA v.6.00⁴⁹ for each of the sample locations to visualize the relationship between body mass and temperature (Fig. 1). All subspecies are year-round residents across their range except rufina and merrilli, which are considered to be partial migrants. At our sampling locations, however, rufina are known residents and merrilli are known migrants; therefore, we estimated the mean winter temperature for merrilli based on Patten and Pruett¹⁸.

Whole-genome sequencing and variant discovery

Genomic DNA was extracted using the DNeasy Blood and Tissue Kit (Qiagen, CA, USA). DNA concentrations were quantified using the Qubit BR dsDNA Assay Kit (Life Technologies). Using 150 ng of DNA from each sample, we prepared individually barcoded libraries with a 300 bp insert size following the protocol for the NEBNext Ultra II FS DNA Library Prep Kit (Illumina, CA, USA). Libraries for M. m. sanaka and M. m. merrilli were sequenced on a single Illumina NextSeq lane at the Cornell Institute for Biotechnology core facility, while those of M. m. maxima and M. m. rufina were sequenced on a single Illumina NovaSeq lane at Novagene (The University of California at Davis campus). Whole-genome sequencing yielded a mean of 48,428,940 reads per individual in our British Columbia and Alaska populations.

We assessed library quality using FastQC v.0.11.8 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc). We used AdapterRemoval v.2.1.1 for sequence trimming, adapter removal, and quality filtering (requiring a minimum Phred quality score of 30), and we merged overlapping paired-end reads. We aligned filtered reads to the song sparrow reference genome based on a small-bodied subspecies (M. m. morphna) from Southern British Columbia⁵⁰ using the default settings in Bowtie2 v.2.4.2⁵¹ and obtained alignment statistics from Qualimap v.2.2.1⁵². The mean alignment rate across the four subspecies was 97.98%. Mean coverage and mapping quality were 7X (range: 4X–16X) and 18.62, respectively (Table S3). We used SAMtools v.1.9⁵³ to convert all resulting SAM files to BAM files and to sort and index files. We used Picard Tools v.2.8.2 (https://broadinstitute.github.io/picard/) to add index groups and mark duplicates. We used the mpileup module in Bcftools v.1.12 for SNP variant discovery and genotyping for all song sparrows and filtered out variants that were not biallelic, had minor allele frequencies less than 5%, mean coverage less than 2X or more than 50X, and more than 20% missing data. This resulted in a total of 13,089,663 SNPs across the four subspecies. The mean missing data across individuals was 16.9% (Supplementary Data 1). We identified three samples that exhibited 50% or greater relatedness to another individual. Because three (rufina, sanaka, and maxima) of our subspecies were sampled from islands and are likely to have higher levels of inbreeding⁵⁴, this signal is likely biologically relevant. To assess whether retaining related individuals had any impact on population structure, exploratory PCA plots were run both with and without these individuals. This comparison produced similar results (Fig. 1, Supplementary Fig. 1); therefore, all downstream analyses were conducted with the full dataset, including related individuals.
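The site-level filters described above (biallelic only, minor allele frequency of at least 5%, mean coverage between 2X and 50X, and no more than 20% missing data) can be illustrated with a minimal Python sketch. This is not the Bcftools command used in the study; the `Site` record and its field names are hypothetical, and genotypes are assumed to be coded as counts of the non-reference allele.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Site:
    """One variant site; field names are illustrative, not from the study's pipeline."""
    alleles: List[str]              # reference allele followed by all alternate alleles
    genotypes: List[Optional[int]]  # non-reference allele count per individual (0, 1, 2); None = missing
    depths: List[int]               # read depth per individual


def passes_filters(site: Site, min_maf: float = 0.05, min_depth: float = 2.0,
                   max_depth: float = 50.0, max_missing: float = 0.20) -> bool:
    """Return True if the site survives the four filters named in the text."""
    if len(site.alleles) != 2:                      # keep biallelic sites only
        return False
    called = [g for g in site.genotypes if g is not None]
    if not called:
        return False
    n_missing = len(site.genotypes) - len(called)
    if n_missing / len(site.genotypes) > max_missing:
        return False
    mean_depth = sum(site.depths) / len(site.depths)
    if mean_depth < min_depth or mean_depth > max_depth:
        return False
    alt_freq = sum(called) / (2 * len(called))      # diploid individuals assumed
    return min(alt_freq, 1.0 - alt_freq) >= min_maf


# Toy site: 5 individuals, one missing genotype, mean depth 6.6X.
example = Site(alleles=["A", "G"], genotypes=[0, 1, 2, 0, None], depths=[7, 6, 8, 7, 5])
print(passes_filters(example))
```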

Population genomic analyses

We visualized genetic clustering in the SNP dataset by performing a PCA using the snpgdsPCA function in the SNPRelate package⁵⁵ in R v.4.0.5⁵⁶. To quantify the level of genome-wide differentiation between subspecies, we calculated F_ST⁵⁷ between populations using VCFtools v.0.1.14⁵⁸ across 50 kb windows. Manhattan plots were generated using the manhattan function in the qqman package in R⁵⁹. Before plotting windowed F_ST estimates, we filtered out all scaffolds with fewer than four windows and fewer than 10 SNPs. We also used VCFtools to calculate inbreeding coefficients (F_IS) for all individuals. We then calculated Tajima’s D, nucleotide diversity (π), and absolute genetic divergence (Dxy) using the popgenWindows.py script (S. Martin; https://github.com/simonhmartin/genomics_general) to further provide insight into the evolutionary processes that have shaped population-level genetic variation. Putative chromosomal locations of different scaffolds were obtained by aligning them to the zebra finch (Taeniopygia guttata) assembly (bTaeGut1_v1.p; https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_003957565.1/; SAMN02981239) using the pseudo-chromosome scaffolding command in SatsumaSynteny⁶⁰. To analyze patterns of genetic structure among the four subspecies, Admixture v.1.23⁶¹ analyses were run using a filtered dataset (1,989,848 SNPs) that contained no missing data and was pruned to avoid linkage using the script ldPruning.sh (https://github.com/speciationgenomics/scripts/blob/master/ldPruning.sh). We investigated one to four population clusters with 200 bootstrap resampling iterations.
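For readers unfamiliar with the summary statistics named above, the following Python sketch shows the textbook definitions of nucleotide diversity (π), absolute divergence (Dxy), and Tajima’s D on a toy set of haplotypes. It is not the popgenWindows.py implementation; it assumes complete, phased, equal-length haplotype strings, and the function names are illustrative.

```python
import math
from itertools import combinations


def n_diff(a: str, b: str) -> int:
    """Number of positions at which two aligned haplotypes differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def nucleotide_diversity(haps):
    """pi: mean pairwise differences per site within one population."""
    pairs = list(combinations(haps, 2))
    return sum(n_diff(a, b) for a, b in pairs) / (len(pairs) * len(haps[0]))


def dxy(haps_x, haps_y):
    """Dxy: mean pairwise differences per site between two populations."""
    total = sum(n_diff(a, b) for a in haps_x for b in haps_y)
    return total / (len(haps_x) * len(haps_y) * len(haps_x[0]))


def tajimas_d(haps):
    """Tajima's D (Tajima 1989); returns NaN when there are no segregating sites."""
    n = len(haps)
    seg = sum(1 for col in zip(*haps) if len(set(col)) > 1)
    if seg == 0:
        return float("nan")
    pairs = list(combinations(haps, 2))
    k = sum(n_diff(a, b) for a, b in pairs) / len(pairs)  # mean pairwise differences
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (k - seg / a1) / math.sqrt(e1 * seg + e2 * seg * (seg - 1))


# Toy haplotypes (made up for illustration only).
pop_x = ["AAAAA", "AAAAT", "AAATT", "AAATT"]
pop_y = ["TTAAA", "TTAAA"]
print(nucleotide_diversity(pop_x), dxy(pop_x, pop_y), tajimas_d(pop_x))
```

A selective sweep is expected to lower π and push Tajima’s D below the genome-wide mean in the affected region, while elevated Dxy between populations indicates accumulated divergence.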

Genome-wide divergence

We identified regions of elevated divergence among the four pairwise (between-size) comparisons of large- and smaller-bodied northern populations using the windowed F_ST estimates. Values of F_ST averaged over non-overlapping 50 kb windows were considered elevated if they exceeded the 99.9th percentile of the genome-wide mean value (i.e., > 0.498 when comparing sanaka and merrilli, > 0.493 for sanaka and rufina, > 0.507 for maxima and rufina, and > 0.518 for maxima and merrilli; Fig. 2). We compiled a list of genes within these outlier windows using Geneious v.11.1.5⁶². To characterize putative candidate genes, we used ontology information from the zebra finch Ensembl database⁶³, the annotated song sparrow reference genome⁵⁰, and functional information from the Uniprot database⁶⁴ (Supplementary Table 2). We additionally compared the identified list of genes to known genes involved in recent analyses of body size or BMI in other species (see Discussion; NHGRI-EBI GWAS catalog; https://www.ebi.ac.uk/gwas/).
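The outlier criterion can be sketched in Python as a percentile threshold applied to the windowed F_ST values of one pairwise comparison. This is a minimal illustration, assuming windows are keyed by a hypothetical (contig, start) pair and that the percentile is linearly interpolated; the study's own scripts may compute the cutoff differently.

```python
def percentile(values, q):
    """Linearly interpolated percentile of a list of numbers, with q in [0, 100]."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no values supplied")
    pos = (len(ordered) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def outlier_windows(window_fst, q=99.9):
    """Return the cutoff and the set of windows whose F_ST exceeds it.

    window_fst maps a window key, e.g. (contig, start), to its mean F_ST.
    """
    cutoff = percentile(list(window_fst.values()), q)
    return cutoff, {w for w, fst in window_fst.items() if fst > cutoff}


# Toy input: 2000 made-up 50 kb windows on a hypothetical contig.
toy = {("contig_1", i * 50_000): i / 10_000 for i in range(2000)}
cutoff, outliers = outlier_windows(toy)
print(cutoff, sorted(outliers))
```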

To control for phenotypic differences between subspecies in traits that are not body size (e.g., plumage differences, migratory behavior), we compared the two smaller-bodied populations, M. m. merrilli and M. m. rufina, that also differ in migratory behavior as a measure of ensuring that the candidate genes are correlated with our trait of interest. We also compared the two large-bodied populations, M. m. sanaka and M. m. maxima as an additional control for our smaller- vs large-bodied comparisons. There were no overlaps in elevated peaks between our large-small bodied (between size) comparisons with those identified within our smaller-bodied or large-bodied (within size) control comparisons, increasing our confidence that we have identified genes related to body size variation as opposed to migratory behavior or other traits differing between subspecies.

Among our large- and smaller-bodied (between size) comparisons, we classified outlier regions as shared if two or more paired comparisons identified the same gene within 50 kb of an elevated window (Supplementary Table 2). We narrowed the candidate gene set by keeping only genes identified in all four comparisons between large- and smaller-bodied subspecies. Using this approach, we identified 9 genes that are shared in all of the between body size comparisons. Within these 9 genes, there are 467 variants across the four subspecies of interest.
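The narrowing step amounts to set operations on the gene lists from each comparison. The Python sketch below illustrates it with placeholder gene names (geneA, geneB, and so on) rather than the study's gene lists; the function names and the optional exclusion of genes found in control comparisons are assumptions made for illustration.

```python
from collections import Counter


def support_counts(genes_per_comparison):
    """Count how many between-size comparisons flag each gene."""
    return Counter(g for genes in genes_per_comparison.values() for g in set(genes))


def candidate_genes(genes_per_comparison, control_genes=frozenset()):
    """Genes present in every between-size comparison and absent from controls."""
    shared = set.intersection(*(set(g) for g in genes_per_comparison.values()))
    return shared - set(control_genes)


# Placeholder gene lists for the four between-size comparisons (not real results).
between_size = {
    "sanaka-merrilli": ["geneA", "geneB", "geneC"],
    "sanaka-rufina": ["geneA", "geneB"],
    "maxima-rufina": ["geneA", "geneB", "geneD"],
    "maxima-merrilli": ["geneA", "geneB", "geneC", "geneD"],
}
controls = {"geneE"}  # genes in within-size outlier windows (hypothetical)

counts = support_counts(between_size)
shared_by_two_or_more = {g for g, n in counts.items() if n >= 2}
print(sorted(shared_by_two_or_more), sorted(candidate_genes(between_size, controls)))
```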

Signatures of selection

We used the candidate gene set, containing the 467 SNPs, to test for evidence of selective sweeps, identify SNPs highly correlated with body mass, and calculate the percentage of fixed or nearly fixed SNPs (F_ST > 0.95; Supplementary Table 5). To identify regions under selection, we calculated the composite likelihood ratio (CLR) test statistic using the program SweeD v3.3.2⁶⁵. We determined the genotypes of all individuals by phasing and imputing missing data using Beagle v.3.3.2⁶⁶. We ran SweeD separately for each population and on individual contigs of interest using the default parameters except for using a window size of 200 bp. The window within each contig with the highest CLR statistic is considered the likely location of a selective sweep⁶⁷.
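To make the “fixed or nearly fixed” criterion concrete, the Python sketch below computes a per-site F_ST from two allele frequencies and reports the fraction of sites above 0.95. Note that it uses Hudson's estimator (as formulated by Bhatia and colleagues) for simplicity, whereas VCFtools implements the Weir and Cockerham estimator, so values will not match the study's exactly. Function names are illustrative.

```python
def hudson_fst(p1: float, n1: int, p2: float, n2: int) -> float:
    """Per-site Hudson F_ST from allele frequencies p1, p2.

    n1 and n2 are the numbers of sampled chromosomes (2 x diploid individuals).
    Returns NaN when the site is monomorphic across both populations.
    """
    numerator = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    denominator = p1 * (1 - p2) + p2 * (1 - p1)
    return numerator / denominator if denominator > 0 else float("nan")


def fraction_nearly_fixed(fst_values, threshold: float = 0.95) -> float:
    """Share of sites with F_ST above the threshold, ignoring undefined (NaN) sites."""
    usable = [f for f in fst_values if f == f]  # NaN is the only value unequal to itself
    return sum(1 for f in usable if f > threshold) / len(usable)


# Toy frequencies for 12 vs 8 diploid birds (24 vs 16 chromosomes); values are made up.
sites = [(1.0, 0.0), (0.96, 0.0), (0.6, 0.4), (0.5, 0.5)]
fsts = [hudson_fst(p1, 24, p2, 16) for p1, p2 in sites]
print(fsts, fraction_nearly_fixed(fsts))
```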

Hypothesis testing for validation

To test the hypothesis that our narrowed set of 9 candidate genes is related to body size, we predicted the genotypes of more distantly related small-bodied subspecies (M. m. gouldii, M. m. heermanni, M. m. samuelis, M. m. pusillula, and M. m. maxillaris) from a different geographic region, the San Francisco Bay area of California. Whole-genome sequences for the 39 San Francisco Bay individuals were generated in a separate study²⁵ using library preparation and sequencing methods as detailed above. For the whole-genome sequencing summary statistics of the 39 San Francisco Bay individuals see Mikles and colleagues²⁵.

We first performed a partial Mantel test on the 3 contigs containing the 9 candidate genes across all 9 subspecies to assess whether patterns of divergence were due to neutral, including isolation by distance, or selective processes. Using the vegan⁶⁸ R package, we tested for correlations between mass and genetic distance, while accounting for geographic (Euclidean) distance.
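The logic of a partial Mantel test can be sketched in plain Python: correlate the genetic and phenotypic distance matrices after partialling out geographic distance, then build a null distribution by permuting the rows and columns of one matrix. This is an illustrative re-implementation under stated assumptions, not the vegan routine used in the study; the toy coordinates, trait values, and genetic distances are made up.

```python
import math
import random


def lower_triangle(m):
    """Flatten the strict lower triangle of a square distance matrix."""
    return [m[i][j] for i in range(len(m)) for j in range(i)]


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(x, y, z):
    """Correlation of x and y after controlling for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def partial_mantel(gen, phen, geo, n_perm=999, seed=0):
    """Return (observed partial r, one-sided permutation p-value)."""
    rng = random.Random(seed)
    y, z = lower_triangle(phen), lower_triangle(geo)
    observed = partial_r(lower_triangle(gen), y, z)
    n, hits = len(gen), 0
    for _ in range(n_perm):
        order = list(range(n))
        rng.shuffle(order)  # permute populations, i.e. rows and columns together
        permuted = [[gen[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if partial_r(lower_triangle(permuted), y, z) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy example: 5 hypothetical populations (all values invented for illustration).
coords = [(0, 0), (1, 0), (5, 1), (6, 3), (9, 9)]
trait = [10.0, 9.0, 5.0, 6.0, 2.0]
geo = [[math.dist(a, b) for b in coords] for a in coords]
phen = [[abs(a - b) for b in trait] for a in trait]
gen = [[0.1 * phen[i][j] + 0.01 * abs(i - j) for j in range(5)] for i in range(5)]
r_obs, p_value = partial_mantel(gen, phen, geo)
print(r_obs, p_value)
```

A positive partial r with a small p-value indicates that genetic distance tracks phenotypic distance beyond what geographic separation alone would predict.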

To see how strongly allele frequency correlated with body size across all 79 individuals, we produced a matrix of the frequency of the non-reference allele using vcftools (--freq2 recode option) for the 467 SNPs. We used Pearson correlation coefficient (r) to quantify the relationship between the non-reference allele frequency and body mass (g), and between non-reference allele frequency and average summer and winter temperature (°C) at each of the SNPs. We selected the top 8 SNPs with an r > 0.90 between non-reference allele frequency and mass: contig 391 (chr 17) positions: 19152, 57980, 61835, 86656, 122228; contig 3361 (chr 2) positions: 255216, 255218, 262569. We also performed linear regression analyses to further explore the relationships between allele frequency and mass and temperature at each of the focal SNPs. For each analysis, allele frequency served as the independent variable, while mass and temperature were the dependent variables. Model parameters, including regression coefficients, intercepts, confidence intervals, effect sizes, R-squared values, p-values, and leverage and influence statistics can be found in Supplementary Tables 3 and 4. Relationships were then visualized using ggplot2⁶⁹ (Fig. 4).
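The screening step described here can be illustrated with a short Python sketch that computes Pearson's r between per-subspecies allele frequency and mean mass, keeps SNPs with r > 0.90, and fits an ordinary least-squares line with allele frequency as the predictor. The mean masses are the subspecies means reported in Methods; the allele frequencies and SNP labels are hypothetical and serve only to show the calculation.

```python
import math


def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def linear_fit(x, y):
    """Ordinary least squares of y on x; returns (slope, intercept, r_squared)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx, pearson_r(x, y) ** 2


def select_snps(freq_by_snp, mass, threshold=0.90):
    """SNP labels whose allele frequency correlates with mass above the threshold."""
    return [snp for snp, freqs in freq_by_snp.items() if pearson_r(freqs, mass) > threshold]


# Mean mass (g) per subspecies, in the order: maxima, sanaka, merrilli, rufina,
# gouldii, heermanni, samuelis, pusillula, maxillaris (values from Methods).
mass = [46.9, 44.4, 23.2, 29.0, 18.44, 21.22, 17.95, 18.57, 20.25]
# Hypothetical non-reference allele frequencies for two made-up SNP labels.
freq_by_snp = {
    "snp_1": [1.0, 0.95, 0.2, 0.35, 0.0, 0.1, 0.0, 0.05, 0.05],
    "snp_2": [0.4, 0.6, 0.5, 0.3, 0.5, 0.4, 0.6, 0.5, 0.3],
}
print(select_snps(freq_by_snp, mass))
print(linear_fit(freq_by_snp["snp_1"], mass))
```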

In addition to examining correlations between subspecies-level allele frequencies and mass, we assessed the correlation between phenotype and genotype for individuals (heterozygotes or homozygotes for the reference/alternate alleles at our focal SNPs). To do this, we used the R package vegan⁶⁸ to calculate a distance matrix among the 8 SNPs identified above and included 1–3 additional SNPs immediately upstream and downstream of each focal SNP for visualization purposes (54 total SNPs). Fewer than three SNPs were used in cases where the SNP was located at the end of a contig or adjacent to another focal SNP. We then used the Ward algorithm as a grouping method to express the relationships between sites, implemented using the hclust function in R. We plotted the output using the plot.phylo function in the R package ape⁷⁰ and visualized genotypes by constructing a modified heatmap in which the base pairs of each individual at a locus are represented in the two halves of a diagonally split pixel (Supplementary Fig. 7).
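The grouping step can be sketched with a small Ward (minimum-variance) clustering routine. This is an illustrative Python re-implementation, not the R code used in the study, and it makes several assumptions: genotypes are coded as counts of the alternate allele (0, 1, 2), distances are Euclidean, and the toy genotype table is hypothetical. The text does not say which Ward variant was used; the sketch merges the pair of clusters whose union gives the smallest increase in within-cluster sum of squares.

```python
def centroid(rows):
    """Column-wise mean of a list of equal-length numeric rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((p - q) ** 2 for p, q in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist


def ward_clusters(data, k):
    """Agglomerate rows of `data` into k clusters; returns lists of row indices."""
    clusters = [[i] for i in range(len(data))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = ward_cost([data[m] for m in clusters[i]],
                                 [data[m] for m in clusters[j]])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge j into i
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical genotypes: rows are individuals, columns are SNPs,
# values are alternate-allele counts (0 = hom. ref, 1 = het, 2 = hom. alt).
genotypes = [
    [2, 2, 2, 1],  # individuals 0-2: mostly alternate alleles
    [2, 2, 1, 2],
    [2, 1, 2, 2],
    [0, 0, 0, 1],  # individuals 3-5: mostly reference alleles
    [0, 1, 0, 0],
    [1, 0, 0, 0],
]

groups = ward_clusters(genotypes, k=2)
print(groups)
```

In this toy table the two recovered groups correspond to individuals carrying mostly alternate versus mostly reference alleles, which is the kind of structure the heatmap in Supplementary Fig. 7 is meant to display.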

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The raw sequencing data generated in this study have been deposited in the National Center for Biotechnology Information (NCBI) BioProject database under accession code PRJNA1013697. Raw sequencing data for the California song sparrow subspecies used in this study are available in the NCBI BioProject database under accession code PRJNA1018990. Source data are provided with this paper.

Code availability

Scripts and bioinformatic pipelines used in this study are available at https://github.com/kcarbeck/SOSP_body_size (https://doi.org/10.5281/zenodo.8365146).

References

Darwin, C. The Origin of Species (John Murray, 1859).
Mayr, E. Animal Species and Evolution. (Harvard Univ. Press, 1963).
Coyne, J. A. & Orr, H. A. Speciation: a catalogue and critique of species concepts. Philos. Biol. Anthol. 272–292 (2004).
Schluter, D. Evidence for ecological speciation and its alternative. Science 323, 737–741 (2009).
Koski, M. H. & Ashman, T.-L. Floral pigmentation patterns provide an example of Gloger’s rule in plants. Nat. Plants 1, 1–5 (2015).
McDowall, R. M. Jordan’s and other ecogeographical rules, and the vertebral number in fishes. J. Biogeogr. 35, 501–508 (2008).
VanderWerf, E. A. Ecogeographic patterns of morphological variation in Elepaios (Chasiempis spp.): Bergmann’s, Allen’s, and Gloger’s Rules in a Microcosm—Patrones Ecogeográficos de Variación Morfológica en Chasiempis spp.: Las Reglas de Bergmann, Allen y Gloger en un Microcosmos. Ornithol. Monogr. 73, 1–34 (2012).
Zheng, S., Hu, J., Ma, Z., Lindenmayer, D. & Liu, J. Increases in intraspecific body size variation are common among North American mammals and birds between 1880 and 2020. Nat. Ecol. Evol. 7, 347–354 (2023).
Gaston, K. J. The Structure and Dynamics of Geographic Ranges (Oxford University Press, 2003).
Bergmann, C. Über die Verhältnisse der Wärmeökonomie der Thiere zu ihrer Größe (1847).
Aldrich, J. W. Ecogeographical variation in size and proportions of song sparrows (Melospiza melodia). Ornithol. Monogr. iii–134 (1984). https://doi.org/10.2307/40166779.
Johnston, R. F. & Selander, R. K. House sparrows: rapid evolution of races in North America. Science 144, 548–550 (1964).
Teplitsky, C., Mills, J. A., Alho, J. S., Yarrall, J. W. & Merilä, J. Bergmann’s rule and climate change revisited: Disentangling environmental and genetic responses in a wild bird population. Proc. Natl Acad. Sci. 105, 13492–13496 (2008).
Millien, V. Relative effects of climate change, isolation and competition on body-size evolution in the Japanese field mouse, Apodemus argenteus. J. Biogeogr. 31, 1267–1276 (2004).
Yom-Tov, Y. & Yom-Tov, S. Climatic change and body size in two species of Japanese rodents. Biol. J. Linn. Soc. 82, 263–267 (2004).
MacColl, A. D. C. The ecological causes of evolution. Trends Ecol. Evol. 26, 514–522 (2011).
Miller, A. H. Ecologic factors that accelerate formation of races and species of terrestrial vertebrates. Evolution 10, 262–277 (1956).
Patten, M. A. & Pruett, C. L. The Song Sparrow, Melospiza melodia, as a ring species: patterns of geographic variation, a revision of subspecies, and implications for speciation. Syst. Biodivers. 7, 33–62 (2009).
Arcese, P., Sogge, M. K., Marr, A. B. & Patten, M. A. Song Sparrow (Melospiza melodia). The Birds of North America (The Birds of North America, Inc., 2002). https://doi.org/10.2173/bna.704.
Carbeck, K., Wang, T., Reid, J. M. & Arcese, P. Adaptation to climate change through seasonal migration revealed by climatic versus demographic niche models. Glob. Change Biol. 28, 4260–4275 (2022).
Schluter, D. & Smith, J. N. M. Natural selection on beak and body size in the song sparrow. Evolution 40, 221–231 (1986).
Wolak, M. E. & Reid, J. M. Is pairing with a relative heritable? Estimating female and male genetic contributions to the degree of biparental inbreeding in song sparrows (Melospiza melodia). Am. Nat. https://doi.org/10.1086/686198 (2016).
Reid, J. M. et al. Immigration counter‐acts local micro‐evolution of a major fitness component: Migration‐selection balance in free‐living song sparrows. Evol. Lett. 5, 48–60 (2021).
Pruett, C. L. & Winker, K. Chapter 13: Alaska Song Sparrows (Melospiza Melodia) demonstrate that genetic marker and method of analysis matter in subspecies assessments. Ornithol. Monogr. 67, 162–171 (2010).
Mikles, C. S. et al. Genomic differentiation and local adaptation on a microgeographic scale in a resident songbird. Mol. Ecol. 29, 4295–4307 (2020).
Gaston, K. J., Chown, S. L. & Evans, K. L. Ecogeographical rules: elements of a synthesis. J. Biogeogr. 35, 483–500 (2008).
Smith, J. N. M. & Zach, R. Heritability of some morphological characters in a song sparrow population. Evolution 33, 460–467 (1979).
Schluter, D. & Smith, J. N. M. Genetic and phenotypic correlations in a natural population of song sparrows. Biol. J. Linn. Soc. 29, 23–36 (1986).
Bonnet, T. et al. Genetic variance in fitness indicates rapid contemporary adaptive evolution in wild animals. Science 376, 1012–1016 (2022).
Clark, J. D., Benham, P. M., Maldonado, J. E., Luther, D. A. & Lim, H. C. Maintenance of local adaptation despite gene flow in a coastal songbird. Evolution 76, 1481–1494 (2022).
Rask-Andersen, M., Karlsson, T., Ek, W. E. & Johansson, Å. Genome-wide association study of body fat distribution identifies adiposity loci and sex-specific genetic effects. Nat. Commun. 10, 339 (2019).
Zhu, Z. et al. Shared genetic and experimental links between obesity-related traits and asthma subtypes in UK Biobank. J. Allergy Clin. Immunol. 145, 537–549 (2020).
Barton, A. R., Sherman, M. A., Mukamel, R. E. & Loh, P.-R. Whole-exome imputation within UK Biobank powers rare coding variant association and fine-mapping analyses. Nat. Genet. 53, 1260–1269 (2021).
Hato, T., Tabata, M. & Oike, Y. The role of angiopoietin-like proteins in angiogenesis and metabolism. Trends Cardiovasc. Med. 18, 6–14 (2008).
Shungin, D. et al. New genetic loci link adipose and insulin biology to body fat distribution. Nature 518, 187–196 (2015).
Sakaue, S. et al. A cross-population atlas of genetic associations for 220 human phenotypes. Nat. Genet. 53, 1415–1424 (2021).
Speliotes, E. K. et al. Association analyses of 249,796 individuals reveal 18 new loci associated with body mass index. Nat. Genet. 42, 937–948 (2010).
Kichaev, G. et al. Leveraging polygenic functional enrichment to improve GWAS power. Am. J. Hum. Genet. 104, 65–75 (2019).
Adès, L. C. et al. FBN1, TGFBR1, and the Marfan-craniosynostosis/mental retardation disorders revisited. Am. J. Med. Genet. A 140A, 1047–1058 (2006).
Tachmazidou, I. et al. Whole-genome sequencing coupled to imputation discovers genetic signals for anthropometric traits. Am. J. Hum. Genet. 100, 865–884 (2017).
Lango Allen, H. et al. Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature 467, 832–838 (2010).
Hoban, S. et al. Finding the genomic basis of local adaptation: pitfalls, practical solutions, and future directions. Am. Nat. 188, 379–397 (2016).
Christe, C. et al. Adaptive evolution and segregating load contribute to the genomic landscape of divergence in two tree species connected by episodic gene flow. Mol. Ecol. 26, 59–76 (2017).
Lamichhaney, S. et al. Evolution of Darwin’s finches and their beaks revealed by genome sequencing. Nature 518, 371–375 (2015).
Pfeifer, S. P. et al. The evolutionary history of Nebraska deer mice: local adaptation in the face of strong gene flow. Mol. Biol. Evol. 35, 792–806 (2018).
Hendry, A. P., Schoen, D. J., Wolak, M. E. & Reid, J. M. The contemporary evolution of fitness. Annu. Rev. Ecol. Evol. Syst. https://doi.org/10.1146/annurev-ecolsys-110617-062358 (2018).
Husby, A., Visser, M. E. & Kruuk, L. E. B. Speeding up microevolution: the effects of increasing temperature on selection and genetic variance in a wild bird population. PLoS Biol. 9, e1000585 (2011).
Walsh, J. et al. Genomics of rapid ecological divergence and parallel adaptation in four tidal marsh sparrows. Evol. Lett. 3, 324–338 (2019).
Wang, T., Hamann, A., Spittlehouse, D. & Carroll, C. Locally downscaled and spatially customizable climate data for historical and future periods for North America. PLoS ONE 11 (2016).
Feng, S. et al. Dense sampling of bird diversity increases power of comparative genomics. Nature 587, 252–257 (2020).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Okonechnikov, K., Conesa, A. & García-Alcalde, F. Qualimap 2: advanced multi-sample quality control for high-throughput sequencing data. Bioinformatics 32, 292–294 (2016).
Li, H. et al. The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Pruett, C. L. & Winker, K. Northwestern song sparrow populations show genetic effects of sequential colonization. Mol. Ecol. 14, 1421–1434 (2005).
Zheng, Q. et al. Systematic identification of genes involved in divergent skeletal muscle growth rates of broiler and layer chickens. BMC Genomics 10, 87 (2009).
R Core Team. R: A Language and Environment for Statistical Computing (R Core Team, 2022).
Weir, B. S. & Cockerham, C. C. Estimating F-statistics for the analysis of population structure. Evolution 38, 1358–1370 (1984).
Danecek, P. et al. The variant call format and VCFtools. Bioinformatics 27, 2156–2158 (2011).
Turner, S. D. qqman: an R package for visualizing GWAS results using Q-Q and manhattan plots. bioRxiv 005165. https://doi.org/10.1101/005165 (2014).
Grabherr, M. G. et al. Genome-wide synteny through highly sensitive sequence alignment: Satsuma. Bioinformatics 26, 1145–1151 (2010).
Alexander, D. H., Novembre, J. & Lange, K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655–1664 (2009).
Kearse, M. et al. Geneious Basic: An integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics 28, 1647–1649 (2012).
Aken, B. L. et al. Ensembl 2017. Nucleic Acids Res. 45, D635–D642 (2017).
The UniProt Consortium UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res. 47, D506–D515 (2019).
Pavlidis, P., Živković, D., Stamatakis, A. & Alachiotis, N. SweeD: likelihood-based detection of selective sweeps in thousands of genomes. Mol. Biol. Evol. 30, 2224–2234 (2013).
Browning, S. R. & Browning, B. L. Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Am. J. Hum. Genet. 81, 1084–1097 (2007).
Kim, Y. & Stephan, W. Detecting a local signature of genetic hitchhiking along a recombining chromosome. Genetics 160, 765–777 (2002).
Oksanen, J. et al. Vegan: Community Ecology Package R Package Version 2.6-4 (CRAN, 2022).
Wickham, H., Chang, W. & Wickham, M. H. Package ‘ggplot2’: Create Elegant Data Visualisations Using the Grammar of Graphics Version 2, 1–189 (ggplot2, 2016).
Paradis, E., Claude, J. & Strimmer, K. APE: analyses of phylogenetics and evolution in R language. Bioinformatics 20, 289–290 (2004).

Acknowledgements

The authors thank the Cornell Lab of Ornithology, University of Alaska Museum, Berkeley Museum of Vertebrate Zoology, and Yvonne Chan for samples. B. Butcher, L. Campagna, D. Toews, T. Wang, and M. Whitlock provided essential assistance, comments, and support. Project funding was provided by the Natural Sciences and Engineering Research Council of Canada Discovery Grant RGPIN-2020-05268 (P.A.), University of British Columbia, and the Hesse Fellowship and Research Award (K.C.). The material in this manuscript is also based upon work supported by the National Science Foundation Postdoctoral Research Fellowship in Biology under grant No. DBI 1523719 (J.W.) as well as funding from the Fuller Evolutionary Biology Lab at the Cornell Lab of Ornithology (I.L.).

Author information

Authors and Affiliations

Department of Forest and Conservation Sciences, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
Katherine Carbeck & Peter Arcese
Fuller Evolutionary Biology Program, Cornell Lab of Ornithology, Cornell University, Ithaca, NY, 14850, USA
Irby Lovette & Jennifer Walsh
Department of Ecology and Evolutionary Biology, Cornell University, Ithaca, NY, 14850, USA
Irby Lovette
Department of Biology, Ouachita Baptist University, Arkadelphia, AR, 71998, USA
Christin Pruett
University of Alaska Museum, University of Alaska Fairbanks, Fairbanks, AK, 99775, USA
Kevin Winker

Authors

Katherine Carbeck
Peter Arcese
Irby Lovette
Christin Pruett
Kevin Winker
Jennifer Walsh

Contributions

K.C., P.A., and J.W. conceived and designed the study with input from I.L., K.W., and C.P. C.P. and K.W. conducted field work and collected samples. J.W. conducted laboratory work and K.C. carried out all bioinformatic analyses. Data analysis and interpretation were conducted by K.C. and J.W. with input from all co-authors. K.C. wrote the manuscript with input from all co-authors.

Corresponding author

Correspondence to Katherine Carbeck.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks Haw Chuan Lim and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Peer Review File

Description of Additional Supplementary Files

Supplementary Data 1

Reporting Summary

Source data

Source Data

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Carbeck, K., Arcese, P., Lovette, I. et al. Candidate genes under selection in song sparrows co-vary with climate and body mass in support of Bergmann’s Rule. Nat Commun 14, 6974 (2023). https://doi.org/10.1038/s41467-023-42786-2

Received: 10 January 2023
Accepted: 19 October 2023
Published: 07 November 2023
DOI: https://doi.org/10.1038/s41467-023-42786-2
