The genomic footprint of whaling and isolation in fin whale populations

Nigenda-Morales, Sergio F.; Lin, Meixi; Nuñez-Valencia, Paulina G.; Kyriazis, Christopher C.; Beichman, Annabel C.; Robinson, Jacqueline A.; Ragsdale, Aaron P.; Urbán R., Jorge; Archer, Frederick I.; Viloria-Gómora, Lorena; Pérez-Álvarez, María José; Poulin, Elie; Lohmueller, Kirk E.; Moreno-Estrada, Andrés; Wayne, Robert K.

doi:10.1038/s41467-023-40052-z

Article
Open access
Published: 12 September 2023

Nature Communications volume 14, Article number: 5465 (2023)

Abstract

Twentieth-century industrial whaling pushed several species to the brink of extinction, with fin whales being the most impacted. However, a small, resident population in the Gulf of California was not targeted by whaling. Here, we analyzed 50 whole genomes from the Eastern North Pacific (ENP) and Gulf of California (GOC) fin whale populations to investigate their demographic history and the genomic effects of natural and human-induced bottlenecks. We show that the two populations diverged ~16,000 years ago, after which the ENP population expanded and then suffered a 99% reduction in effective size during the whaling period. In contrast, the GOC population remained small and isolated, receiving less than one migrant per generation. However, this low level of migration has been crucial for maintaining its viability. Our study exposes the severity of whaling, emphasizes the importance of migration, and demonstrates the use of genome-based analyses and simulations to inform conservation strategies.

Introduction

Due to increasing recent human impacts, many vertebrate species have experienced drastic population declines and now persist as small and fragmented populations^1,2,3. Small populations are at higher risk of population declines due to stochastic environmental and genetic factors^4,5,6. Both anthropogenic and naturally occurring population declines reduce genetic diversity and, through the stronger action of genetic drift, increase inbreeding and genetic load, all of which diminish the long-term survival and adaptive potential of populations^7,8. However, the impact of these processes depends on the often unknown population-specific demographic histories and life history traits. For example, gene flow as low as one effective migrant per generation may counteract genetic drift and reduce the frequency of deleterious variation^9,10,11, but might also reduce metapopulation genetic variation¹², or introduce strongly deleterious alleles¹³. Therefore, uncovering population history and determining how detrimental genetic patterns arise in declining populations are challenging questions, but the answers are critical to developing effective conservation strategies¹⁴.

Industrial whaling during the 20th century is arguably one of the most disruptive ecological events caused by humans¹⁵: it decimated all great whale species and drove many of them to the brink of extinction^16,17. Estimating the decline of whale populations is crucial for evaluating the full impact of whaling, not only on whale abundance but on entire ecosystems, and for designing appropriate recovery policies^15,17,18. However, quantifying the magnitude of known recent population declines in endangered vertebrate species from contemporary samples has proven difficult because the estimates based on genetic diversity capture long-term effective sizes rather than recent demographic events^19,20. Additionally, the long life span and generation time of whales complicate the inference of recent population size changes²¹ because less generation turnover occurs in a given amount of time. Given these challenges, previous genetic studies using contemporary samples have only indirectly inferred the impact of whaling by determining that historical abundance estimates obtained from whaling records and ecological studies are orders of magnitude lower than those based on the diversity of a few mitochondrial or nuclear markers^17,18,19,20,21,22,23,24, suggesting a slower recovery of whale populations after the end of whaling. Therefore, to overcome these challenges, we used high-coverage whole-genome sequence data and model-driven approaches to provide more power and resolution to directly detect recent demographic changes^19,25, such as whaling.

The fin whale (Balaenoptera physalus) is the second-largest whale and the one most impacted by industrial whaling worldwide. In the North Pacific alone, more than 75,500 fin whales were harvested²⁶. However, fin whales in the Gulf of California, Mexico, belong to a resident population that was not targeted by whalers^27,28. Nevertheless, this population has been small, with limited gene flow to and from the Pacific, for thousands of years^28,29,30,31. In contrast, the Eastern North Pacific population was large, interconnected, and overexploited²⁷, although the population along the U.S. west coast has shown evidence of growth at 3% per year since the 1990s³².

Here, we provide direct genome-wide demographic reconstructions of whaling in a previously large population, in comparison to a never-whaled but small and isolated population. We analyze and model the whole-genome diversity of fin whale populations with contrasting demographic histories to identify the genetic and evolutionary impacts of population reductions in large, long-lived marine mammals. Understanding the complex interaction between demographic and evolutionary factors shaping the genetic diversity in whale populations is key to improving their conservation, especially given current and future whaling threats and the challenges of climate change and human inputs to marine ecosystems¹⁷. Evaluating the genomic consequences of contrasting population reductions in fin whale populations makes our results relevant for the conservation of populations in other threatened or endangered species.

Results

Sampling, population structure, and differentiation

To assess the genome-wide impact of human-induced and natural bottlenecks on fin whale populations, we generated high coverage (average 27×) whole-genome resequencing data from 50 samples of free-ranging individuals collected between 1995 and 2017 (Fig. 1A; Table S1). Thirty individuals are from regions that survived intensive whaling pressure in the Eastern North Pacific (ENP), along the coasts of California (CA; N = 9), Oregon (OR; N = 4), Washington (WA; N = 2), British Columbia (BC; N = 3) and Alaska (AK; N = 12). Additionally, we included 20 individuals from a naturally small population in the Gulf of California, Mexico (GOC), that has maintained a low population size between 300 and 600 individuals for thousands of years and avoided the impacts of whaling^27,30,31.

**Fig. 1: Population structure and sample origins for the fin whale genomes obtained in this study.**

The sequences were aligned, genotyped, annotated and filtered using the minke whale genome as a reference (BalAcu1.0). We also genotyped a subset of ten individuals using a recently available fin whale genome assembly (GCA_023338255.1). We observed only a 1.5% overestimation of diversity when using the minke whale genome as reference, which could be due to a less accurate mapping (See Supplemental Discussion). Also, both reference genomes provide similar genotyping statistics and genomic diversity results (Table S2; Fig. S1; Supplemental Methods and Results), suggesting that using the minke whale genome as a reference does not introduce significant biases in our analyses (see discussion and significance tests in Supplemental Results and Discussion). Principal component analysis (PCA) separated the ENP and GOC individuals on PC1 with tight clustering of the GOC samples (Fig. 1B). A wider dispersion pattern is observed for the ENP samples, with the Alaska samples remaining relatively clustered, suggesting some degree of differentiation of this northern population from those to the south (Fig. S2). Admixture analysis of all the samples supports a K = 2 partition of ENP and GOC samples (Figs. 1C, S3). We identified one ~50% admixed individual from each population (ENPCA09 and GOC010) and a small admixture fraction from GOC in the ENP population (Fig. 1B, C). Additional admixture analysis of only ENP samples supports a K = 1 partition of this population (Fig. S4). F_ST values are higher between the GOC and ENP (F_ST = 0.073, p = 0.001) than between all locations within the ENP (F_ST = 0–0.008; Table S3). Assuming the highest F_ST of 0.008 observed within ENP, this substructure would at most inflate effective population size (N_e) estimates by 0.8%³³. Also, a phylogenetic analysis separated both populations into different groups, with the nodes within the ENP group showing no bootstrap support. 
The two admixed individuals clustered with ENP but showed early divergence (Fig. S5), suggesting their greater genetic differentiation. These results indicate there are two main populations in our sample, one off the Pacific coast and the other in the Gulf of California, consistent with previous microsatellite and mitochondrial data^30,31. In addition, our findings confirm the strong isolation of the geographically distinct Gulf population^30,34, whereas weak population substructure was observed in the eastern North Pacific.
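The 0.8% bound quoted above can be reproduced with a one-line calculation. The sketch below assumes Wright's island-model relation, in which substructure inflates effective size by a factor of 1/(1 − F_ST); the text cites ref. 33 for the figure but does not state the formula, so both the relation and the function name are assumptions made for illustration.

```python
def ne_inflation_from_fst(fst: float) -> float:
    """Fractional inflation of N_e under the assumed relation N_e / (1 - F_ST)."""
    if not 0.0 <= fst < 1.0:
        raise ValueError("F_ST must lie in [0, 1)")
    return 1.0 / (1.0 - fst) - 1.0


# Highest pairwise F_ST reported within the ENP (Table S3 in the source).
inflation = ne_inflation_from_fst(0.008)
print(f"N_e inflation: {inflation:.2%}")
```

With F_ST = 0 the function returns 0, meaning no inflation in a panmictic population.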

Genome-wide patterns of variation and runs of homozygosity

We explored the genome-wide diversity patterns of fin whale populations by calculating average genome-wide heterozygosity and per-site heterozygosity in non-overlapping 1-Mb windows. In GOC individuals we found patterns of reduced variation, with an average of 1.13 heterozygotes per kb (het/kb) and an increased proportion of genomic regions with low heterozygosity (46% of windows contain <1 het/kb). In contrast, the ENP population had much higher diversity (1.76 het/kb; two-tailed Mann–Whitney U [MWU] test p = 1.15E-10; Fig. 2A) and few regions of low heterozygosity (12% of windows with <1 het/kb; Figs. 2B, S6, S7). These genome-wide results imply contrasting demographic histories of long-term small and large population size in the Gulf and North Pacific, respectively³⁰. Compared with other cetaceans that experienced different levels of population contractions, such as the diminutive vaquita porpoise (Phocoena sinus) in the Gulf of California^35,36 (0.1 het/kb), abundant minke whale³⁷ (0.6 het/kb) and endangered blue whale³⁸ (2.1 het/kb), the GOC fin whales have maintained moderate genome-wide patterns of variation (Fig. 2A), suggesting that evolutionary mechanisms such as migration have maintained genetic diversity. However, the GOC population has an enriched number of 1-Mb windows with null or very low heterozygosity (0–0.1 het/kb) compared with more endangered mysticete species such as the North Atlantic right whale and blue whale (Fig. S8), indicating that populations of these endangered species were historically larger than the Gulf of California fin whale population and implying that a reassessment of the GOC population towards a more threatened status may be needed.
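As an illustration of the windowed statistic described above, the following sketch computes heterozygotes per kb in non-overlapping 1-Mb windows and the fraction of windows below 1 het/kb. It is not the authors' pipeline: the input layout (a list of heterozygous positions plus a per-window count of confidently called sites) and all function names are assumptions, and the numbers are toy values.

```python
from collections import Counter

WINDOW_BP = 1_000_000  # non-overlapping 1-Mb windows, as in the text


def het_per_kb_by_window(het_positions, called_sites_per_window):
    """Return {window_index: heterozygotes per kb of called sequence}.

    het_positions: 0-based positions of heterozygous genotypes on one scaffold.
    called_sites_per_window: {window_index: number of confidently called sites}.
    Windows with no called sites are skipped.
    """
    het_counts = Counter(pos // WINDOW_BP for pos in het_positions)
    return {
        w: 1000.0 * het_counts.get(w, 0) / n_called
        for w, n_called in called_sites_per_window.items()
        if n_called > 0
    }


def fraction_low_het(window_het, threshold=1.0):
    """Fraction of windows with fewer than `threshold` het/kb."""
    values = list(window_het.values())
    if not values:
        raise ValueError("no windows to summarise")
    return sum(v < threshold for v in values) / len(values)


# Toy input: three windows with very few called sites, for readability only.
toy_het = het_per_kb_by_window([10, 500_000, 1_200_000], {0: 1000, 1: 2000, 2: 500})
print(toy_het, fraction_low_het(toy_het))
```

Normalising by called sites rather than window length keeps poorly covered windows from appearing artificially depleted in heterozygosity.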

**Fig. 2: ROH and distribution of heterozygosity across the genome.**

To characterize the history of inbreeding events, we identified runs of homozygosity (ROH), which are genomic stretches within an individual that are assumed to be identical by descent, using two model-based methods^39,40 (Fig. S9). Long ROH (≥5 Mb) typically result from recent close inbreeding whereas shorter ROH indicate either older inbreeding or older reductions in population size⁴¹. Overall, GOC individuals contained considerably more ROH segments than ENP individuals (two-tailed MWU test p = 9.42E-08), but most of the ROH were of short (0.1–1 Mb) or intermediate (1–5 Mb) length (Fig. 2A). Long ROH were present in all GOC individuals, except the admixed sample GOC010, and only in three ENP individuals. Nevertheless, they comprise a small fraction of total ROH length in both populations (F_{ROH ≥ 5M} = 0.4–3.1%; Table S4). To further explore the timing of inbreeding, we estimated the average time at which two homologous haplotypes could coalesce within our ROH categories for each population, assuming a recombination rate of 1 cM/Mb⁴². For short ROH, haplotypes coalesced on average approximately 145 and 250 generations ago in GOC and ENP, respectively, whereas for intermediate ROH the average haplotype coalescent time was 28 and 30 generations ago. These findings suggest a lack of recent inbreeding in both populations (Figs. 2A, S10). However, the higher number and longer ROH observed in the GOC fin whales (Figs. 2A, S9, S10), together with the high proportion of their genome contained in ROH larger than 1 Mb (F_{ROH ≥ 1M(GOC)} = 17.5–23.4%; Table S4), indicate that genomic segments in this population share a more recent common ancestor than they do in the Pacific population. 
Finally, we determined the relatedness between individuals in both populations and found a significantly higher average kinship coefficient among GOC individuals (0.054) than in the ENP population (0.0032; two-tailed MWU test p < 2.2E-16), indicating greater identity-by-descent in the GOC, which further demonstrates higher inbreeding levels in this population (Fig. S11A). We divided the ENP into location groups to account for larger geographical coverage and continued to observe significantly higher kinship in the GOC (Fig. S11B, C). In summary, these results reflect the greater historical isolation and small population size of the GOC²⁹ and a lack of recent inbreeding in both populations.
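The link between ROH length and the age of inbreeding can be sketched with a widely used approximation, g = 100 / (2 r L), where L is the ROH length in Mb and r the recombination rate in cM/Mb. The text does not state which formula was used, so this relation, the helper names, and the toy genome length are assumptions. The second helper shows F_ROH as the fraction of the genome in ROH above a length threshold.

```python
def roh_coalescent_generations(length_mb, rec_rate_cm_per_mb=1.0):
    """Approximate generations to the common ancestor of an ROH of given length.

    Uses the assumed relation g = 100 / (2 * r * L).
    """
    if length_mb <= 0 or rec_rate_cm_per_mb <= 0:
        raise ValueError("length and recombination rate must be positive")
    return 100.0 / (2.0 * rec_rate_cm_per_mb * length_mb)


def f_roh(roh_lengths_bp, genome_length_bp, min_length_bp=1_000_000):
    """Fraction of the genome contained in ROH at least min_length_bp long."""
    total = sum(length for length in roh_lengths_bp if length >= min_length_bp)
    return total / genome_length_bp


# Boundaries of the ROH length classes used in the text (0.1, 1 and 5 Mb).
for length in (0.1, 1.0, 5.0):
    print(length, roh_coalescent_generations(length))

# Toy individual: three ROH on a hypothetical 40-Mb genome.
print(f_roh([2_000_000, 500_000, 6_000_000], 40_000_000))
```

Under this approximation the 1–5 Mb class spans 10 to 50 generations and the 0.1–1 Mb class spans 50 to 500 generations, ranges that bracket the average coalescent times reported above.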

Demographic inference of whaling, divergence and gene flow

We reconstructed the demographic history of fin whale populations using the site frequency spectrum (SFS) to assess the impact of whaling in the Eastern North Pacific population and to determine the demographic events that have shaped the genomic diversity of the Gulf of California population. First, using the SFS from each population, we tested different single-population effective size (N_e) change models, employing coalescent⁴³ (fastsimcoal2) and diffusion approximation⁴⁴ (∂a∂i) methods. We assumed a generation time of 25.9 years⁴⁵ and a mutation rate of 2.77E-08 mutation/bp/generation³⁷, and tested several nested models with increasing numbers of size-change epochs (Fig. S12). Both inference methods provided concordant findings and ∂a∂i results are shown throughout the text, except when noted (see Tables S5–S7, for fastsimcoal2 results and all 95% confidence interval [CI] values). Our demographic analyses show that a 3-epoch model was the best fit for the ENP population (Figs. 3A, B, S13A; Tables S5, S6) and revealed an expansion starting ~115 thousand years ago (kya; 4,424 generations), from an ancestral N_e of 16,479 to 23,913. This was followed by a severe decline only 26 (one generation ago for fastsimcoal2 estimate; 95% CI: 0–2) or 52 years before present (two generations ago for ∂a∂i estimate; 95% CI: 1.89–2.11) to a current N_e = 305 individuals (95% CI: 0–1137; Fig. 3A, B; Table S7), representing an ~99% reduction. To further verify the timing and size of this recent population reduction, we implemented a grid search (Fig. S14, see Supplemental Methods and Supplemental Results), performed additional inference runs varying the time for the whaling reduction (Tables S5, S7), used different optimization methods (Table S8), confirmed our power to detect such a recent decline using coalescent SFS simulations under this model (Fig. S15), and ran supplementary inferences under an SFS without filtering on genotype calls to avoid bias against rare alleles (Tables S9, S10; Supplemental Methods and Supplemental Discussion). These additional analyses demonstrated that our findings reflect a drastic recent reduction one or two generations ago. Since the average collection year for samples from this population was 2006 (Table S1), the estimated times of the reduction correspond to the years 1954 to 1980, coinciding with the most intense whaling period this population suffered between 1940 and 1980^26,27.
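The conversions in this paragraph (generations to years, years to calendar dates, and the size of the decline) are simple arithmetic and can be checked as follows. The constants are taken from the text; the function names are illustrative only.

```python
GENERATION_TIME_YEARS = 25.9  # generation time assumed in the text
MEAN_SAMPLING_YEAR = 2006     # average collection year of ENP samples


def generations_to_years(generations):
    """Convert a time in generations to years before sampling."""
    return generations * GENERATION_TIME_YEARS


def calendar_year(generations):
    """Calendar year corresponding to a time measured back from sampling."""
    return MEAN_SAMPLING_YEAR - generations_to_years(generations)


def fractional_reduction(ne_before, ne_after):
    """Proportional drop in effective size between two epochs."""
    return 1.0 - ne_after / ne_before


print(generations_to_years(4424))         # start of the ENP expansion, in years
print(calendar_year(1), calendar_year(2))  # bounds on the whaling decline
print(fractional_reduction(23913, 305))    # decline from N_e = 23,913 to 305
```

One and two generations before 2006 fall at roughly 1980 and 1954, and the drop from 23,913 to 305 is about 98.7%, which the text rounds to ~99%.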

**Fig. 3: Demographic history inferred for fin whale populations.**

For the Gulf of California population, none of the inferred SFS for the single-population models had a good fit to the data (Fig. S13B). Additionally, the models with the best likelihood did not show convergence or concordant parameter estimation between inference methods (Tables S5, S6, S7), which can indicate an overparameterization of the models (see Supplemental Results). Therefore, we inferred the demographic history of the Gulf whales using a two-population model (described below), because two-population models have been shown to contain more information than single-population models and to improve demographic inference⁴⁶.

The time of divergence and migration rates between both populations were estimated by testing several two-population models based on the joint SFS between ENP and GOC (Figs. S16, S17; Table S5). The model of an ancestral size change before the populations diverged fits our data well (Figs. 3C, S17; Table S5), is consistent among inference methods (Tables S11, S12) and is biologically feasible; it was therefore chosen as our best model (see Supplemental Results). This model predicted that before the populations separated, the ancestral population expanded from ~16,000 effective individuals to ~25,000, more than 100 kya (4322 generations). Then, the populations split between 16 and 25 kya (616 and 960 generations, ∂a∂i and fastsimcoal2 estimates, respectively). Thereafter, the ENP population remained at N_e = 17,386 until it recently crashed due to whaling, as shown by the single-population model. By contrast, the GOC effective population size remained small after the divergence at N_e = 114. The model also inferred asymmetrical gene flow, with a higher migration rate from the Pacific into the Gulf population (3.42E-03; fraction of individuals that are migrants) than in the opposite direction (9.24E-05; Table S11). However, when scaled by the receiving population’s effective size, these rates represent a long-term effective migration of 0.39 immigrants per generation into the Gulf and 1.61 into the Pacific population (Fig. 3C).
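The scaling of migration rates described above can be made explicit. The sketch assumes that the effective number of immigrants per generation is the product of the per-generation migrant fraction and the recipient population's N_e, which reproduces the reported 0.39 and 1.61; the function name is illustrative.

```python
def effective_migrants(migrant_fraction, ne_recipient):
    """Effective immigrants per generation: migrant fraction times recipient N_e."""
    return migrant_fraction * ne_recipient


# Values reported in the text (Table S11 and Fig. 3C of the source).
into_goc = effective_migrants(3.42e-03, 114)     # ENP -> GOC
into_enp = effective_migrants(9.24e-05, 17_386)  # GOC -> ENP

print(f"into GOC: {into_goc:.2f} migrants/generation")
print(f"into ENP: {into_enp:.2f} migrants/generation")
```

The direction of the asymmetry flips after scaling: the Gulf receives the larger fraction of migrants but, because its N_e is about 150 times smaller, the smaller absolute number.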

To test if unsampled (ghost) populations contributed to migration into the GOC, we ran additional two-population models incorporating feasible ghost populations, the South Pacific and the western North Pacific (WNP). The ghost western North Pacific had a higher log-likelihood (Table S13) but did not considerably increase the total migration into the Gulf of California (the migration rate and effective migration from the ghost WNP into the GOC were 2.09E-04 and 0.01, respectively; Table S14; Fig. S18), demonstrating that migration from ghost populations into the GOC is negligible and does not affect our estimates. However, ghost population models revealed that the divergence between the ancestral ENP and ghost WNP populations matches the expansion observed in both the single-population ENP and two-population models, around 4300 generations ago (Supplemental Discussion; Figs. 3A, C, S18; Tables S7, S11, S14).

Our results suggest the GOC population was founded at the end of the Wisconsin glaciation during the Last Glacial Maximum⁴⁷ and remained small and highly isolated since then, receiving less than one migrant per generation (Fig. 3C). These findings are substantially different from estimates based on mitochondrial and microsatellite loci that predicted more recent divergence times, ~2300 or 9300 years before present (123 or 360 generations ago, respectively) and ~1 migrant per generation^30,31 (see Supplemental Discussion). Therefore, our results emphasize the greater resolution of whole-genome resequencing data for demographic inference empowered by the sheer availability of independent genealogies sampled²⁰ compared with only a handful of microsatellite loci³⁰ and a maternally inherited non-recombining marker.

Putatively deleterious variation and genetic load

Our demographic inference analysis suggests a historically large population size and a recent contraction for the ENP population and a high degree of isolation for the GOC population. To assess how these demographic trajectories have impacted fitness, we examined variants in coding regions, which are more likely to have functional impacts. The derived alleles were classified into four mutation types: synonymous, tolerated nonsynonymous (SIFT score ≥0.05), putatively deleterious nonsynonymous (SIFT score <0.05), and loss-of-function (LOF; identified using snpEff, details in Methods). The synonymous and tolerated nonsynonymous mutations serve as a proxy for neutral variants whereas the putatively deleterious nonsynonymous and LOF mutations are proxies for putatively deleterious variants⁴⁸. Although amino-acid changing variants could serve as candidates for local adaptation, most of them are deleterious^49,50. Since the dominance of variants in natural populations is poorly quantified, we considered two extreme scenarios: all variants are either fully recessive (h = 0) or fully additive (h = 0.5).
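The four-way classification above reduces to a threshold rule on the SIFT score. The sketch below encodes that rule; the effect labels (`"synonymous"`, `"nonsynonymous"`, `"lof"`) and the function name are hypothetical and do not correspond to actual snpEff or SIFT output fields.

```python
from enum import Enum

SIFT_THRESHOLD = 0.05  # scores below this are treated as putatively deleterious


class MutationType(Enum):
    SYNONYMOUS = "synonymous"
    TOLERATED = "tolerated nonsynonymous"
    DELETERIOUS = "putatively deleterious nonsynonymous"
    LOF = "loss-of-function"


def classify_variant(effect, sift_score=None):
    """Assign a coding variant to one of the four mutation types in the text.

    effect: hypothetical label, one of "synonymous", "nonsynonymous", "lof".
    sift_score: required only for nonsynonymous variants.
    """
    if effect == "lof":
        return MutationType.LOF
    if effect == "synonymous":
        return MutationType.SYNONYMOUS
    if effect == "nonsynonymous":
        if sift_score is None:
            raise ValueError("nonsynonymous variants need a SIFT score")
        if sift_score < SIFT_THRESHOLD:
            return MutationType.DELETERIOUS
        return MutationType.TOLERATED
    raise ValueError(f"unknown effect label: {effect!r}")


print(classify_variant("nonsynonymous", 0.01).value)
print(classify_variant("nonsynonymous", 0.05).value)
```

A score of exactly 0.05 falls in the tolerated class, matching the "≥0.05" boundary given in the text.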

For all four mutation types, heterozygosity is significantly depleted and homozygosity is significantly elevated in the GOC population (MWU tests p = 2.9E-12 in all comparisons; Table S15). This pattern has not been reported in other fin whale populations or great whale species²⁵ and is consistent with reduced genome-wide heterozygosity and small population size. The number of homozygous derived putatively deleterious nonsynonymous genotypes per individual was on average 39.68% higher in the GOC (2079) compared to the ENP population (1488). Similarly, the number of homozygous-derived LOF genotypes was on average 28.98% higher in the Gulf (140) compared with the Pacific population (108; Fig. 4A). Assuming that these putatively deleterious mutations are also at least partially recessive, this increased homozygosity in the GOC is predicted to result in reduced fitness⁵¹.

**Fig. 4: Increase in putatively deleterious variation in the GOC compared to the ENP fin whales.**

When deleterious mutations act in an additive manner, the genetic load is determined by counts of derived alleles per genome. We found that the ENP and GOC populations showed a similar number of derived neutral alleles as expected⁵² (Table S15). For the putatively deleterious class of mutations, only nonsynonymous alleles showed a significant 2.03% elevation in the GOC population (GOC average = 5983, ENP average = 5864, MWU test p = 1.20E-07), whereas the number of LOF alleles was similar in the two populations (p = 0.87; Fig. 4B). Assuming that these nonsynonymous alleles are slightly deleterious, the small population size of the GOC population likely increased the strength of genetic drift and decreased the efficacy of selection compared to the larger ENP population, allowing the persistence of deleterious variants in the Gulf. By contrast, the similar number of LOF alleles indicates that, in spite of the GOC population’s small size, purifying selection has remained effective at eliminating the most deleterious mutations. Overall, these results imply a slight increase in the genetic load in the GOC population if deleterious mutations are additive.

Finally, we computed the R_XY (relative accumulation of derived alleles) and R²_XY (relative accumulation of derived homozygotes) statistics that compare the expected number of the derived alleles or homozygotes occurring only in one population⁵³ (Fig. 4C). Among the four mutation types, only the deleterious nonsynonymous alleles showed a relative accumulation of derived alleles in GOC (R_GOC/ENP = 1.04, Z-score p = 0.02), similar to the allele counts pattern (Fig. 4B). However, the R²_XY was significantly elevated for all mutation types in the GOC population (Z score p < 0.001 for all comparisons), consistent with their higher homozygosity values in GOC (Fig. 4A). We repeated these analyses using snpEff’s mutation impact categories (i.e., high, moderate and low) to rule out software bias (see Methods) and found similar results (Fig. S19). In summary, these results suggest an increase in genetic load in the GOC population, both due to a shift towards higher homozygosity among all protein-coding variants, as well as an overall accumulation of putatively deleterious nonsynonymous alleles compared to the ENP population. However, the magnitude of the effect on fitness is unclear, given uncertainties about the selection and dominance coefficients of these mutations⁵¹.
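A simplified version of the R_XY statistic can be written directly from per-site derived-allele frequencies. The sketch follows the commonly used form R_XY = L(X, not Y) / L(Y, not X), with L(X, not Y) = Σ f_X (1 − f_Y) summed over sites. It omits the finite-sample corrections, the normalisation, the homozygote-based R²_XY and the jackknife used for significance testing, so it is an assumption-laden illustration rather than the procedure of ref. 53; the frequencies are toy values.

```python
def r_xy(freqs_x, freqs_y):
    """Relative accumulation of derived alleles in population X versus Y.

    freqs_x, freqs_y: per-site derived-allele frequencies at the same sites.
    Values above 1 indicate an excess of derived alleles in X.
    """
    if len(freqs_x) != len(freqs_y):
        raise ValueError("frequency lists must cover the same sites")
    x_not_y = sum(fx * (1.0 - fy) for fx, fy in zip(freqs_x, freqs_y))
    y_not_x = sum(fy * (1.0 - fx) for fx, fy in zip(freqs_x, freqs_y))
    if y_not_x == 0:
        raise ZeroDivisionError("no derived alleles private to Y")
    return x_not_y / y_not_x


# Toy frequencies at two sites; not taken from the fin whale data.
print(r_xy([0.5, 0.2], [0.25, 0.2]))
```

When both populations have identical frequencies the ratio is exactly 1, which is the neutral expectation against which the reported R_GOC/ENP = 1.04 is compared.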

Simulations of deleterious variation and genetic load

To further explore how fin whale demographic history and the recent whaling-induced decline have shaped patterns of deleterious variation and accumulation of genetic load, we ran forward-in-time genetic simulations using SLiM v.3.3.2⁵⁴. We simulated a 10 Mb chromosomal segment with a combination of intergenic, intronic, and exonic regions. Selection coefficients for nonsynonymous deleterious mutations were drawn from a distribution estimated from humans⁵⁵, and dominance coefficients were set such that the most deleterious mutations were highly recessive, though nearly neutral mutations were closer to additive (see Methods for details).

Using this simulation framework, we first investigated the extent to which the recent whaling bottleneck may have led to an increase in genetic load in the ENP population. Specifically, we simulated under our best-fit ENP demographic model, which includes a contraction to N_e = 305 two generations ago (Fig. 3A). After two generations at N_e = 305, we did not observe any changes in genetic load, heterozygosity, or levels of inbreeding, as expected given the short duration of this decline (Fig. 5A). To explore how various potential recovery scenarios may impact the viability of the ENP population in the future, we continued these simulations for an additional 18 generations following the decline, during which we observed increasing trends for genetic load and levels of inbreeding, though minimal impacts on genetic diversity (Fig. 5A). To test the impacts of a partial recovery in the ENP, we also ran simulations where we increased the effective population size to N_e = 1000 after two generations at N_e = 305. Here, we observe minimal increases in genetic load and inbreeding, suggesting that even a modest recovery would stave off any deleterious genetic effects (Fig. 5A). In conclusion, these results highlight the importance of a prompt recovery to minimize deleterious genetic impacts from the whaling bottleneck.
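The lack of change after two generations is what the classical drift expectation predicts. The sketch below uses the textbook relation H_t = H_0 (1 − 1/(2N_e))^t for neutral heterozygosity in an idealised population; it is not the SLiM simulation described above, ignores selection, mutation and migration, and the function name is illustrative.

```python
def het_retained(ne, generations):
    """Expected fraction of initial heterozygosity left after drift alone."""
    if ne <= 0 or generations < 0:
        raise ValueError("ne must be positive and generations non-negative")
    return (1.0 - 1.0 / (2.0 * ne)) ** generations


# ENP scenarios from the text: 2 generations at N_e = 305, then 18 more,
# either staying at 305 or partially recovering to N_e = 1000.
after_two = het_retained(305, 2)
no_recovery = het_retained(305, 20)
partial_recovery = after_two * het_retained(1000, 18)

print(f"after 2 generations:      {after_two:.4f}")
print(f"20 generations at 305:    {no_recovery:.4f}")
print(f"recovery to 1000 at gen 2: {partial_recovery:.4f}")
```

Under these assumptions more than 99.6% of heterozygosity survives the first two generations, which is consistent with the minimal impact on genetic diversity reported for the simulations.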

**Fig. 5: Simulations of heterozygosity, inbreeding coefficient, and genetic load.**

Our next aim for these simulations was to assess the importance of low levels of migration (0.39 effective migrants/gen from ENP to GOC) for maintaining genetic diversity and fitness in the small GOC population (N_e = 114) despite long-term isolation (~16 kya). We simulated under our best-fit two-population demographic model, running simulations that included the estimated rates of migration between the ENP and GOC (Fig. 3C) as well as simulations where no migration was allowed. When carrying out simulations that include the empirically inferred rate of migration from ENP to GOC, we observe a 26.7% reduction in heterozygosity and increase in F_{ROH > 1Mb} from 0 to 0.10 in the GOC population compared to the ENP population (Fig. 5B), in good agreement with the trends from our empirical dataset (35.7% empirical heterozygosity reduction; Fig. 2). Additionally, we find that average genetic load in the GOC population is elevated to 7.75% compared to 2.87% in the ENP population (Fig. 5B). However, this increase in genetic load appears to be counteracted by the removal of recessive strongly deleterious mutations (s < −0.01), which are reduced in frequency by 22.9% in the GOC population (Fig. S20). By contrast, we observe minimal differences in the numbers of moderately (−0.01 < s ≤ −0.001) or weakly (−0.001 < s ≤ −0.00001) deleterious alleles per individual (Fig. S20), suggesting that migration has helped keep these mutations from drifting to high frequency in the GOC population. In summary, these results suggest that isolation and small population size in the GOC may have resulted in a lowered fitness, though these fitness reductions have apparently not been substantial enough to impact population viability.

When simulating without migration, we observed far more dramatic changes in the genetic composition of the GOC population. Specifically, we found a near-complete loss of genetic diversity, higher levels of inbreeding (F_{ROH > 1Mb} = 0.11), and a substantial increase in genetic load to 10.3% in the GOC population (Fig. 5B). The loss of diversity was also confirmed in theoretical calculations (see Supplemental Results). This increase in genetic load appears to be driven primarily by fixation of moderately deleterious alleles (9.22% gain in the isolated GOC population compared with the migration scenario; Fig. S20). Thus, these simulations suggest that, in the absence of migration, the GOC population would have experienced a much more substantial increase in genetic load, which may have been substantial enough to drive extinction. In conclusion, these results highlight the importance of low levels of migration in maintaining viability in the GOC population over its long period of isolation.

Discussion

Detecting recent population bottlenecks in endangered species using estimates of genetic diversity in contemporary samples has been challenging^19,20, especially in long-lived species with long generation times, such as the great whales^21,56. Specifically, the influence of changes in population size on genetic diversity is slow relative to temporal scale of human-induced events¹⁹ and the overall loss of genetic variation depends on the duration of the bottleneck relative to the life history traits^57,58 such as life-span and generation time. Although genomic data can improve our ability to detect the impact of bottlenecks, most studies analyzing whole-genome data have failed to detect signals of whaling in blue³⁸ and gray whales⁵⁹, presumably due to small sample sizes. Recently, low-coverage sequencing of North Atlantic fin whales may have recovered a signal of whaling, although the results did not completely rule out the alternative scenario of a more gradual decline over the last 600 years rather than an abrupt whaling bottleneck²⁵, two scenarios which are challenging to disentangle, particularly with added uncertainties associated with low-coverage data. Here, we show that using high-coverage genome resequencing (~27×), sampling a high number of individuals (~30 per population) at a single timepoint, and implementing SFS-based demographic inference approaches, anthropogenic population contractions, such as the one imposed by the 20th-century whaling on fin whales^26,27 can be identified (Supplemental Discussion). In addition to our sampling and methodological approaches, the combination of a high pre-whaling genetic variation possessed by the fin whales in the Eastern North Pacific^30,31,34,60 together with an extreme reduction of two orders of magnitude, even if short, likely caused a deficit in low-frequency variants in present-day individuals that we were able to detect²⁰ (Fig. 3B). 
Therefore, our research demonstrates that even very recent human-driven population bottlenecks leave a detectable genomic footprint in the SFS derived from genome-wide data of contemporary individuals, and this signal can be used to identify the demographic and genetic effects of recent exploitation and model current and future impacts on populations.

Our study examines the natural experiment of whale populations that have experienced both natural and anthropogenic population bottlenecks, providing unique contrasts not available in single-population studies²⁵. Despite a 99% decline in effective population size, the Eastern North Pacific fin whales have retained most of their pre-whaling genetic diversity (Figs. 2, 5A). They do not exhibit a substantial decrease in genome-wide heterozygosity nor an increase in inbreeding or genetic load (Figs. 2, 4 and 5A), similar to what was found in a North Atlantic population²⁵. Since genetic diversity declines exponentially with the number of generations elapsed since the contraction, this lagging impact on genetic diversity is likely a consequence of the long generation time of fin whales⁴⁵ (~25.9 yrs) relative to the duration of the whaling bottleneck (~70 years) and a partial recovery following the whaling moratorium beginning in 1985^32,58,61. The contraction, although severe, only lasted for two generations (see Supplemental Results). However, other detrimental effects remain alarming. The 99% reduction in pre-whaling effective size has likely had strong ecological consequences^15,18,62. Additionally, if the ENP population does not completely recover and remains relatively small, it may experience a loss of adaptive potential to resist future climate change or disease⁶³. Furthermore, this reduced effective population size in the ENP could also imperil the viability of the Gulf of California population by further diminishing or completely halting migration into this population, which our simulations have shown can accelerate the accumulation of deleterious load and loss of genetic diversity. These simulations allowed us to explore genomic consequences under various conservation scenarios (Fig. 5), an important perspective not yet adopted in other great whale genomic studies^25,38,59. Both empirical and simulation findings show that continuing the current moratorium and enhancing population size remain essential for fin whale recovery and long-term persistence^17,26.

Regarding the Gulf of California fin whale population, our results show that immigration from ghost populations is negligible (see Supplemental Discussion) and as few as 0.39 migrants per generation have been sufficient to maintain genetic diversity and fitness in this population over ~16,000 years of isolation (Fig. 5B), which is consistent with other genetic and ecological studies describing the isolation of this population^28,30,34. By contrast, when omitting migration from our simulations, we observe a near-complete loss of genetic diversity and a substantial increase in levels of inbreeding and genetic load (Fig. 5B). Thus, these results highlight the importance of gene flow for maintaining population viability over long evolutionary timescales^11,64, even when levels of migration are far lower than the classic rule of thumb of ‘one migrant per generation’¹⁰. This rule has been widely applied in conservation; however, it is based on a neutral model that makes numerous simplifying assumptions and does not consider deleterious variation¹². Here, we combine empirical observations with more realistic models including deleterious variation to demonstrate that small populations can be maintained by exceedingly low levels of migration, even when modest levels of genetic load may accumulate⁶⁵. These results have important implications for conserving other small and isolated populations, where maintaining high levels of migration may not be feasible.

Population persistence in the GOC also appears to be enabled in part by eliminating strongly deleterious mutations, as has been shown in other small vertebrate populations^66,67 including marine mammals³⁶. Specifically, our simulations suggest a 22.9% reduction in the frequency of these mutations in the GOC (Fig. S20) due to its long-term small population size, occurring despite the impact of gene flow continually reintroducing these mutations¹³. However, we were unable to detect this decrement in our empirical dataset, where we observed similar numbers of putatively deleterious LOF mutations in the GOC and ENP populations (Fig. 4). This discrepancy could be partially explained by LOF mutations being an imperfect proxy of strongly deleterious variation^68,69, as shown in empirical studies⁴⁸. Although it could be argued that some genomic patterns of deleterious variation might reflect local adaptation in the GOC population, this explanation seems unlikely. For example, only drift would cause increased homozygosity in all mutation categories, as observed. Specifically, increased homozygosity in synonymous variants is not expected under a scenario of local adaptation (Fig. 4A, C). Moreover, local adaptive events occur more rarely than genetic drift and purifying selection, which are constantly ongoing in natural populations⁷⁰.

Here, we have assessed the genomic impacts of both natural and anthropogenic bottlenecks on the second-largest mammal. We demonstrate that it is possible to confidently estimate the magnitude and timing of recent human-driven population bottlenecks, and to determine the key role that gene flow and potential purging of deleterious variants play in the persistence of small isolated populations by analyzing whole-genome resequencing data from contemporary samples together with individual-based simulations. From a conservation perspective, our findings expose the severity of whaling and indicate that it is necessary to reassess the recovery goals for the ENP fin whales and the regional threatened status of the GOC population, which may warrant specific conservation actions to maintain gene flow and avert additional impacts from climate change, mortality by entanglement²⁸ or microplastic contamination⁷¹. Therefore, our study contributes to fulfilling the overdue promise of genomics to conservation biology concerning the genetic effects of very recent population reductions caused by anthropogenic activities and identifying the evolutionary and ecological processes that promote the viability of small populations⁷². Finally, we demonstrate the importance of using both genomic and simulated data to inform the conservation of intensely exploited species.

Methods

Samples and sequencing

Tissue samples from 50 fin whales (Balaenoptera physalus) were collected using a standard protocol to obtain skin biopsies from free-ranging cetacean species, which uses a small stainless-steel biopsy dart deployed from a crossbow or rifle^73,74. These samples were collected throughout the Eastern North Pacific (ENP; N = 30, represented by individuals from the coasts of California [9], Oregon [4], Washington [2], British Columbia [3], and Alaska [12]; Table S1), and the Gulf of California (GOC; N = 20, from seven different localities; Bahía de La Paz [3], Loreto [6], Bahía de los Angeles [5], Bahía Kino [3], North of Tiburon Island [1], Puerto Refugio [1] and out of Bahía Los Frailes [1]). All samples from the Gulf of California were obtained under the appropriate collecting permits issued by the Mexican Wildlife Agency (Dirección General de Vida Silvestre, Subsecretaría de Gestión para la Protección Ambiental, Secretaría del Medio Ambiente y Recursos Naturales; permit numbers: D0070(2)−0598, D00700(2)−14093, D00750-1537 and SGPA/DGVS/−0576). Samples from the Eastern North Pacific were collected by the Southwest Fisheries Science Center (California, USA) under US Marine Mammal Protection Act permits (NMFS-873, NMFS-1026, NMFS-774-1437, NMFS 0782-1438, NMFS-774-1714, NMFS-774-1437, NMFS-14097, and NMFS-19091). DNA from the samples was extracted using the QIAamp DNA Mini Kit (Qiagen; California, USA. Catalog number: 51304). The genomic libraries were prepared from extracted DNA using the Illumina TruSeq DNA PCR-free standard kit (Illumina; California, USA. Catalog number: 20015962) following the manufacturer’s instructions. Whole-genome sequencing was performed using the 150-bp paired-end protocol on Illumina HiSeqX or NovaSeq6000 platforms. Library preparation and sequencing were performed at the Fulgent Genetics sequencing core facility (Fulgent Genetics LLC; California, USA).

To compare the fin whales’ genomic characteristics within Mysticeti, previously generated whole-genome resequencing fastq data from four representative Mysticeti species were downloaded from the NCBI Sequence Read Archive: the minke whale (Balaenoptera acutorostrata; SRR1802584), a stable and abundant rorqual; the humpback whale (Megaptera novaeangliae; SRR5665639), the closest relative of fin whales; and the North Atlantic right whale (Eubalaena glacialis; SRR5665640) and the blue whale (Balaenoptera musculus; SRR5665644), the most endangered baleen whales (Table S1).

Read processing and alignment

We followed the sequence reads processing and genotyping pipeline adapted from the Genome Analysis Toolkit (GATK) Best Practices Guide⁷⁵. Read quality was first checked using FastQC v.0.11.8⁷⁶. Illumina adapters were removed from the paired-end sequence reads using picard (v.2.20.3) MarkIlluminaAdapters. The adapter-free paired-end reads were aligned against the minke whale (Balaenoptera acutorostrata scammoni) reference genome (GCF_000493695.1 [BalAcu1.0]; Scaffold N50: 12,843,668, Downloaded on November 12, 2019) using BWA-MEM v.0.7.17⁷⁷. Mapping statistics were generated using QUALIMAP v.2.2⁷⁸ and samtools v.1.9⁷⁹. We used the minke whale genome as a reference because the available fin whale genome assemblies are much more fragmented and poorly annotated (GCA_008795845.1; Scaffold N50: 871,016) or they did not have a publicly available genome annotation as of November 2022 (GCA_023338255.1), and the blue whale genome (GCF_009873245.2) did not have genome annotation in 2019 (Supplemental Methods; Table S16; Fig. S21). The fin whale and minke whale are in the same genus, with a divergence time of ~10 million years ago³⁸. The average mapping rate of fin whale reads to the minke whale genome is 99.09 ± 0.21% (Table S1), which is similar to the 99.49% mapping rate to the most recent fin whale reference genome (GCA_023338255.1; Table S2), obtained from a subset of samples (n = 10; see Supplemental Methods), suggesting that the divergence time with minke whales did not strongly impact read alignment.

Genotype calling and filtration

Joint genotype calling at all sites (including invariant positions) across the reference genome was performed using GATK⁸⁰ (v.3.8). We removed PCR duplicates from the bam files using picard MarkDuplicates. Raw variant calling was performed for each individual using GATK’s HaplotypeCaller with the default settings for removing low-quality reads (min_mapping_quality_score=20; min_base_quality_score=20). Joint genotype calls for the 50 fin whales were generated from the raw variants using GATK GenotypeGVCFs, excluding scaffolds shorter than 1 Mbp. The total scaffold length used for genotyping was 2,324,429,847 bp, with the excluded scaffolds comprising only 4.4% of the total genome length (107,257,851 bp out of 2,431,687,698 bp).

Since we do not have a database of known variants, we did not perform base quality score recalibration (BQSR) or variant quality score recalibration (VQSR). Instead, we applied a stringent set of quality and depth filters to the genotype calls, keeping only high-quality biallelic SNPs and monomorphic sites, with the latter including all homozygous reference or all homozygous alternate genotypes (Fig. S22). Sites that (1) had a low Phred score (QUAL < 30); (2) failed GATK recommended hard filters (QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < −12.5 || ReadPosRankSum < −8.0 || SOR > 3.0); or (3) fell within repeat regions identified by WindowMasker⁸¹, RepeatMasker⁸² or CpG islands identified by UCSC genome browser (total length: 1,247,900,490 bp), were marked as failed filtration (Fig. S22A). For the sites that passed the above filters, we performed genotype-level filtration. Specifically, for each individual, we kept only genotypes with a minimum depth of eight reads, a maximum depth of 2.5x the mean depth, a minimum Phred score of 20, and the expected allele balance. Allele balance was defined as the read depth for the reference allele divided by the total read depth, and the following thresholds were used: ≥0.9 for homozygous reference genotypes; between 0.2 and 0.8 (inclusive) for heterozygous genotypes; and ≤0.1 for homozygous alternative genotypes. Genotypes that failed these filters were converted to missing (Fig. S22B). Thereafter, sites were further filtered if they had more than 20% missing genotypes or more than 75% heterozygous genotypes (Fig. S22A). We repeated the genotype calling and filtration pipeline with the four additional baleen whales included alongside the 50 fin whale samples. The derived dataset (“f50b4” in the following text) was only used in the construction of the neighbor-joining tree and the genome-wide heterozygosity comparison. An additional variant dataset (“genotype-filter-free” dataset) for the ENP individuals without any genotype-level filters was generated and used in confirmatory demographic inference (Supplemental Methods). We also performed the same genotyping pipeline using the most recent fin whale genome as reference (GCA_023338255.1) in a subset of 10 individuals (10-fin-ref dataset) to determine if there were significant differences in genomic diversity estimates caused by the reference genome used (minke whale vs fin whale; see Supplemental Methods, Results, and Discussion). The total number of sites that passed all the filters in our genotyping pipeline for the different datasets we analyzed is reported in Table S17.
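The genotype-level and site-level filters described above can be sketched in Python as follows. This is only an illustration of the stated thresholds and not the pipeline used in the study: the function names and genotype labels are hypothetical, the "minimum Phred score of 20" is assumed to refer to genotype quality, and the missingness and heterozygosity fractions are assumed to be computed over all samples.

```python
def genotype_passes(genotype, depth, ref_depth, gq, mean_depth):
    """Check one genotype call against the depth, quality and allele-balance thresholds.

    genotype   : "hom_ref", "het" or "hom_alt" (hypothetical labels)
    depth      : total read depth for this individual at the site
    ref_depth  : reads supporting the reference allele
    gq         : Phred-scaled genotype quality (assumed meaning of "Phred score")
    mean_depth : mean depth of the individual
    """
    if depth < 8 or depth > 2.5 * mean_depth:
        return False
    if gq < 20:
        return False
    balance = ref_depth / depth  # reference-allele depth over total depth
    if genotype == "hom_ref":
        return balance >= 0.9
    if genotype == "het":
        return 0.2 <= balance <= 0.8
    if genotype == "hom_alt":
        return balance <= 0.1
    raise ValueError(f"unknown genotype label: {genotype}")


def site_passes(genotypes):
    """Check one site after genotype filtering; None marks a missing genotype.

    Assumption: both fractions use the total number of samples as denominator.
    """
    n = len(genotypes)
    missing = sum(g is None for g in genotypes)
    het = sum(g == "het" for g in genotypes)
    return missing / n <= 0.2 and het / n <= 0.75
```

In such a scheme, genotypes for which `genotype_passes` returns `False` would be set to `None` before `site_passes` is evaluated.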

Variant annotations and identification of neutral regions

We annotated variant sites using two software tools, snpEff v.4.3.1⁸³ and SIFT4G v.6.0⁸⁴. We used the minke whale genome annotation gtf file to build custom snpEff and SIFT4G databases with default settings. We then annotated and predicted the effects of variants with the -canon option in snpEff and the -t option in SIFT4G. The most deleterious effect was selected per site.

Although a recent fin whale genome assembly (GCA_023338255.1) has been annotated²⁵, this annotation is not publicly available at the present time, preventing us from using it to identify putatively neutral regions for our demographic and deleterious variation analyses. In addition, even if the annotation of this fin whale genome assembly were available, it would be unlikely to significantly affect our main results and conclusions (see Supplemental Discussion).

We used the minke whale as an outgroup to classify the allele ancestral states, and considered the sites in the minke whale reference sequence as ancestral. Because the minke whale has evolved since the common ancestor with these two populations of fin whales, the ancestral alleles identified may not represent the true ancestral state. However, this error is not expected to bias the relative comparison of variants between the ENP and GOC fin whales since they are equally diverged from the minke whale. To detect the putatively neutral regions for demographic modeling, we first used bedtools v.2.28.0⁸⁵ to extract sites that passed all filters, were at least 20 kb from exons or coding regions, and were not in CpG islands or repetitive regions. The identified regions were aligned to the zebrafish genome using BLAST v.2.7.1⁸⁶, and regions with a hit with an e-value lower than 1E-10 were further removed, as they could represent conserved regions that are not evolving neutrally. In total, 397,627,899 sites were defined as neutral.

Evaluation of population structure

Population structure analyses were performed using the R package SNPRelate v.1.16.0⁸⁷ and gdsfmt v.1.22.0⁸⁸. We selected biallelic sites in the vcf that passed variant filtration criteria and converted them to gds format using function snpgdsVCF2GDS. Linkage disequilibrium pruning was implemented (snpgdsLDpruning) with an r² cutoff of 0.2, and a minor allele frequency cutoff of 0.10. A total of 30,350 SNPs were kept for PCA, kinship, and F_ST analyses.

We performed the PCA analysis using the function snpgdsPCA. After observing the overall population structure, an additional PCA was performed within ENP individuals to inspect variation among locations. The kinship between sample pairs was assessed using PLINK’s identity-by-descent method of moments approach (snpgdsIBDMoM). We calculated kinship at three different levels: (1) populations (groups: ENP and GOC), (2) sampling locations (groups: AK, BC, OR, WA, CA, and GOC); and (3) merged middle ENP locations combining samples from BC, WA and OR (groups: AK, MENP and GOC). The two-tailed MWU test was used to compare the average kinship coefficients among groups. F_ST between populations, sampling locations and merged ENP locations were calculated using the Weir and Cockerham estimator⁸⁹, with a SNP missing rate at 20% (function snpgdsFst, missing.rate = 0.2). The significance of F_ST was estimated using 999 permutations described in ref. ⁹⁰. Due to the low sample size in BC, OR and WA locations, we only estimated the significance of F_ST between populations and merged ENP locations. To determine the potential influence from population substructure within ENP on N_e estimates, we calculated the population size inflation factor by 1/(1- F_ST)³³, using the highest F_ST value found in the ENP.
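The population size inflation factor mentioned above, 1/(1 − F_ST), is a one-line calculation. The sketch below is illustrative only; the function name is hypothetical and no F_ST value from the study is assumed.

```python
def size_inflation_factor(fst):
    """Return 1 / (1 - F_ST), the factor by which substructure may inflate N_e."""
    if not 0.0 <= fst < 1.0:
        raise ValueError("F_ST must lie in [0, 1)")
    return 1.0 / (1.0 - fst)
```

With F_ST = 0 the factor is 1 (no inflation), and it grows slowly for small F_ST values.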

The LD pruned SNP set was converted to PLINK ped format using function seqGDS2VCF in R package SeqArray v.1.26.2⁸⁸ and PLINK v.1.90⁹¹. ADMIXTURE⁹² (v.1.3.0) analyses were performed using values of K from two to six, with 10 iterations per K. Mean cross-validation (CV) error for each K was used to select the best number of ancestral populations (K). To further test for substructure in the ENP, additional ADMIXTURE analyses were performed within ENP individuals, using values of K from one to six, with the same settings described above. A neighbor-joining phylogenetic tree was constructed from 32,191 LD pruned SNPs in the “f50b4” dataset using function nj in R package ape v.5.3⁹³, and visualized using ggtree v.2.0.4⁹⁴. We performed 1000 bootstraps and designated the North Atlantic right whale (“EubGla01”) as the outgroup (Fig. S5).

Heterozygosity and identification of runs of homozygosity

We defined heterozygosity as the number of heterozygous genotypes divided by the total number of called genotypes, including monomorphic sites, that passed variant filtration standards⁴⁸. We first calculated the genome-wide heterozygosity for all scaffolds used for genotyping. Two-tailed MWU tests were used to evaluate if the genome-wide heterozygosity varied significantly between the ENP and GOC populations. We also calculated the per-site heterozygosity in non-overlapping 1 Mb windows across the scaffolds. Windows with more than 80% missing data were excluded. The missing data in these windows derive from regions that failed site filtering criteria described above.
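The windowed heterozygosity calculation described above can be illustrated with a short Python sketch. It is not the code used in the study: the function name and input layout are hypothetical, and missing data in a window is assumed to be the fraction of positions without a genotype that passed filters.

```python
def windowed_heterozygosity(calls, scaffold_length, window=1_000_000, max_missing=0.8):
    """Per-window heterozygosity for one individual on one scaffold.

    calls : iterable of (position, is_het) for called sites that passed filters,
            with 0-based positions (hypothetical input layout)
    Returns a list of (window_start, heterozygosity) for windows whose missing
    fraction does not exceed max_missing.
    """
    n_windows = -(-scaffold_length // window)  # ceiling division
    het = [0] * n_windows
    called = [0] * n_windows
    for pos, is_het in calls:
        i = pos // window
        called[i] += 1
        het[i] += 1 if is_het else 0

    results = []
    for i in range(n_windows):
        start = i * window
        size = min(window, scaffold_length - start)  # last window may be shorter
        missing = 1.0 - called[i] / size
        if called[i] == 0 or missing > max_missing:
            continue  # window excluded for excessive missing data
        # heterozygous genotypes divided by all called genotypes, monomorphic included
        results.append((start, het[i] / called[i]))
    return results
```

Genome-wide heterozygosity follows the same ratio, summed over all scaffolds instead of per window.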

For identifying ROH, we first separated the vcf file for ENP and GOC individuals and reestimated allele frequencies within each population. ROH were identified using bcftools roh -G30 in bcftools v.1.9³⁹. Three individuals were excluded from bcftools ROH analyses to avoid biasing allele frequency estimations [ENPCA09 and GOC010 due to admixture proportion > 0.25; ENPOR12 due to low genotyping rate (Fig. S22)]. Additional ROH analysis was performed using R package RZooRoH v.0.2.3⁴⁰, which can classify ROH segments into different age classes. A model with ten classes (9 ROH and 1 non-ROH) and a successive rate of three was applied (zoomodel, K = 10, base = 3). A minor allele frequency cutoff of 0.05 was used but no individual was excluded. For both methods, ROH segments less than 100 kb were discarded. The rest of the segments were divided in three length categories, short (0.1 Mb ≤ ROH < 1 Mb), intermediate (1 Mb ≤ ROH < 5 Mb) and long (≥5 Mb). The concordance of the two methods was confirmed (Fig. S9) and the output from the RZooRoH analysis is shown in the main text. The proportion of genomes with ROH (F_ROH) was calculated as the total length of ROH passing a certain length threshold (e.g. ROH > 100 kb) within an individual divided by the total scaffold length used for genotyping (2,324,429,847 bp). We used the two-tailed MWU test to compare total number of ROH segments in all length categories obtained in the two populations.
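As a minimal sketch of the length categories and the F_ROH definition given above, the following Python functions classify ROH segments and compute the proportion of the genome in ROH. The function names are hypothetical; the genome length is the genotyped scaffold length stated in the text, and segments of exactly 100 kb are assumed to be retained.

```python
GENOTYPED_LENGTH_BP = 2_324_429_847  # total scaffold length used for genotyping


def roh_category(length_bp):
    """Assign a ROH segment to a length class; None means it is discarded (< 100 kb)."""
    if length_bp < 100_000:
        return None
    if length_bp < 1_000_000:
        return "short"
    if length_bp < 5_000_000:
        return "intermediate"
    return "long"


def f_roh(segment_lengths_bp, min_length_bp=100_000, genome_length_bp=GENOTYPED_LENGTH_BP):
    """Total length of ROH passing the threshold divided by the genotyped length."""
    total = sum(length for length in segment_lengths_bp if length >= min_length_bp)
    return total / genome_length_bp
```

Raising `min_length_bp` (for example to 1,000,000) restricts F_ROH to longer, and therefore more recent, segments.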

To determine if the inbreeding observed in both fin whale populations was due to recent or older events, we estimated the average time at which two haplotypes would coalesce in each of the ROH categories (short, intermediate and long). The length of ROH associated with inbreeding (L) decreases due to recombination in each generation and follows an exponential distribution^95,96,97. The mean length of ROH in the exponential distribution is E[L] = 100/(2tr), where E[L] is the mean ROH length (in Mb), the constant 100 represents large segments belonging to the common ancestor in cM, t is the number of generations to the common ancestor and r is the assumed constant recombination rate of 1 cM/Mb^42,98. Therefore, we calculated how many generations ago, on average, two haplotypes shared a common ancestor in each of the ROH categories as t = 100/(2rE[L])⁴².
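The relationship between mean ROH length and time to the common ancestor can be evaluated directly. The sketch below simply rearranges the expression above; the function name is hypothetical and the input lengths in the usage note are illustrative, not estimates from the study.

```python
def generations_to_common_ancestor(mean_roh_length_mb, recomb_rate_cm_per_mb=1.0):
    """Solve E[L] = 100 / (2 * t * r) for t, the number of generations.

    mean_roh_length_mb    : mean ROH length E[L] in Mb
    recomb_rate_cm_per_mb : recombination rate r, assumed constant (1 cM/Mb in the text)
    """
    if mean_roh_length_mb <= 0 or recomb_rate_cm_per_mb <= 0:
        raise ValueError("length and recombination rate must be positive")
    return 100.0 / (2.0 * recomb_rate_cm_per_mb * mean_roh_length_mb)
```

For example, under r = 1 cM/Mb a mean length of 5 Mb corresponds to 10 generations and a mean length of 1 Mb to 50 generations, so longer ROH reflect more recent common ancestry.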

Projected site frequency spectra

A vcf file comprising only putatively neutral SNPs was used to obtain the site frequency spectrum (SFS) within and between populations. To avoid introducing bias to our demographic inferences from known contributing factors, such as uneven read depth⁹⁹, admixture proportions⁴⁴ and highly related individuals¹⁰⁰, six individuals were discarded in SFS projection (Low genotype depth: “ENPOR12”; Admixture proportion > 0.25: “ENPCA01”, “ENPCA09”, “GOC010”; Kinship > 0.15: “GOC080”, “GOC111”). To avoid uncertainties in ancestral state classifications, we computed a folded SFS. This SFS was calculated based on a hypergeometric projection implemented using easySFS v.0.0.1 (https://github.com/isaacovercast/easySFS), which minimizes the effects of missing genotypes¹⁰¹ (https://dadi.readthedocs.io/en/latest/user-guide/manipulating-spectra/#projection). From this projection, an optimal number of haploid individuals with a maximized number of SNPs are identified and this number is then used to construct the folded SFS. Both the single-population SFS for each population (projected haploid size: ENP = 44, GOC = 30; projected number of SNPs: ENP = 3,410,730, GOC = 1,532,968) and the joint two-population SFS were generated (projected number of SNPs: ENP-GOC = 3,418,226). Thereafter, the count of monomorphic sites was calculated and incorporated as follows: for the single-population SFS, monomorphic sites in the neutral regions that were called in at least the number of haploid individuals in the projection were added to the 0-bin already calculated by the projection. For the two-population SFS, monomorphic sites were computed by counting the number of monomorphic sites that were called in at least 44 haploid individuals in the ENP population and at least 30 haploid individuals in the GOC population. These sites were added to the previous 0-0-bin of the projection.
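To illustrate the idea behind hypergeometric projection and folding, the following Python sketch projects a single SNP down to a smaller haploid sample size and folds an unfolded spectrum. It is a simplified illustration under stated assumptions, not the easySFS or ∂a∂i implementation, and the function names are hypothetical.

```python
from math import comb


def project_site(alt_count, n_called, n_proj):
    """Hypergeometric projection of one SNP to n_proj haploid genomes.

    alt_count : copies of the alternate allele among the called haploid genomes
    n_called  : number of called haploid genomes at the site (>= n_proj)
    Returns the probability of observing j alternate copies for j = 0..n_proj.
    """
    if n_called < n_proj:
        raise ValueError("site has fewer called genomes than the projection size")
    total = comb(n_called, n_proj)
    return [
        comb(alt_count, j) * comb(n_called - alt_count, n_proj - j) / total
        for j in range(n_proj + 1)
    ]


def fold(sfs):
    """Fold an unfolded SFS so that bin j holds sites with minor allele count j."""
    n = len(sfs) - 1
    folded = [0.0] * (n // 2 + 1)
    for j, value in enumerate(sfs):
        folded[min(j, n - j)] += value
    return folded
```

Summing `project_site` over all SNPs gives a projected spectrum in which sites with missing genotypes still contribute, which is why projection reduces the effect of missing data.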

Demographic history reconstruction

We utilized the projected neutral SFS generated above to reconstruct the demographic history of fin whales surveyed in this study using two methods: ∂a∂i⁴⁴ (v.2.2.1; Diffusion Approximations for Demographic Inference) and fastsimcoal2⁴³ (v.2.6; fast sequential Markov coalescent simulation).

To explore a variety of possible demographic scenarios, we first tested the following single-population models on the ENP and GOC populations separately (Fig. S12; Table S7). All the models are described forward in time. For population size parameters (N_ANC, N_CUR, etc.), all values are in units of numbers of diploids. For time parameters (T, T_CUR, etc.), all values are in units of generations. For the ENP population, we explored two additional 3Epoch models fixing the T_CUR to two generations (3EpochTcur2) or three generations (3EpochTcur3).

1. 1Epoch: single epoch model with no population size change. This model provides a “null model” that estimates ancestral population size (N_ANC).
2. 2Epoch: two epoch model with one size change event, from the ancestral size (N_ANC) to the current size (N_CUR) occurring T generations ago.
3. 3Epoch: three epoch model with two size change events. The first event changed from the ancestral size (N_ANC) to a bottleneck size (N_BOT) and lasted for T_BOT generations. The second event changed from the bottleneck size (N_BOT) to the current size (N_CUR) occurring T_CUR generations ago.
4. 4Epoch: four epoch model with three size change events. The first event changed from the ancestral size (N_ANC) to a bottleneck size (N_BOT) and lasted for T_BOT generations. The second event changed from the bottleneck size (N_BOT) to a recovery size (N_REC) and lasted for T_REC generations. The third event changed from the recovery size (N_REC) to the current size (N_CUR) occurring T_CUR generations ago.

For the 3Epoch and 4Epoch models, we note that although the population sizes were named “bottleneck size” or “recovery size”, we did not restrict the direction of size changes (expansion or contraction) for any event.
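The single-population models above are piecewise-constant size histories. As a minimal sketch of that structure, the Python code below represents a model as a list of epochs described forward in time and returns the population size at a given number of generations before present. The class and function names are hypothetical, and the parameter values used in the usage note are arbitrary placeholders rather than estimates from the study.

```python
from dataclasses import dataclass


@dataclass
class Epoch:
    size: float      # diploid population size during the epoch
    duration: float  # length in generations; float("inf") for the ancestral epoch


def size_at(epochs, generations_ago):
    """Population size at a time before present.

    epochs : list of Epoch ordered forward in time (ancestral first, current last)
    """
    elapsed = 0.0
    for epoch in reversed(epochs):  # walk backward from the present
        elapsed += epoch.duration
        if generations_ago < elapsed:
            return epoch.size
    return epochs[0].size  # older than every finite epoch: ancestral size
```

For instance, a 3Epoch history could be written as `[Epoch(10_000, float("inf")), Epoch(20_000, 1_000), Epoch(200, 2)]`, where the last entry plays the role of N_CUR with T_CUR = 2 generations. Because no direction is imposed on the size changes, the middle epoch may be larger or smaller than the ancestral one.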

Next, we tested the following two-population models (Fig. S16; Table S11) to elucidate the divergence time and gene flow in the ENP and GOC populations:

1. Split-NoMigration: a simple population split model with no migration. The ancestral population (N_ANC) diverged into the ENP (N_ENP) and GOC (N_GOC) populations T generations ago. The two populations remained isolated since then.
2. Split-SymmetricMigration: an isolation-migration model. The ancestral population (N_ANC) diverged into the ENP (N_ENP) and GOC (N_GOC) populations T generations ago. The ENP and GOC populations maintained a symmetric migration rate of m.
3. Split-AsymmetricMigration: another isolation-migration model. This model is similar to model 2 (Split-SymmetricMigration), but the ENP and GOC populations were allowed to have different values of migration rate, with m_ENP->GOC measured as the fraction of individuals each generation in the GOC population that are new migrants from ENP, and vice versa for m_GOC->ENP.
4. Split-AsymmetricMigration-ENPChangeTw2: this model is based on model 3 (Split-AsymmetricMigration), but an ENP population size change event to N_ENP2 is introduced after population divergence, with a fixed T_W = 2 generations before present. This size change event after divergence is used to model the impact of the whaling bottleneck.
5. AncestralSizeChange-Split-AsymmetricMigration: this model is based on model 3 (Split-AsymmetricMigration), but an ancestral size change event from N_ANC to N_ANC2 that lasted for T_A generations was introduced before population divergence.
6. AncestralSizeChange-Split-Isolation-AsymmetricMigration: this model is based on model 5 (AncestralSizeChange-Split-AsymmetricMigration), but after population divergence, an isolation period lasted for T_D, during which there is no migration between the ENP and GOC populations. Asymmetric migrations between two populations occurred T_C generations before present.
7. AncestralSizeChange-Split-AsymmetricMigration-GOCChange: this model is based on model 5 (AncestralSizeChange-Split-AsymmetricMigration), but after population divergence, the GOC population remained at N_GOC for T_D generations. The GOC population then experienced a size change event from N_GOC to N_GOC2 that occurred T_C generations before present.

To evaluate if unsampled (ghost) populations contribute to the total migration into the GOC population, we included two feasible ghost populations into the selected two-population model, the South Pacific (SP), which diverged from the North Pacific ~1.8 Mya according to mtDNA data³¹; and the Western North Pacific (WNP) population, which has been suggested to breed separately from the ENP²⁷ potentially since the recent Pleistocene’s interglacial periods²³. For our demographic inference with ∂a∂i, we ran only one ghost model using the same initial parameters as in our chosen model. The initial parameter for the divergence time of ghost population was set at the expansion time in the ENP population 3Epoch model, and the size of the ghost population was fixed to the size of the ancestral population before divergence to find the best parameter space. In contrast, for fastsimcoal2 we constrained the lower and upper bounds for the divergence time of the ghost populations based on the previous knowledge mentioned above to 35,000 ~ 200,000 generations ago for the SP population and 100 ~ 10,000 generations ago for the WNP. We also fixed the size of the ghost populations to 30,000 haploids, approximately the same size of the ancestral population before the divergence.

Fastsimcoal

The coalescent simulation approach fastsimcoal2 was employed to infer parameters and composite likelihoods for the demographic models specified above. Each inference was performed using the Expectation‐Conditional Maximization (ECM) algorithm¹⁰², using 60 ECM cycles (-L 60), in which each E-step consisted of 1,000,000 coalescent trees (-n 1000000), computing only the SFS for the minor allele (-m) with the following command line.

fsc26 -t $header.tpl -e $header.est -n 1000000 -m -M -L 60 -q

The starting parameters were chosen from a uniform distribution with an imposed minimum value and flexible upper boundary. The expected SFS under the fastsimcoal2 model parameters was compared to the empirical SFS and the multinomial log-likelihood was calculated. For single-population and two-population models, we performed 100 and 50 replicates of the inference, respectively, to confirm that both parameters and log-likelihoods converged, and parameters with the maximum log-likelihood were chosen. This difference in the number of replicates is due to the inference of two-population model parameters being more computationally expensive and time-consuming. All estimated size parameters were obtained as the number of haploids and converted to diploids, whereas time parameters were inferred as the number of generations before present day. To control for inflations in log-likelihood estimates in models with more parameters, we performed a likelihood ratio test (LRT) comparing each model with the next more complex model in which it is nested (e.g., 2Epoch vs. 1Epoch, 3Epoch vs. 2Epoch) using the equation: –2 * [loglikelihood (simple)–loglikelihood (complex)]. The LRT significance was evaluated with a chi-square test (χ²) with one or two degrees of freedom, depending on the number of parameter differences between models.
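The likelihood ratio test described above can be reproduced with the Python standard library alone, because the chi-square survival function has a closed form for one and two degrees of freedom. The sketch below is illustrative; the function names are hypothetical and the log-likelihood values in any call are supplied by the user.

```python
from math import erfc, exp, sqrt


def lrt_statistic(loglik_simple, loglik_complex):
    """LRT statistic: -2 * [loglikelihood(simple) - loglikelihood(complex)]."""
    return -2.0 * (loglik_simple - loglik_complex)


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution for df = 1 or 2."""
    if x <= 0:
        return 1.0
    if df == 1:
        return erfc(sqrt(x / 2.0))
    if df == 2:
        return exp(-x / 2.0)
    raise ValueError("only df = 1 or df = 2 are handled in this sketch")


def lrt_p_value(loglik_simple, loglik_complex, df):
    """P-value for rejecting the simpler nested model."""
    return chi2_sf(lrt_statistic(loglik_simple, loglik_complex), df)
```

A small p-value indicates that the additional parameters of the more complex model improve the fit more than expected by chance.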

The parameter confidence intervals were obtained using a parametric bootstrap⁴³ following the simulation functionality described in fastsimcoal2’s manual (http://cmpg.unibe.ch/software/fastsimcoal26/man/fastsimcoal26.pdf, p. 56). For each model, we simulated 100 SNP-based SFS from the best-fit parameters in the observed data with ~4 million (3,927,079 for ENP single-population models, 3,908,444 for GOC single-population models and 3,864,185 for two-population models) non-recombining segments of 100 bp, mimicking the same number of observed sites. Parameters were estimated from 20 random starting conditions for the 100 bootstrapped SFS datasets using the same settings as described above for the empirical data. The 95% confidence intervals of the best-fit parameters were obtained by adding and subtracting two standard deviations of the 100 bootstrap estimated parameters from the empirical best-fit parameters.
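The interval construction described above amounts to the best-fit value plus or minus two bootstrap standard deviations. A minimal Python sketch follows; the function name is hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the text does not specify it.

```python
from statistics import stdev


def bootstrap_ci(best_fit, bootstrap_estimates):
    """Approximate 95% CI: best-fit estimate +/- 2 SD of the bootstrap estimates."""
    sd = stdev(bootstrap_estimates)  # sample standard deviation (assumption)
    return best_fit - 2.0 * sd, best_fit + 2.0 * sd
```

Note that the interval is centered on the empirical best-fit parameter rather than on the bootstrap mean, as stated in the text.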

∂a∂i

For demographic inference using ∂a∂i, haploid sample sizes plus 5, 15, and 25 were used as extrapolation grid points⁴⁴. Lower and upper bounds of model parameters were imposed based on prior knowledge of population history, and starting parameters under these boundaries were chosen from previous knowledge or outputs from nested runs and randomized with a fold=1. We used the optimize_log function as our optimization algorithm, and calculated the multinomial log-likelihood for the expected SFS obtained from each optimization.

Best-fit parameter sets of each model were scaled using N_ANC calculated by the equation $\theta=4{N}_{{ANC}}\mu L$, where L is the total sequence length of the neutral region (392,707,916 bp for ENP single-population models, 390,844,414 bp for GOC single-population models and 386,418,461 bp for two-population models), μ is the fin whale mutation rate (2.77E-08 mutations/generation/bp)³⁷, and θ is the optimal value of theta for the given model. Population size parameters were scaled by N_ANC to obtain diploid sizes, and time parameters were re-scaled by 2N_ANC into generations. Model uncertainty was assessed by estimating 95% confidence intervals of the best-fit parameters using a Godambe Information Matrix (GIM) with bootstrapped data¹⁰³. The bootstrapped data were obtained by dividing the genome into fragments of 4 Mb and generating 100 bootstrap pseudo-replicate datasets by resampling from those fragments, which in total amounts to sampling 400 Mb, approximating the length of the putatively neutral data analyzed in our demographic inferences.
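The scaling above follows directly from rearranging $\theta=4{N}_{{ANC}}\mu L$. The sketch below illustrates the arithmetic with the ENP sequence length and mutation rate quoted in the text; the function names and the θ, relative size and relative time values are hypothetical and are not results from the study.

```python
import math

MU = 2.77e-8          # mutations per generation per bp (from the text)
L_ENP = 392_707_916   # neutral sequence length for ENP single-population models

def ancestral_size(theta, mu=MU, seq_len=L_ENP):
    """Solve theta = 4 * N_ANC * mu * L for N_ANC (diploid individuals)."""
    return theta / (4.0 * mu * seq_len)

def rescale(theta, rel_size, rel_time, mu=MU, seq_len=L_ENP):
    """Convert relative size and time parameters to diploids and generations."""
    n_anc = ancestral_size(theta, mu, seq_len)
    return {
        "N_ANC": n_anc,
        "size_diploids": rel_size * n_anc,          # sizes scale by N_ANC
        "time_generations": rel_time * 2.0 * n_anc,  # times scale by 2 * N_ANC
    }

# Hypothetical theta chosen so that N_ANC is 10,000 diploids.
theta_example = 4.0 * MU * L_ENP * 10_000
scaled = rescale(theta_example, rel_size=0.5, rel_time=0.01)
print(scaled)
```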

One hundred replicates of each model were performed with randomized starting parameters to assess convergence of the inferred parameters and composite likelihood. Parameters with the maximum log-likelihood among replicates from each model were selected and the expected SFS under these parameters was compared with the empirical SFS. LRT was calculated as previously described.

Additionally, to ensure that the results from the ENP population 3-epoch model were in fact reflecting the recent bottleneck caused by whaling, we simulated the SFS under ∂a∂i’s inferred demographic scenario using msprime v.0.7.4¹⁰⁴. The simulated SFS were generated using a recombination rate of 1E-8 cross-over events per base pair per generation and a mutation rate of 2.77E-8 per base pair per generation³⁷, with 1000 replicates and a chunk size of 2 Mb. Visual inspection was performed to validate the fit of simulated SFS to the empirical data. We also performed ∂a∂i inference on msprime simulated SFS using the same settings for empirical SFS and tested if we could obtain similar parameter estimates as the empirical data to confirm that we had the power to detect a recent population contraction.

To account for the correlation between the current population size (N_CUR) and the time of the most recent contraction (T_CUR), we carried out grid searches to find the range of possible parameter pairs that are within two log-likelihood units of the maximum likelihood estimate (MLE; see Supplemental Methods).
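The filtering step of such a grid search can be illustrated as follows. This is only a sketch under assumed inputs: the grid, the log-likelihood values and the function name are hypothetical, and evaluating the likelihood at each grid point is left out.

```python
def pairs_within_threshold(grid_logliks, threshold=2.0):
    """Return (N_CUR, T_CUR) pairs within `threshold` log-likelihood units of the maximum."""
    max_ll = max(grid_logliks.values())
    return sorted(pair for pair, ll in grid_logliks.items() if max_ll - ll <= threshold)

# Hypothetical grid: keys are (N_CUR, T_CUR) pairs, values are log-likelihoods.
grid = {
    (100, 2): -500.0,
    (200, 3): -501.5,
    (1000, 10): -510.0,
}
plausible = pairs_within_threshold(grid)
print(plausible)
```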

Model selection

We selected the models that most likely represent the demographic history of the populations from among the demographic models without any constraints (i.e., without fixing any of the parameters to a certain value). To select the best demographic model, we considered several features of our demographic inference results. First, the log-likelihood of the model should be the highest among the models that satisfy the following criteria. Second, the expected SFS should fit the empirical SFS well. Third, the estimated parameter values from the two inference methods that we used (i.e., fastsimcoal2 and ∂a∂i) should be consistent, especially the direction of population size change (expansion vs. contraction). Fourth, the log-likelihoods of the top 10 replicate runs for each model should converge. We considered a model to have good convergence if the log-likelihood difference between the best run and the 10th best run of the model was no more than 25 log-likelihood units. Fifth, the model should fit significantly better in the LRT than its immediately simpler nested model, and this LRT significance should be consistent in fastsimcoal2 and ∂a∂i. Sixth, the range of the confidence intervals should not be unrealistically large. Models meeting the above criteria were chosen as the ones representing the demographic history of fin whale populations. For the ENP single-population model, after choosing the 3Epoch model according to the previous criteria, we sought to confirm the findings of this unconstrained model by running it with the parameter reflecting the time of the putative whaling bottleneck fixed at 2 and 3 generations. The results show that models with fixed parameters have better log-likelihoods and do not significantly change the parameter values obtained with the unconstrained model, indicating that the estimates of the unconstrained model are a good representation of the demographic history of this population. For the two-population models, we also ran the Split-AsymmetricMigration-ENPChangeTw2 model with the time of the whaling bottleneck fixed at two generations, but this model was not selected.
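The convergence criterion (best run versus 10th best run differing by no more than 25 log-likelihood units) can be expressed as a small check. The function name and the replicate log-likelihoods below are hypothetical.

```python
def has_converged(logliks, top_n=10, max_diff=25.0):
    """True if the best and the top_n-th best log-likelihoods differ by <= max_diff."""
    ranked = sorted(logliks, reverse=True)
    if len(ranked) < top_n:
        raise ValueError("fewer replicates than top_n")
    return ranked[0] - ranked[top_n - 1] <= max_diff

# Hypothetical replicate log-likelihoods for two models (12 replicates each).
tight_runs = [-1000.0 - i for i in range(12)]        # 10th best is 9 units below the best
loose_runs = [-1000.0 - 5.0 * i for i in range(12)]  # 10th best is 45 units below the best
print(has_converged(tight_runs), has_converged(loose_runs))
```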

Quantifying putatively deleterious variation

Two lines of evidence were used to quantify relative levels of putatively deleterious variation in the ENP and GOC populations. We focused on mutations within protein-coding regions, which are more likely to have direct fitness impacts, and identified derived alleles within four mutation types: synonymous, tolerated nonsynonymous, deleterious nonsynonymous, and LOF. The nonsynonymous mutations were classified as putatively tolerated (SIFT score ≥0.05) or deleterious (SIFT score <0.05) based on phylogenetic constraints using SIFT4G⁸⁴. The LOF mutations are predicted to eliminate or severely inhibit gene function and include splice acceptor, splice donor, start lost and stop gained mutations. LOF mutations were identified using the default settings in snpEff⁸³, which uses the LOF definition in ref. ⁶⁹. We normalized for differences in missing data across individuals by the average number of called genotypes using the R package vcfR v.1.12.0¹⁰⁵. Since the dominance of variants in natural populations is poorly quantified, we assumed two extreme scenarios: (1) all variants are recessive (h = 0) and fitness is only reduced in homozygous derived genotypes; or (2) all variants are additive (h = 0.5) and fitness decreases linearly with the number of derived alleles. The real-life fitness impact probably lies between these two scenarios. We did not assume dominant variants (0.5 < h ≤ 1) given that segregating deleterious variants are very unlikely to be dominant⁵¹.

First, two-tailed MWU tests were used to evaluate whether the normalized counts of derived alleles and homozygotes varied significantly between the ENP and GOC populations in these four mutation types⁴⁸. The count of derived putatively deleterious alleles, including the deleterious nonsynonymous and LOF alleles, is considered a proxy for additive genetic load, while the count of derived homozygotes provides a proxy for recessive load¹⁰⁶,¹⁰⁷.
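The two load proxies can be illustrated with a toy genotype table. This sketch makes several assumptions that are not stated in the text: genotypes are coded as the number of derived alleles (0, 1, 2, or `None` for missing), counts are normalized as count / called genotypes × mean called genotypes, and the sample names are hypothetical. The study itself used the R package vcfR for this step.

```python
from statistics import mean

def count_derived(genotypes):
    """Count derived alleles, derived homozygotes and called genotypes for one individual."""
    called = [g for g in genotypes if g is not None]
    return {
        "derived_alleles": sum(called),
        "derived_homozygotes": sum(1 for g in called if g == 2),
        "called": len(called),
    }

def normalized_proxies(individuals):
    """Normalize counts by each individual's called genotypes (assumed scheme)."""
    stats = {name: count_derived(gts) for name, gts in individuals.items()}
    mean_called = mean(s["called"] for s in stats.values())
    return {
        name: {
            "additive_proxy": s["derived_alleles"] / s["called"] * mean_called,
            "recessive_proxy": s["derived_homozygotes"] / s["called"] * mean_called,
        }
        for name, s in stats.items()
    }

# Hypothetical genotypes at four sites for two individuals.
individuals = {"ENP_01": [0, 1, 2, None], "GOC_01": [2, 2, 1, 0]}
proxies = normalized_proxies(individuals)
print(proxies)
```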

Second, we calculated the relative accumulation of mutations R_XY and homozygous mutations R²_XY for the four mutation types using methods adapted from ref. ⁵³. Here we designated the GOC population as population X and the ENP population as population Y. At each polymorphic site $i$, we defined ${d}_{X}^{i}$ as the count of derived alleles at that site in a sample of ${n}_{X}^{i}$ haploid genomes from population X and ${d}_{Y}^{i}$ as the count of derived alleles in a sample of ${n}_{Y}^{i}$ haploid genomes from population Y. The expected number of derived mutations observed only in population X but not in population Y is defined as:

$${L}_{X,{notY}}={\mathop{\sum }\limits_{i}}({d}_{X}^{i}/{n}_{X}^{i})(1-{d}_{Y}^{i}/{n}_{Y}^{i})$$

(I)

And the expected number of homozygous derived mutations observed only in $X$ but not in $Y$ is defined as:

$${L}_{X,{notY}}^{2}={\mathop{\sum }\limits_{i}}\left(1-\frac{2{d}_{X}^{i}({n}_{X}^{i}-{d}_{X}^{i})}{{n}_{X}^{i}({n}_{X}^{i}-1)}\right)\left(\frac{2{d}_{Y}^{i}({n}_{Y}^{i}-{d}_{Y}^{i})}{{n}_{Y}^{i}({n}_{Y}^{i}-1)}\right)$$

(II)

The ratio statistics are then defined as:

$${R}_{{XY}}={L}_{X,{notY}}/{L}_{Y,{notX}}$$

(III)

$${R}_{{XY}}^{2}={L}_{X,{notY}}^{2}/{L}_{Y,{notX}}^{2}$$

(IV)

The standard errors of R_XY and R²_XY were estimated from a weighted-block jackknife⁵³. If selection has been equally effective and mutation rates remain the same in both populations, the R_XY and R²_XY statistics are expected to be 1. A Z-score test was used to evaluate the significance of the deviation from this null expectation.
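Equations (I) and (III) translate into code quite directly. The sketch below computes R_XY from per-site derived-allele counts and attaches a standard error from a simple unweighted delete-one-block jackknife, which is a simplification of the weighted-block jackknife used in the study. All data values and function names are hypothetical.

```python
import math

def l_x_not_y(sites):
    """Equation (I): sum over sites of (d_X / n_X) * (1 - d_Y / n_Y)."""
    return sum((dx / nx) * (1.0 - dy / ny) for dx, nx, dy, ny in sites)

def r_xy(sites):
    """Equation (III): L_{X,notY} / L_{Y,notX}."""
    swapped = [(dy, ny, dx, nx) for dx, nx, dy, ny in sites]
    return l_x_not_y(sites) / l_x_not_y(swapped)

def jackknife_se(blocks):
    """Unweighted delete-one-block jackknife standard error of R_XY."""
    g = len(blocks)
    estimates = []
    for j in range(g):
        kept = [site for k, block in enumerate(blocks) if k != j for site in block]
        estimates.append(r_xy(kept))
    mean_est = sum(estimates) / g
    return math.sqrt((g - 1) / g * sum((e - mean_est) ** 2 for e in estimates))

# Hypothetical data: each site is (d_X, n_X, d_Y, n_Y); each inner list is one genomic block.
blocks = [
    [(2, 10, 1, 10), (1, 10, 3, 10)],
    [(4, 10, 2, 10), (0, 10, 1, 10)],
    [(3, 10, 3, 10), (2, 10, 0, 10)],
]
all_sites = [site for block in blocks for site in block]
estimate = r_xy(all_sites)
se = jackknife_se(blocks)
z_score = (estimate - 1.0) / se  # deviation from the null expectation R_XY = 1
print(estimate, se, z_score)
```

With real data the blocks would be contiguous genomic windows, and a weighted jackknife would account for blocks containing different numbers of sites.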

Lastly, we assessed the robustness of the results for the four mutation types across the genome using an additional mutation impact scoring system implemented by snpEff. SnpEff classifies variants’ impact severity into HIGH, MODERATE, LOW and MODIFIER categories based on their effect types. We excluded the MODIFIER category because these mutations are mostly non-protein-coding. We additionally limited the MODERATE and LOW categories to the coding sequence (CDS) regions identified in the GTF file to exclude non-protein-coding mutations as well. Two-tailed MWU tests and ${R}_{{XY}}$ analyses were performed as described above to evaluate the variation in the count of derived alleles and homozygotes (Fig. S19). For all above analyses, we removed the six individuals that were also discarded in the demographic inference.

Genetic load simulations

We conducted forward-in-time population genetic simulations using SLiM v.3.3.2⁵⁴. For our simulations, we assumed a 10 Mb chromosomal segment with a uniform recombination rate of 1E-8 cross-over events per base pair per generation and randomly generated intergenic, intronic, and exonic regions, following ref. ¹⁰⁸. The length of the 10 Mb chromosomal segment was chosen as a tradeoff between computation efficiency and genomic representation. Within this chromosomal segment, mutations occurred at a rate of 2.77E-8 per base pair per generation³⁷, with deleterious (nonsynonymous) mutations occurring only in exonic regions at a ratio of 2.31:1 to neutral (synonymous) mutations¹⁰⁹. Selection coefficients for deleterious mutations were drawn from a distribution estimated from human data⁵⁵. We assumed an inverse relationship between selection coefficients and dominance coefficients, given empirical evidence that strongly deleterious mutations also tend to be highly recessive⁵¹,¹¹⁰. Specifically, we assumed that strongly deleterious mutations (s < −0.01) were fully recessive (h = 0.0), moderately deleterious mutations (−0.01 ≤ s < −0.001) were partially recessive (h = 0.1), and weakly deleterious mutations (−0.001 < s ≤ −0.00001) were nearly additive (h = 0.4).
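The mapping from selection coefficient to dominance coefficient can be written as a small lookup. This is an illustrative Python sketch, not the SLiM code used in the study; the function name is hypothetical, and values of s outside the three stated bins raise an error because the text does not say how they were handled.

```python
def dominance_coefficient(s):
    """Return h for a deleterious selection coefficient s, using the bins in the text."""
    if s < -0.01:
        return 0.0   # strongly deleterious: fully recessive
    if -0.01 <= s < -0.001:
        return 0.1   # moderately deleterious: partially recessive
    if -0.001 < s <= -0.00001:
        return 0.4   # weakly deleterious: nearly additive
    raise ValueError("s falls outside the bins defined in the text")

for s in (-0.05, -0.005, -0.0005):
    print(s, dominance_coefficient(s))
```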

Using this simulation framework, we simulated under our two best-fit demographic models, including a single-population model for the ENP population, and a two-population divergence model for the ENP and GOC populations (see above for details). For both models, we assumed a burn-in of 10× the ancestral population size in generations. During the simulation, we kept track of several quantities for each simulated population, including mean genetic load (the reduction in individual fitness, calculated multiplicatively across sites), mean genome-wide heterozygosity, mean inbreeding coefficient (here measured as F_ROH, where the minimum ROH length was 1 Mb), and the mean number of strongly deleterious alleles (s < −0.01), moderately deleterious alleles (−0.01 ≤ s < −0.001), and weakly deleterious alleles (−0.001 < s ≤ −0.00001) per individual. These quantities were estimated using a sample size of 40 individuals. For all simulations, we ran 25 replicates and averaged these quantities across replicates.
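To make the definition of genetic load concrete, the sketch below computes the load of one individual multiplicatively across sites. It assumes the common convention that a homozygous derived site multiplies fitness by (1 + s) and a heterozygous site by (1 + h·s), with negative s for deleterious mutations; the function name and the example mutations are hypothetical.

```python
import math

def genetic_load(mutations):
    """Load = 1 - fitness, with fitness multiplied across sites.

    Each mutation is (s, h, copies): selection coefficient (negative when
    deleterious), dominance coefficient, and number of derived copies (1 or 2).
    """
    fitness = 1.0
    for s, h, copies in mutations:
        if copies == 2:
            fitness *= 1.0 + s
        elif copies == 1:
            fitness *= 1.0 + h * s
        else:
            raise ValueError("copies must be 1 or 2")
    return 1.0 - fitness

# Hypothetical individual: one homozygous strongly deleterious recessive mutation
# and one heterozygous additive mutation.
load = genetic_load([(-0.1, 0.0, 2), (-0.01, 0.5, 1)])
print(load)
```

Averaging this quantity over the sampled individuals gives the mean genetic load of a simulated population.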

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The raw sequence data generated in this study are deposited in NCBI’s Sequence Read Archive (SRA) database under accession numbers SRR23615109 - SRR23615158 (BioSample SAMN33439338 - SAMN33439387; BioProject PRJNA938516; see Table S1 for details). The sequence data for the additional mysticete species used in this study are available in NCBI’s SRA database under accession numbers SRR5665640, SRR1802584, SRR5665644, and SRR5665639 (see Table S1 for details). The CpG island data are available in the UCSC genome browser (http://hgdownload.soe.ucsc.edu/goldenPath/balAcu1/database/). The balaenopterid genome assemblies used for the comparison shown in Table S16 are available in NCBI’s Assembly database under accession numbers GCA_008795845.1, GCA_023338255.1, GCF_000493695.1, GCF_009873245.2, and GCA_004329385.1, or in the DNA Zoo database under accession names Balaenoptera_physalus (https://dnazoo.s3.wasabisys.com/index.html?prefix=Balaenoptera_physalus/) and Balaenoptera_ricei (https://dnazoo.s3.wasabisys.com/index.html?prefix=Balaenoptera_ricei/). Source data are provided with this paper.

Code availability

The scripts used to perform the sequence data processing and analyses are publicly available in a GitHub repository that can be accessed through Zenodo¹¹¹ at https://doi.org/10.5281/zenodo.7980107.

References

Ceballos, G. & Ehrlich, P. R. Mammal population losses and the extinction crisis. Science 296, 904–907 (2002).
Pimm, S. L. et al. The biodiversity of species and their rates of extinction, distribution, and protection. Science 344, 1246752–1246752 (2014).
Waters, C. N. et al. The Anthropocene is functionally and stratigraphically distinct from the Holocene. Science 351, aad2622–aad2622 (2016).
Lande, R. Risks of population extinction from demographic and environmental stochasticity and random catastrophes. Am. Nat. 142, 911–927 (1993).
Reed, D. H. & Frankham, R. Correlation between fitness and genetic diversity. Conserv. Biol. 17, 230–237 (2003).
Melbourne, B. A. & Hastings, A. Extinction risk depends strongly on factors contributing to stochasticity. Nature 454, 100–103 (2008).
Frankham, R. Genetics and extinction. Biol. Conserv. 126, 131–140 (2005).
Willi, Y., Van Buskirk, J. & Hoffmann, A. A. Limits to the adaptive potential of small populations. Annu. Rev. Ecol. Evol. Syst. 37, 433–458 (2006).
Wright, S. Evolution in Mendelian populations. Genetics 16, 97 (1931).
Mills, L. S. & Allendorf, F. W. The one-migrant-per-generation rule in conservation and management. Conserv. Biol. 10, 1509–1518 (1996).
Frankham, R. Genetic rescue of small inbred populations: meta-analysis reveals large and consistent benefits of gene flow. Mol. Ecol. 24, 2610–2618 (2015).
Wang, J. Application of the one-migrant-per-generation rule to conservation and management. Conserv. Biol. 18, 332–343 (2004).
Kyriazis, C. C., Wayne, R. K. & Lohmueller, K. E. Strongly deleterious mutations are a primary determinant of extinction risk due to inbreeding depression. Evol. Lett. 5, 33–47 (2021).
Díez-del-Molino, D., Sánchez-Barreiro, F., Barnes, I., Gilbert, M. T. P. & Dalén, L. Quantifying temporal genomic erosion in endangered species. Trends Ecol. Evol. 33, 176–185 (2018).
Springer, A. M. et al. Sequential megafaunal collapse in the North Pacific Ocean: an ongoing legacy of industrial whaling? PNAS 100, 12223–12228 (2003).
Clapham, P. J., Young, S. B. & Brownell, R. L. Jr. Baleen whales: conservation issues and the status of the most endangered populations. Mammal. Rev. 29, 35–60 (1999).
Baker, C. S. & Clapham, P. J. Modelling the past and future of whales and whaling. Trends Ecol. Evol. 19, 365–371 (2004).
Jackson, J. A., Patenaude, N. J., Carroll, E. L. & Baker, C. S. How few whales were there after whaling? Inference from contemporary mtDNA diversity. Mol. Ecol. 17, 236–251 (2008).
Palsbøll, P. J., Peery, M. Z., Olsen, M. T., Beissinger, S. R. & Bérubé, M. Inferring recent historic abundance from current genetic diversity. Mol. Ecol. 22, 22–40 (2013).
Beichman, A. C., Huerta-Sanchez, E. & Lohmueller, K. E. Using genomic data to infer historic population dynamics of nonmodel organisms. Annu. Rev. Ecol. Evol. Syst. 49, 433–456 (2018).
Beland, S. L., Frasier, B. A., Darling, J. D. & Frasier, T. R. Using pre- and postexploitation samples to assess the impact of commercial whaling on the genetic characteristics of eastern North Pacific gray and humpback whales and to compare methods used to infer historic demography. Mar. Mammal. Sci. 36, 398–420 (2020).
Roman, J. & Palumbi, S. R. Whales before whaling in the North Atlantic. Science 301, 508–510 (2003).
Alter, S. E., Rynes, E. & Palumbi, S. R. DNA evidence for historic population size and past ecosystem impacts of gray whales. Proc. Natl. Acad. Sci. USA 104, 15162–15167 (2007).
Ruegg, K. et al. Long-term population size of the North Atlantic humpback whale within the context of worldwide population structure. Conserv. Genet. 14, 103–114 (2013).
Wolf, M., de Jong, M., Halldórsson, S. D., Árnason, Ú. & Janke, A. Genomic impact of whaling in North Atlantic Fin Whales. Mol. Biol. Evol. 39, msac094 (2022).
Rocha, R. C., Clapham, P. J. & Ivashchenko, Y. V. Emptying the oceans: a summary of industrial Whaling catches in the 20th century. Mar. Fish. Rev. 76, 37–48 (2014).
Mizroch, S. A., Rice, D. W., Zwiefelhofer, D., Waite, J. & Perryman, W. L. Distribution and movements of fin whales in the North Pacific Ocean. Mammal. Rev. 39, 193–227 (2009).
Jiménez, M. E. L., Palacios, D. M., Legorreta, A. J., Urbán, J. R. & Mate, B. R. Fin whale movements in the Gulf of California, Mexico, from satellite telemetry. PLoS ONE 14, e0209324 (2019).
Nigenda-Morales, S., Flores-Ramirez, S., Urban-R., J. & Vazquez-Juarez, R. MHC DQB-1 polymorphism in the Gulf of California fin whale (Balaenoptera physalus) population. J. Heredity 99, 14–21 (2008).
Rivera-León, V. E. et al. Long-term isolation at a low effective population size greatly reduced genetic diversity in Gulf of California fin whales. Sci. Rep. 9, 12391 (2019).
Pérez-Alvarez, M. J. et al. Contrasting phylogeographic patterns among Northern and Southern Hemisphere fin whale populations with new data from the Southern Pacific. Front. Mar. Sci. 8, 630233 (2021).
Moore, J. E. & Barlow, J. Bayesian state-space model of fin whale abundance trends from a 1991-2008 time series of line-transect surveys in the California Current: Bayesian trend analysis from line-transect data. J. Appl. Ecol. 48, 1195–1205 (2011).
Rousset, F. Genetic structure and selection in subdivided populations. (Princeton University Press, 2004).
Bérubé, M., Urbán, J., Dizon, A. E., Brownell, R. L. & Palsbøll, P. J. Genetic identification of a small and highly isolated population of fin whales (Balaenoptera physalus) in the Sea of Cortez, México. Conserv. Genet. 3, 183–190 (2002).
Morin, P. A. et al. Reference genome and demographic history of the most endangered marine mammal, the vaquita. Mol. Ecol. Resour. 21, 1008–1020 (2021).
Robinson, J. A. et al. The critically endangered vaquita is not doomed to extinction by inbreeding depression. Science 376, 635–639 (2022).
Yim, H.-S. et al. Minke whale genome and aquatic adaptation in cetaceans. Nat. Genet. 46, 88–92 (2014).
Árnason, Ú., Lammers, F., Kumar, V., Nilsson, M. A. & Janke, A. Whole-genome sequencing of the blue whale and other rorquals finds signatures for introgressive gene flow. Sci. Adv. 4, eaap9873 (2018).
Narasimhan, V. et al. BCFtools/RoH: a hidden Markov model approach for detecting autozygosity from next-generation sequencing data. Bioinformatics 32, 1749–1751 (2016).
Bertrand, A. R., Kadri, N. K., Flori, L., Gautier, M. & Druet, T. RZooRoH: an R package to characterize individual genomic autozygosity and identify homozygous‐by‐descent segments. Methods Ecol. Evol. 10, 860–866 (2019).
Kirin, M., Mcquillan, R., Franklin, C. S., Campbell, H. & Mckeigue, P. M. Genomic runs of homozygosity record population history and consanguinity. PLoS One 5, 13996 (2010).
Browning, S. R. Estimation of pairwise identity by descent from dense genetic marker data in a population sample of haplotypes. Genetics 178, 2123–2132 (2008).
Excoffier, L., Dupanloup, I., Huerta-Sánchez, E., Sousa, V. C. & Foll, M. Robust demographic inference from genomic and SNP data. PLOS Genet. 9, e1003905 (2013).
Gutenkunst, R. N., Hernandez, R. D., Williamson, S. H. & Bustamante, C. D. Inferring the joint demographic history of multiple populations from multidimensional SNP frequency data. PLoS Genet. 5, e1000695 (2009).
Taylor, B. L., Chivers, S. J., Larese, J. & Perrin, W. F. Generation length and percent mature estimates for IUCN assessments of cetaceans. http://swfsc.noaa.gov/BarbTaylorPubs.aspx (2007).
McCoy, R. C., Garud, N. R., Kelley, J. L., Boggs, C. L. & Petrov, D. A. Genomic inference accurately predicts the timing and severity of a recent bottleneck in a nonmodel insect population. Mol. Ecol. 23, 136–150 (2014).
Clark, P. U. et al. The last glacial maximum. Science 325, 710–714 (2009).
Robinson, J. A. et al. Genomic signatures of extensive inbreeding in Isle Royale wolves, a population on the threshold of extinction. Sci. Adv. 5, eaau0757 (2019).
Eyre-Walker, A. & Keightley, P. D. The distribution of fitness effects of new mutations. Nat. Rev. Genet. 8, 610–618 (2007).
Boyko, A. R. et al. Assessing the evolutionary impact of amino acid mutations in the human genome. PLoS Genet. 4, e1000083 (2008).
Huber, C. D., Durvasula, A., Hancock, A. M. & Lohmueller, K. E. Gene expression drives the evolution of dominance. Nat. Commun. 9, 2750 (2018).
Simons, Y. B., Turchin, M. C., Pritchard, J. K. & Sella, G. The deleterious mutation load is insensitive to recent population history. Nat. Genet. 46, 220–224 (2014).
Do, R. et al. No evidence that selection has been less effective at removing deleterious mutations in Europeans than in Africans. Nat. Genet. 47, 126 (2015).
Haller, B. C. & Messer, P. W. SLiM 3: forward genetic simulations beyond the Wright–Fisher model. Mol. Biol. Evol. 36, 632–637 (2019).
Kim, B. Y., Huber, C. D. & Lohmueller, K. E. Inference of the distribution of selection coefficients for new nonsynonymous mutations using large samples. Genetics 206, 345–361 (2017).
Baker, C. S. et al. Abundant mitochondrial DNA variation and world-wide population structure in humpback whales. Proc. Natl. Acad. Sci. 90, 8239–8243 (1993).
Nei, M., Maruyama, T. & Chakraborty, R. The bottleneck effect and genetic variability in populations. Evolution 29, 1–10 (1975).
Amos, B. Levels of genetic variability in cetacean populations have probably changed little as a result of human activities. Rep. Int. Whal. Comm. 46, 657–658 (1996).
Brüniche-Olsen, A. et al. The inference of gray whale (Eschrichtius robustus) historical population attributes from whole-genome sequences. BMC Evol. Biol. 18, 87 (2018).
Archer, F. I. et al. Mitogenomic phylogenetics of fin whales (Balaenoptera physalus spp.): genetic evidence for revision of subspecies. PLoS One 8, e63396 (2013).
Aguilar, A. & García-Vernet, R. Fin whale: Balaenoptera physalus. In Encyclopedia of marine mammals 368–371 (Elsevier, 2018).
Essington, T. E. 5. Pelagic ecosystem response to a century of commercial fishing and whaling. In: Whales, Whaling, and Ocean Ecosystems (eds. et al.) 38–49 (University of California Press, 2007).
Hoffmann, A. A., Sgrò, C. M. & Kristensen, T. N. Revisiting adaptive potential, population size, and conservation. Trends Ecol. Evol. 32, 506–517 (2017).
Slatkin, M. Gene flow and the geographic structure of natural populations. Science 236, 787–792 (1987).
Hedrick, P. W. & Garcia-Dorado, A. Understanding inbreeding depression, purging, and genetic rescue. Trends Ecol. Evol. 31, 940–952 (2016).
Xue, Y. et al. Mountain gorilla genomes reveal the impact of long-term population decline and inbreeding. Science 348, 242–245 (2015).
Grossen, C., Guillaume, F., Keller, L. F. & Croll, D. Purging of highly deleterious mutations through severe bottlenecks in Alpine ibex. Nat. Commun. 11, 1–12 (2020).
Cooper, G. M. & Shendure, J. Needles in stacks of needles: finding disease-causal variants in a wealth of genomic data. Nat. Rev. Genet. 12, 628–640 (2011).
MacArthur, D. G. et al. A systematic survey of loss-of-function variants in human protein-coding genes. Science 335, 823–828 (2012).
Johri, P., Charlesworth, B. & Jensen, J. D. Toward an evolutionarily appropriate null model: jointly inferring demography and purifying selection. Genetics 215, 173–192 (2020).
Fossi, M. C. et al. Are baleen whales exposed to the threat of microplastics? A case study of the Mediterranean fin whale (Balaenoptera physalus). Mar. Pollut. Bull. 64, 2374–2379 (2012).
Shafer, A. B. et al. Genomics and the challenging translation into conservation practice. Trends Ecol. Evol. 30, 78–87 (2015).
Lambertsen, R. H. A biopsy system for large whales and its use for cytogenetics. J. Mammal. 68, 443–445 (1987).
Harlin, A. D., Würsig, B., Baker, C. S. & Markowitz, T. M. Skin swabbing for genetic analysis: application to dusky dolphins (Lagenorhynchus obscurus). Mar. Mammal. Sci. 15, 409–425 (1999).
Van der Auwera, G. A. et al. From FastQ data to high-confidence variant calls: the genome analysis toolkit best practices pipeline. Curr. Protoc. Bioinformatics 43, 11–10 (2013).
Andrews, S. FastQC: a quality control tool for high throughput sequence data. http://www.bioinformatics.babraham.ac.uk/projects/fastqc/ (2010).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Okonechnikov, K., Conesa, A. & García-Alcalde, F. Qualimap 2: advanced multi-sample quality control for high-throughput sequencing data. Bioinformatics 32, 292–294 (2016).
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
McKenna, A. et al. The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
Morgulis, A., Gertz, E. M., Schäffer, A. A. & Agarwala, R. WindowMasker: window-based masker for sequenced genomes. Bioinforma. (Oxf., Engl.) 22, 134–141 (2006).
Smit, A. F. A., Hubley, R. & Green, P. RepeatMasker Open-4.0. http://www.repeatmasker.org (2013).
Cingolani, P. et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff. Fly 6, 80–92 (2012).
Vaser, R., Adusumalli, S., Leng, S. N., Sikic, M. & Ng, P. C. SIFT missense predictions for genomes. Nat. Protoc. 11, 1 (2016).
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
Camacho, C. et al. BLAST+: architecture and applications. BMC Bioinformatics 10, 421 (2009).
Zheng, X. et al. A high-performance computing toolset for relatedness and principal component analysis of SNP data. Bioinformatics 28, 3326–3328 (2012).
Zheng, X. et al. SeqArray – a storage-efficient high-performance data format for WGS variant calls. Bioinformatics 33, 2251–2257 (2017).
Weir, B. S. & Cockerham, C. C. Estimating F-statistics for the analysis of population structure. Evolution 38, 1358–1370 (1984).
Hudson, R. R., Boos, D. D. & Kaplan, N. L. A statistical test for detecting geographic subdivision. Mol. Biol. Evol. 9, 138–151 (1992).
Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
Alexander, D. H., Novembre, J. & Lange, K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655–1664 (2009).
Paradis, E. & Schliep, K. ape 5.0: an environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics 35, 526–528 (2019).
Yu, G., Smith, D., Zhu, H., Guan, Y. & Lam, T. T.-Y. ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods Ecol. Evol. 8, 28–36 (2017).
Pool, J. E. & Nielsen, R. Inference of historical changes in migration rate from the lengths of migrant tracts. Genetics 181, 711–719 (2009).
Thompson, E. A. Identity by descent: variation in meiosis, across genomes, and in populations. Genetics 194, 301–326 (2013).
Foote, A. D. et al. Runs of homozygosity in killer whale genomes provide a global record of demographic histories. Mol. Ecol. 30, 6162–6177 (2021).
Dumont, B. L. & Payseur, B. A. Evolution of the genomic rate of recombination in mammals. Evolution 62, 276–294 (2008).
Han, E., Sinsheimer, J. S. & Novembre, J. Characterizing bias in population genetic inferences from low-coverage sequencing data. Mol. Biol. Evol. 31, 723–735 (2014).
Blischak, P. D., Barker, M. S. & Gutenkunst, R. N. Inferring the demographic history of inbred species from genome-wide SNP frequency data. Mol. Biol. Evol. 37, 2124–2136 (2020).
Beichman, A. C. et al. Genomic analyses reveal range‐wide devastation of sea otter populations. Mol. Ecol. 32, 281–298 (2022).
Meng, X.-L. & Rubin, D. B. Maximum likelihood estimation via the ECM algorithm: a general framework. Biometrika 80, 267–278 (1993).
Coffman, A. J., Hsieh, P. H., Gravel, S. & Gutenkunst, R. N. Computationally efficient composite likelihood statistics for demographic inference. Mol. Biol. Evol. 33, 591–593 (2016).
Kelleher, J., Etheridge, A. M. & McVean, G. Efficient coalescent simulation and genealogical analysis for large sample sizes. PLoS Comput. Biol. 12, e1004842 (2016).
Knaus, B. J. & Grünwald, N. J. VCFR: a package to manipulate and visualize variant call format data in R. Mol. Ecol. Resour. 17, 44–53 (2017).
Lohmueller, K. E. et al. Proportionally more deleterious genetic variation in European than in African populations. Nature 451, 994–997 (2008).
Beichman, A. C. et al. Aquatic adaptation and depleted diversity: a deep dive into the genomes of the Sea Otter and Giant Otter. Mol. Biol. Evol. 36, 2631–2655 (2019).
Mooney, J. A. et al. Understanding the hidden complexity of Latin American population isolates. Am. J. Hum. Genet. 103, 707–726 (2018).
Huber, C. D., Kim, B. Y., Marsden, C. D. & Lohmueller, K. E. Determining the factors driving selective effects of new nonsynonymous mutations. Proc. Natl. Acad. Sci. 114, 4465–4470 (2017).
Agrawal, A. F. & Whitlock, M. C. Inferences about the distribution of dominance drawn from yeast gene knockout data. Genetics 187, 553–566 (2011).
Nigenda, S., Lin, M. & Nuñez-Valencia, P. G. snigenda/Fin_whale_Population_Genomics: V1.0. https://doi.org/10.5281/zenodo.7980107 (2023).
Vihtakari, M. ggOceanMaps: plot data on oceanographic maps using ‘ggplot2’ R package version 0.4.3. https://mikkovihtakari.github.io/ggOceanMaps (2020).
Amante, C. & Eakins, B. W. ETOPO1 1 Arc-minute global relief model: procedures, data sources and analysis. NOAA Tech. Memorandum NESDIS NGDC-24, 19 (2009).
Grant, K. M. et al. Sea-level variability over five glacial cycles. Nat. Commun. 5, 5076 (2014).

Acknowledgements

We dedicate this work to Robert K. Wayne, a pioneer in the field of conservation genetics and conservation biology. We would like to thank the members of the research program of marine mammals from Universidad Autónoma de Baja California Sur for their help collecting the samples in the Gulf of California. We also thank Sergio Flores-Ramírez for his initial support in the Gulf of California project, Cei Abreu-Goodger for his support and for providing laboratory space during the initial analysis of the data, and Phil Morin for reviewing an early draft of the manuscript. Unpublished genome assemblies and sequencing data for the Rice’s whale and fin whale are used with permission from the DNA Zoo Consortium (dnazoo.org). For the fin whale DNA Zoo assemblies, the sample for the assembly was collected by The Marine Mammal Center under the Marine Mammal Health and Stranding Response Program (MMHSRP) Permit No. 18786-04 issued by the National Marine Fisheries Service (NMFS) in accordance with the Marine Mammal Protection Act (MMPA) and Endangered Species Act (ESA). The work at DNA Zoo was performed under MMHSRP Permit No. 18786-03. This work used computational and storage services associated with the Hoffman2 Shared Cluster provided by UCLA Institute for Digital Research and Education’s Research Technology Group. This work was supported by the Mexican National Council for Science and Technology (CONACYT) grant FONCICYT/50/2016, the National Science Foundation (DEB Small Grant #1556705), and UCMEXUS-CONACYT collaborative grant 2006. S.N.-M. was supported by CONACYT Postdoctoral Fellowship 724094 and the Mexican Secretariat of Agriculture and Rural Development Postdoctoral Fellowship. M.L. was supported by the University of California, Los Angeles Department of Ecology and Evolutionary Biology (EEB) Summer Research Fellowship. K.E.L. and C.C.K. were supported by NIH grant R35GM119856 to K.E.L. A.C.B. was supported by the Biological Mechanisms of Healthy Aging Training Program NIH T32AG066574. M.J.P.-A. was supported by ANID under Grant Program FONDECYT Iniciación 11170182. E.P. and M.J.P.-A. were supported by ANID Millennium Science Initiative Program ICN2021_002.

Author information

Sergio F. Nigenda-Morales
Present address: Department of Biological Sciences, California State University San Marcos, San Marcos, CA, 92096, USA
Meixi Lin
Present address: Department of Plant Biology, Carnegie Institution for Science, Stanford, CA, 94305, USA
Aaron P. Ragsdale
Present address: Department of Integrative Biology, University of Wisconsin, Madison, WI, 53706, USA
These authors contributed equally: Sergio F. Nigenda-Morales, Meixi Lin.
Deceased: Robert K. Wayne.

Authors and Affiliations

Advanced Genomics Unit, National Laboratory of Genomics for Biodiversity (Langebio), Center for Research and Advanced Studies (Cinvestav), Irapuato, Guanajuato, 36824, Mexico
Sergio F. Nigenda-Morales, Paulina G. Nuñez-Valencia, Aaron P. Ragsdale & Andrés Moreno-Estrada
Department of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, CA, 90095, USA
Meixi Lin, Christopher C. Kyriazis, Kirk E. Lohmueller & Robert K. Wayne
Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México (UNAM), Cuernavaca, Morelos, México
Paulina G. Nuñez-Valencia
Department of Genome Sciences, University of Washington, Seattle, WA, 98195, USA
Annabel C. Beichman
Institute for Human Genetics, University of California, San Francisco (UCSF), San Francisco, CA, 94143, USA
Jacqueline A. Robinson
Departamento de Ciencias Marinas y Costeras, Universidad Autónoma de Baja California Sur (UABCS), La Paz, Baja California Sur, Mexico
Jorge Urbán R. & Lorena Viloria-Gómora
Marine Mammal and Turtle Division, Southwest Fisheries Science Center, La Jolla, CA, 92037, USA
Frederick I. Archer
Escuela de Medicina Veterinaria, Facultad de Medicina y Ciencias de la Salud, Universidad Mayor, Santiago, Chile
María José Pérez-Álvarez
Millennium Institute Biodiversity of Antarctic and Subantarctic Ecosystems (BASE), Universidad de Chile, Santiago, Chile
María José Pérez-Álvarez & Elie Poulin
Interdepartmental Program in Bioinformatics, University of California, Los Angeles, CA, 90095, USA
Kirk E. Lohmueller
Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
Kirk E. Lohmueller

Contributions

S.N.-M., A.M.-E., and R.K.W. conceived the study. A.M.-E. and R.K.W. contributed reagents, materials, and analysis tools. J.U.R., L.V.-G., and F.I.A. collected and contributed the samples and sample information. S.N.-M. carried out laboratory work. S.N.-M., M.L., P.N.-V., and C.C.K. performed the analysis of the data. A.C.B., J.A.R., and A.R. provided scripts for some analyses. A.M.-E., A.C.B., J.A.R., A.R., M.J.P.-A., E.P., and K.E.L. provided guidance and advised the project. A.M.-E., K.E.L., and R.K.W. performed funding acquisition. S.N.-M., M.L., P.N.-V., C.C.K., and R.K.W. wrote the manuscript with input from all the authors.

Corresponding authors

Correspondence to Sergio F. Nigenda-Morales, Meixi Lin, Kirk E. Lohmueller or Andrés Moreno-Estrada.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks Carlos Carreras and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Peer Review File

Reporting Summary

Source data

Source Data

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Nigenda-Morales, S.F., Lin, M., Nuñez-Valencia, P.G. et al. The genomic footprint of whaling and isolation in fin whale populations. Nat Commun 14, 5465 (2023). https://doi.org/10.1038/s41467-023-40052-z

Received: 08 June 2022
Accepted: 10 July 2023
Published: 12 September 2023
DOI: https://doi.org/10.1038/s41467-023-40052-z
