The genomic footprint of whaling and isolation in fin whale populations

Twentieth-century industrial whaling pushed several species to the brink of extinction, with fin whales being the most impacted. However, a small, resident population in the Gulf of California was not targeted by whaling. Here, we analyzed 50 whole genomes from the Eastern North Pacific (ENP) and Gulf of California (GOC) fin whale populations to investigate their demographic history and the genomic effects of natural and human-induced bottlenecks. We show that the two populations diverged ~16,000 years ago, after which the ENP population expanded and then suffered a 99% reduction in effective size during the whaling period. In contrast, the GOC population remained small and isolated, receiving less than one migrant per generation. However, this low level of migration has been crucial for maintaining its viability. Our study exposes the severity of whaling, emphasizes the importance of migration, and demonstrates the use of genome-based analyses and simulations to inform conservation strategies.


Field-specific reporting
No humans were used in this study.
The research sample consists of 50 fin whale (Balaenoptera physalus) individuals from two different populations. For all these individuals, biopsy punches were taken, DNA was extracted and whole-genome resequencing was performed. These samples were collected between 1995 and 2017. We aimed to sample at least 20 individuals per population. We were able to sample 30 individuals from the Eastern North Pacific population and 20 from the Gulf of California population. The rationale behind selecting these populations for our research is that the Eastern North Pacific population was severely depleted by whaling activities during the 20th century, whereas the Gulf of California population has been small and isolated for several generations. These differences in demographic history allow us to explore the genomic consequences of different types of population reductions. The geographic distribution of the samples obtained in the Eastern North Pacific population is as follows: California (9), Oregon (4), Washington (2), British Columbia (3) and Alaska (12). The geographic distribution of the samples obtained in the Gulf of California is as follows: Bahía de La Paz (3), Bahía de Loreto (6), Bahía de los Angeles (5). We sampled 20 to 30 individuals from each population because this should be enough to reach statistically significant results, especially when whole-genome data at relatively high coverage (27x) are analyzed. The determination of age and sex was not relevant for our study. However, we have sex information, determined by PCR amplification, for the 30 Eastern North Pacific individuals. Of those, 50% were males and 50% females. Since a high-quality reference genome for the fin whale does not exist, we used the minke whale genome as reference. This genome is deposited in the NCBI database (BalAcu1.0, Assembly GCF_000493695.1). For some comparative analyses of genomic variation, we used existing whole-genome sequence data from four baleen whale species.
The data were downloaded from NCBI's Sequence Read Archive (SRA) database. The accession numbers of these sequence data are shown in parentheses: Eubalaena glacialis (SRR5665640), Balaenoptera acutorostrata (SRR1802584), Balaenoptera musculus (SRR5665644), Megaptera novaeangliae (SRR5665639).
The tissues used in this project are small skin biopsies that were collected at sea from a small boat using stainless-steel biopsy darts. Briefly, once a whale was observed, the boat approached slowly and no closer than 20 meters (22 yards). When the whale surfaced to breathe, the modified stainless-steel dart was deployed using a biopsy rifle or crossbow to take a skin sample from the dorsal part of the animal, close to the dorsal fin. Once the dart was observed to be on target, the boat waited for the whale to move away and then approached to retrieve the floating biopsy dart. The biopsy was then preserved in an ethanol solution at a concentration of 80-90%. The biopsy dimensions are usually 4 millimeters in diameter and 3 centimeters in length. Before deployment, the stainless-steel biopsy dart was sterilized using ethanol at a concentration of 90%. This protocol only takes a small skin sample, and the animal is not harassed for long periods of time.
No statistical methods were used to predetermine sample size. The sample size was selected before analysis began, based on available samples and budgetary constraints for sequencing. We sought to include 20 samples per population, which, based on the literature on non-model organisms, might be sufficient to obtain power for statistical comparisons.
The 50 fin whale tissue samples used in this study were previously collected during field work for research on other projects. The tissue samples were obtained following a standard protocol for obtaining biopsies from free-ranging cetacean species using a biopsy rifle and a modified stainless-steel dart. The authors S. F. Nigenda-Morales and A. C. Beichman coordinated the DNA extractions and library preparations. DNA extraction was performed using the QIAGEN QIAamp DNA Mini Kit (Qiagen; California, USA). The genomic libraries were prepared from extracted DNA using the Illumina TruSeq DNA PCR-free standard kit (Illumina; California, USA) following the manufacturer's instructions. Whole-genome sequencing was performed using the 150-bp paired-end protocol on Illumina HiSeqX or NovaSeq6000 platforms. Library preparation and sequencing were performed at the Fulgent Genetics sequencing core facility (Fulgent Genetics LLC; California, USA). The authors S. F. Nigenda-Morales and M. Lin recorded the sequencing data. The authors J. Urbán R. and L. Viloria-Gómora recorded the collection years and locations for the Gulf of California samples. The author F. I. Archer recorded the collection years and locations for the Eastern North Pacific samples.
As stated above, the samples used in this study were compiled from multiple field trips in the US and Mexico. All samples used in this study were collected from 1995 to 2017. The frequency and periodicity of sampling are not applicable to our study, since genomic resequencing data do not vary with fine-scale collection time. We chose this collection window to include only individuals that were sampled after the whaling moratoriums took effect in the 1980s. Whenever data were excluded, we describe the exclusions and the rationale in the text. Specifically, we performed data exclusions in three settings. First, for the exclusion of low-quality genotype calls, we used pre-established exclusion criteria recommended by the GATK best practice guidelines (https://gatk.broadinstitute.org/hc/en-us/sections/360007226651-Best-Practices-Workflows). We applied a stringent set of quality and depth filters to the genotype calls, keeping only high-quality biallelic SNPs and monomorphic genotypes. For each individual, only genotypes with a minimum depth of eight reads and a maximum depth of 2.5x the mean depth, a minimum Phred score of 20 and the expected allele balance (≥ 0.9 for homozygous reference genotypes; ≥ 0.2 & ≤ 0.8 for heterozygous genotypes and ≤ 0.1 for homozygous alternative genotypes) were kept. Each site was then filtered using the following criteria. Sites that 1) failed GATK
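
The per-genotype criteria described above (depth between eight reads and 2.5x the individual's mean depth, a minimum Phred score of 20, and allele balance consistent with the called genotype) can be illustrated with a minimal Python sketch. This is not the authors' pipeline. The `Genotype` class and `passes_filters` function are hypothetical names, and the sketch assumes that allele balance is the fraction of reads supporting the reference allele, that the Phred score is a per-genotype quality, that depth is the sum of reference and alternate reads, and that the thresholds are inclusive bounds (at least 0.9, between 0.2 and 0.8, at most 0.1).

```python
from dataclasses import dataclass

MIN_DEPTH = 8            # minimum reads per genotype (from the text)
MAX_DEPTH_FACTOR = 2.5   # maximum depth as a multiple of the individual's mean depth
MIN_QUALITY = 20         # minimum Phred-scaled quality (assumed to be per genotype)


@dataclass
class Genotype:
    """Hypothetical container for one individual's call at one site."""
    kind: str        # "hom_ref", "het", or "hom_alt"
    ref_reads: int   # reads supporting the reference allele
    alt_reads: int   # reads supporting the alternate allele
    quality: int     # Phred-scaled quality of the call


def passes_filters(gt: Genotype, mean_depth: float) -> bool:
    """Return True if the genotype meets the depth, quality and balance criteria."""
    depth = gt.ref_reads + gt.alt_reads  # assumption: depth = ref + alt reads
    if depth < MIN_DEPTH or depth > MAX_DEPTH_FACTOR * mean_depth:
        return False
    if gt.quality < MIN_QUALITY:
        return False
    # Assumption: allele balance is the reference-read fraction.
    ref_fraction = gt.ref_reads / depth
    if gt.kind == "hom_ref":
        return ref_fraction >= 0.9
    if gt.kind == "het":
        return 0.2 <= ref_fraction <= 0.8
    if gt.kind == "hom_alt":
        return ref_fraction <= 0.1
    return False  # unknown genotype classes are rejected
```

For example, with a mean depth of 27, a heterozygous call supported by 14 reference and 13 alternate reads at quality 40 would be kept, whereas the same call with only 7 reads in total would be dropped by the minimum-depth criterion.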

March 2021
Reproducibility Randomization Blinding Did the study involve field work?
We have taken extensive measures to verify the reproducibility of our findings. In our study design, we aimed to sample adequately within the two populations, with at least 20 individuals per population, and to achieve high sequence coverage (at least 20X; the actual mean coverage was 27X). In empirical analyses at the individual level (such as population structure, genome-wide patterns of variation, runs of homozygosity and patterns of deleterious variation), the findings observed within each population are consistent across individuals, which confirms the reproducibility of individual-level patterns. In simulations, we ran 25 replicates of each demographic scenario tested; the observed patterns were consistent across the replicates, confirming their reproducibility. Whenever possible, we repeated the same analyses using different, well-established software to confirm that the findings do not depend on the software used. For example, in the runs of homozygosity analyses, we used both bcftools and RZooRoH. In the demographic inference, we tested different models employing both coalescent (fastsimcoal2) and diffusion approximation (∂a∂i) methods. In the identification of putatively deleterious mutations, we employed two mutation impact scoring systems, implemented in snpEff and SIFT. The results were reproducible across software in every case.
In the demographic analyses, we included additional reproducibility tests: performing additional inference runs varying the time of the whaling reduction, using different optimization methods, and performing coalescent SFS simulations to confirm our power to detect such a recent decline in the single-population Eastern North Pacific demographic model. All attempts to repeat the experiments as noted above were successful. The data analysis code is publicly available here: https://doi.org/10.5281/zenodo.7980107. We will also make the raw data and important derived data necessary to reproduce the results publicly available upon acceptance. During the review period, these data are available upon request.
The individuals from the Eastern North Pacific and the Gulf of California were randomly subsampled from the available tissue collection archives. No further randomization is necessary given the nature and scope of population genomics data.
Blinding during our data collection was relative because sampling was opportunistic due to the biology and behavior of free-ranging cetaceans. Although sampling efforts are made in areas where fin whales are known to be present, encounters with individual fin whales occur randomly. Therefore, there is blinding because it is unknown whether any samples will be collected on any given day and, if samples are collected, the identity of the sampled individuals is unknown. The library preparation and DNA sequencing were performed at the Fulgent Genetics sequencing core facility (Fulgent Genetics LLC; California, USA). Fulgent is blind to each individual's geographical origin and to any previous genetic knowledge. The remaining population genomic analyses do not require blinding because the results are not affected by knowing the identity of the analyzed individuals.
The fin whale samples used for whole-genome resequencing were collected over a 22-year period. Environmental conditions were not relevant for our study and are therefore not reported here.
The locations (latitude and longitude) at which we obtained the samples were: