Abstract
Parallel evolution provides strong evidence of adaptation by natural selection due to local environmental variation. Yet, the chronology, and mode of the process of parallel evolution remains debated. Here, we harness the temporal resolution of paleogenomics to address these long-standing questions, by comparing genomes originating from the mid-Holocene (8610-5626 years before present, BP) to contemporary pairs of coastal-pelagic ecotypes of bottlenose dolphin. We find that the affinity of ancient samples to coastal populations increases as the age of the samples decreases. We assess the youngest genome (5626 years BP) at sites previously inferred to be under parallel selection to coastal habitats and find it contained coastal-associated genotypes. Thus, coastal-associated variants rose to detectable frequencies close to the emergence of coastal habitat. Admixture graph analyses reveal a reticulate evolutionary history between pelagic and coastal populations, sharing standing genetic variation that facilitated rapid adaptation to newly emerged coastal habitats.
Similar content being viewed by others
Introduction
Parallel adaptation can arise from selection repeatedly acting upon standing genetic variation1,2. However, the timing and independence of selection is still poorly understood for most natural study systems3,4. For example, selection can act upon standing genetic variation independently in multiple derived populations, representing parallel adaptation. It can also act upon standing genetic variation in the ancestral population(s), which then splits into different populations, producing a pattern that can be wrongly inferred as parallel adaptation. Alternatively, standing genetic variation can be shared among derived populations through gene flow, before (i.e. parallel) or after selection (i.e. not parallel)5,6,7,8,9.
In this study, we investigate the temporal dynamics and independence of adaptation to coastal habitat from standing genetic variation in common bottlenose dolphins (Tursiops truncatus) derived from pelagic ancestors using contemporary and ancient (8610–5626 years before present (BP)) genomes. Coastal and pelagic ecotypes of bottlenose dolphins have recurrently formed in different regions of the world10,11,12,13. Coastal populations are thought to have been founded by pelagic individuals, and show lower genetic diversity and smaller effective population size than pelagic populations10,11,12,13,14,15. Morphological differences are found between the two ecotypes but are not consistent across geographical regions12,13,16,17. The coastal ecotype shares behavioural traits across its range such as restricted dispersal, high site fidelity and socially transmitted foraging techniques10,18. Single nucleotide polymorphisms (SNPs), also called variants hereafter, were found to be under parallel selection in geographically distant coastal habitats, that is under both homogenising selection among coastal populations from different ocean basins and divergent selection between coastal and pelagic ecotypes within each ocean basin. We refer to this process as parallel linked selection, as these previous analyses may have identified the targets of selection and/or loci physically linked to these targets19. The coastal-associated variants are found mainly in ancient ancestry tracts in the genomes of coastal dolphins; these genomic regions have a time to the most recent common ancestor between coastal and pelagic counterparts of 0.6 to 2.3 million years BP instead of the genome-wide mean of 0.1 to 0.4 million years BP. Coastal-associated variants are present at low frequency as standing variation in pelagic populations19. However, the mode and chronology of parallel linked selection on coastal-associated genetic variation in bottlenose dolphins remain unresolved. Here, we explore potential scenarios, incorporating ancient genomes dating from the estimated time of formation of local coastal populations12,20.
Four subfossil samples dredged from the southern North Sea bed in the eastern North Atlantic (ENA) and curated at the Natural History Museum Rotterdam, were radiocarbon-dated to 5979–5626 calendar years BP for the youngest sample (SP1060, 95% CI) and 8610–8243 years BP for the oldest sample (NMR10326, Supplementary Table 1, Fig. 1a, b and Supplementary Data 1). These ages fall within the 95% CI range of the estimated split time of pelagic and coastal ecotypes in the ENA (47,800–4300 years BP12) and the emergence of coastal habitat in northern European waters20. This coastal habitat emerged when Doggerland was submerged by the North Sea around 8000 to 6000 years BP following post-glacial sea-level rise21. During this warmer period of the early to mid-Holocene, subfossil evidence from the region confirms common bottlenose dolphin occurrence in the North Sea22,23. It is thought this species entered the North Sea through the English Channel after it opened up following deglaciation22. The emergence of new coastal habitats, with different abiotic and biotic environmental conditions such as salinity, depth and prey species, may have affected the physiology, foraging strategies and behaviour of bottlenose dolphins and created opportunities for local adaptation to occur.
In this study, to understand the importance of these four early- to mid-Holocene ancient samples in the chronology of coastal adaptation, we first established their relationship to contemporary dolphins. We sequenced an ancient dolphin genome, SP1060, at 3x effective coverage (i.e. post quality control filtering, repeat masking, removing duplicates, base quality recalibration), to compare with a dataset of 60 contemporary genomes19. We sequenced an additional three ancient dolphin genomes at ultra-low effective coverage (<0.05x; Supplementary Table 1) to verify whether SP1060 was representative of the genetic variation within the ancient population, rather than, for example, a rare admixed individual. We show that parallel adaptation occurred rapidly after the emergence of new coastal habitats. We also provide insights into how past admixture retained coastal-adapted ancestry at low frequency in pelagic populations, which selection can then act upon to promote rapid adaptation.
Results and discussion
Relationship between ancient and contemporary dolphin genomes
To explore the relationship between the ancient subfossils and contemporary individuals, we first looked at mitochondrial ancestry. The mitogenome sequence of the oldest ancient sample NMR10326 (8610–8243 years BP, Fig. 1b) clustered with contemporary North Atlantic pelagic haplotypes, while the mitogenome sequences of the three other ancient samples clustered with contemporary Mediterranean and the Black Sea samples in a Bayesian phylogenetic analysis (Supplementary Fig. 2). Then, with the nuclear data, we used principal component analysis24,25 (PCA) and a factor analysis26 (FA) method that can incorporate sample age, and therefore allows us to correct for temporal genetic drift when comparing populations26. Given the differences in depth of coverage and the potential biases inherent in mapping and comparing ancient and contemporary genomes27, we ran the multivariate analyses using several approaches: (i) comparing pseudo-haploid genomes generated by sampling a random single base for all genomic positions and using a PC projection of the ancient genomes on to the principal components segregating the contemporary genomes; (ii) sampling a single read for each site with no projection but including sites covered in all contemporary and at least one ancient sample, and (iii) comparing called genotypes for the contemporary individuals and SP1060 at sites with no missing data in SP1060 and both a projection and factorial analysis approaches. Data processing and filtering followed Louis et al.19 with some modifications specific to ancient DNA data such as recalibration of base quality scores according to DNA damage patterns, which are detailed in the Methods section.
Due to the fragmented and damaged nature of ancient DNA, reads with non-reference alleles are less likely to be mapped than those with reference alleles, creating reference bias. To reduce reference bias, we ran the multivariate analyses with the genomic data mapped to two references genomes: (i) the common bottlenose dolphin reference genome (GenBank: GCA_001922835.1), which is a coastal western North Atlantic (WNA) individual, with relaxed parameters in BWA28,29 and (ii) the killer whale reference genome (GenBank GCA_000331955.2)30. Using relaxed parameters or mapping to another species should help to include a better representation of alternate alleles. We also ran all analyses both including and excluding transitions, as C-T transitions are in excess at molecule ends of the SP1060 sequence reads, reflecting post-mortem deamination damage (see details in the Methods section). Additionally, we down-sampled one contemporary individual from the ENA coastal population to 0.03x to evaluate whether the lower-coverage ancient samples could be pulled towards the middle of the PCAs due to large amounts of missing data. This ENA coastal individual clustered with the other ENA coastal individuals after downsampling in the PC projection (Supplementary Fig. 3a) and single-read sampling approaches (Supplementary Fig. 3b), indicating differences in coverage are not responsible for the observed patterns of variation.
As per Louis et al.19, the major axis of differentiation is between the Pacific and the Atlantic populations (PC 1, Fig. 1c). Independent genetic drift in each coastal population drives this pattern, while pelagic populations from both oceanic regions cluster in the centre of the PCA. Contemporary coastal samples from other locations in the ENA (northern France, Ireland and West Scotland) form a cline from the pelagic populations to the East Scotland samples. This could be consistent with a northwards range expansion of ENA coastal populations20. The second axis of differentiation separated the two Atlantic coastal populations.
Along PC 1, we observe affinity towards the two North Atlantic coastal populations for SP1060 and NMR2273, and to a lesser extent for NMR10151. The ancient samples cluster with the pelagic samples along PC 2 (Fig. 1c). We observe similar results in all analyses, regardless of filtering and mapping strategy (Fig. 1c, Supplementary Figs. 4–8, Supplementary Notes). However, we note some reference bias for the ultra-low coverage samples (Supplementary Figs. 6, 8). The oldest genome (NMR10326, 8610–8243 years BP) clusters with the pelagic populations when mapped to the killer whale reference genome (Fig. 1c), while it does not do so when mapping to the bottlenose dolphin reference genome (Supplementary Figs. 6, 8), likely due to reference bias associated with ancient DNA giving NMR10326 closer affinity on the PCA to the WNA coastal population where the reference genome is from. NMR10326 is however still the closest ancient sample to the pelagic populations (Supplementary Figs. 6, 8). The clustering of genomic data in PCAs reflects shared mean coalescence times and Identity by State31. The position of the ancient genomes suggests that the genetic affinity of ancient samples to North Atlantic coastal populations increases as the age of the samples decreases.
Having established that SP1060 is broadly representative of our ancient samples, we focus on SP1060 in the rest of our analyses. We estimated the shared ancestry of SP1060 with contemporary individuals using a factor analysis, which takes genetic drift into account26. SP1060 shares the highest ancestry with the ENA coastal population (ENAc, 43%), followed by the WNA coastal (WNAc, 32%) and ENA pelagic (ENAp, 25%) populations (Fig. 1d). Clearly, the inferred ancestry proportions do not represent an admixed ancestry composed of multiple contemporary populations, as SP1060 was alive at approximately the time of their divergence32. Rather, they reflect ancestral genetic variation in SP1060, which later segregated in the different populations26,32.
The PCA and ancestry results were further confirmed by the sharing of derived alleles identified using D-statistic tests33 of the form D(H1,H2; SP1060, Orca), with H1 and H2 being two contemporary dolphins. The ancient sample shares a significant excess of derived alleles with both ENA and WNA coastal populations, compared with all other in-group populations (Supplementary Fig. 10a). The value of the statistic D(ENAc, WNAc; SP1060, Orca) is significantly negative (Z-scores of −6.9 to −8.8), indicating that SP1060 is more closely related to the ENA coastal dolphins than to the WNA coastal dolphins. Accordingly, SP1060 shares a higher excess of derived alleles with the ENA coastal population than with the WNA coastal population, in statistics of the form D(coastal, pelagic; SP1060, Orca) as shown by a stronger non-zero D-statistic when the coastal individual is from the ENA than from the WNA.
Having established the broad relationship of ancient samples to contemporary populations, in terms of shared ancestry, we next sought to reconstruct evolutionary history through time as an admixture graph34. Testing across all possible histories, we find one admixture graph with no outlier f-statistics (i.e. all |Z| were <3, the greatest deviation from 0 was −0.96) using qpBrute (Fig. 2), which explores the space of all possible admixture graphs of a given maximum complexity, under a brute-force approach35,36. Graphs were estimated using pseudo-haploid data mapped to the killer whale reference genome and using the killer whale as an outgroup. We find similar topologies when using called genotypes for the contemporary populations only and mapping to the bottlenose dolphin reference genome, indicating that the approach based on pseudo-haploid data is robust (Supplementary Fig. 11).
The best-fitting graph reveals a basal split between the lineage that gave rise to the contemporary North Atlantic coastal populations and SP1060, and the lineage that gave rise to the majority of ancestry in contemporary pelagic populations (Fig. 2). The results indicate that WNA coastal, which has recently been described as its own species, Tursiops erebennus16, and SP1060 are independent lineages. The ENA coastal is depicted as a clade whose ancestry is a mixture of the two ancestral groups leading to SP1060 and WNA coastal. Ancient sample SP1060 is therefore not a direct ancestor of the contemporary ENA coastal dolphins. The ancestry of both North Atlantic contemporary pelagic populations appears to be an admixture of ~30% of the lineage giving rise to the coastal populations, and ~70% from a deeply divergent lineage. Thus, the results provide a useful visualisation of how coastal-associated alleles may have been reintroduced into pelagic populations. The branches leading to the pelagic populations have null or small drift values, consistent with pelagic populations having large ancestral effective population sizes or showing little genetic structure, and indicating minimal drift from a shared ancestral population, consistent with previous demographic inference19,31,37. Those inferences showed that pelagic populations in different oceanic regions are closer to each other than to their parapatric coastal populations. They also showed that effective population sizes are larger and more stable through time in pelagic than in coastal populations19.
Patterns of selection to coastal habitat in the ancient individual
We have shown that due to its relationship with contemporary populations, the ancient sample SP1060 can provide a unique temporal resolution to the chronology and pace of genetic changes associated with the colonisation of coastal habitat in the eastern North Atlantic. Thus, we investigated and compared patterns of variation in SP1060 and contemporary populations at sites previously identified as evolving under parallel linked selection19. Coastal-associated genotypes at sites inferred as evolving under parallel linked selection in coastal populations19 are also found in the ancient sample SP1060 (Fig. 3a–c and Supplementary Fig. 12). This observation informs us about the speed of local adaptation. We observe covariance in alleles among contemporary coastal individuals within a region, but not between contemporary coastal and pelagic individuals, consistent with drift from the pelagic populations in the coastal contemporary individuals. However, based on their position in the PCA space, SP1060 and the other ancient samples do not share all the drift (and consequent covariance of alleles) experienced by the coastal populations (Fig. 1c). Yet, the genome of SP1060 shares the coastal-associated variation inferred to have evolved under parallel linked selection in the coastal populations19 (Fig. 3a, b). The ancient genome shows excess heterozygosity as found in coastal individuals19, despite its lower coverage (Fig. 3c and Supplementary Figs. 12,13). SP1060 is dated to 5979–5626 years BP, close or shortly after the colonisation time of coastal waters by bottlenose dolphins in the North Sea as indicated by genetic studies12,20, the subfossil record22,23 and coastal marine habitat emergence around 7–6000 years BP21. The fossil record shows that the North Sea was mostly an Arctic environment during the deglaciation, and that temperate marine mammal species appeared around 8–7000 years BP22,23. Palaeogeographic models indicate that the Southern North Sea was progressively submerged between 9000 and 6000 years BP and transitioned from salt marshes to marine waters21. This suggests these environmental changes resulted in the selection of standing genetic variation with rapid changes in population allele frequencies in newly founded coastal populations.
Combining contemporary and ancient samples, we also provide insight into the independence of parallel linked selection in coastal bottlenose dolphins from standing genetic variation present in pelagic populations. Selection would be considered independent if it acted upon standing genetic variation in each derived population (scenario i, Fig. 3d)4, meaning selection would act along branches that are not shared among the two North Atlantic coastal populations and SP1060 in Fig. 2. It would also be independent if adaptive standing genetic variation was shared before independent selection among coastal populations through gene flow (scenario ii). Selection would not be independent if selection acted upon standing genetic variation in a shared ancestral population (scenario iii)4, meaning along branches shared among the coastal populations4.
To identify which of these three scenarios best fits our data, we compared patterns of variation (Fig. 3c and Supplementary Figs. 12,13) and the neighbour-joining tree (Fig. 3b) obtained for our populations at the sites under parallel linked selection with the predictions under each scenario made by Lee and Coop (2019)4. For the SNPs under parallel linked selection, the coastal individuals cluster by population, but with inter-individual variation19. SP1060 clusters most closely with two individuals of the ENA coastal population (Fig. 3a), but diverges close to the basal node of all the Atlantic coastal populations (Fig. 3b). Such clustering of individuals by population is the pattern expected under scenario i), which describes an independent selection from standing genetic variation in each population, as it will generate only partially shared haplotypes. In contrast, scenarios (ii) and (iii) would have generated shared haplotypes among populations, and individuals from the coastal populations would be more mixed in the tree. While the Lee and Coop approach is based on trees for each region under selection, we based our analysis on different regions of the genome, thus assuming that the different regions fit the same selection scenarios. We compared the generated tree with ten trees generated using the same number of randomly sampled neutral SNPs. These latter trees do not show clustering by ecotypes nor strong differences in branch length between coastal and pelagic individuals in contrast to the tree based on the sites under selection (Supplementary Fig. 14). However, we acknowledge that the random sampling of SNPs across the genome may bias towards reconstructing the population consensus tree. Similarly, the topology of the concatenated coastal-associated SNPs may not capture variation in local topologies which are informative of the mode of adaptation. We could not investigate haplotypic differences as in Lee and Coop (2019)4 due to the lower coverage of the ancient sample, which would impact phasing.
We observe long branches in the neighbour-joining tree for all the coastal individuals (Fig. 3b). We previously identified these regions as representing old variants (0.6 to 2.3 million years); based upon the excess of mutations that had accumulated within them, compared with the genome-wide average19. We find coastal-associated variants in the genome of SP1060, despite it not sharing all the drift experienced by the ENA coastal population after divergence from the pelagic population (Fig. 1c). Thus, we hypothesise there was an abundance of coastal-associated standing genetic variation in the ancestral source population; for example, through a large influx of standing genetic variation from coastal populations in refugia, such as the Mediterranean Sea. Habitat models infer the Mediterranean Sea to have been a suitable habitat for bottlenose dolphins during the LGM20.
Although paleogenomics has been used to understand patterns of adaptation in humans38 and domesticated species39, it has so far mainly been used to understand demographic history in non-model wild species40,41,42 and only in one study to investigate the temporal dynamics of parallel evolution43. In our study, we harness the power of paleogenomics to disentangle the mode of parallel selection through direct observations of the chronology and genetic changes associated with the formation of coastal ecotype populations of bottlenose dolphins. Using subfossil samples pre-dating much of the drift experienced by coastal populations enables us to differentiate drift from coastal adaptation. Overall, our results suggest that coastal variants represent balanced polymorphisms sensu ref. 44 that were rapidly and repeatedly sieved from standing variation by ecological selection. We thereby provide rare direct evidence of rapid adaptation to newly emerged habitats from standing genetic variation. Our study contributes to the debates of the extent to which parallel evolution is common or rare, and to the role of standing genetic variation in favouring local parallel adaptation4,45. We show that paleogenomics can help solve these debates, and our study can be used as a roadmap for future studies of parallel evolution.
Methods
Ethics
We confirm our research complies with all relevant ethical regulations and was approved by the animal ethics committee of the School of Biology at the University of St Andrews on 26 July 2018 https://www.st-andrews.ac.uk/research/environment/committees/awerb/. The three new contemporary dolphin samples analysed in this study were collected under the relevant permits of each country. Biopsy sampling was carried out under licence from the National Parks and Wildlife Service in Ireland and under licence from the Direction Regionale de l’Environnement, de l’Amenagement et du Logement in France.
Samples
Ancient sample collection and geological analyses
The four common bottlenose dolphin subfossil samples were dredged from the southern part of the North Sea (i.e. Southern Bight and Smiths Knoll) by commercial trawlers, and were stored at the Natural History Museum of Rotterdam (Supplementary Table 1). We collected bone powder in a sterile environment from the four specimens following the Natural History Museum of Rotterdam guidelines. Radiocarbon dating was performed in previous studies at the Klaus-Tschira-AMS facility for SP106046 and at the University of Groningen for NMR2273 and NMR1032647. For NMR10151, we performed radiocarbon dating at the University of Oxford. We re-calibrated the age of all the samples using the marine20 correction48 in CALIB 8.249 applying a ΔR (i.e. localised reservoir correction) of 8 ± 38 as estimated for odontocete bones in Norway50.
Contemporary sample collection
We used whole-genome re-sequencing data from 57 bottlenose dolphins19 and generated new data from three coastal individuals from the eastern North Atlantic. Two individuals were sampled using biopsy sampling in Northern France (Normandy) and Ireland (Shannon estuary). The third sample was collected from a stranded animal in West Scotland. Ecotype was identified previously using microsatellite markers51. Samples were either frozen or stored in 95% ethanol.
Laboratory work
Ancient DNA labwork
Ancient sample SP1060
Laboratory work was conducted in a dedicated ancient DNA facility at the Max Planck Institute for Evolutionary Anthropology in Leipzig for sample SP1060 (Supplementary Table 1). The sample was treated with 0.5% bleach before extraction. Extraction and single-strand library preparation of the ancient sample are detailed in ref. 52. We sequenced the sample on two lanes of HiSeq 2500 with the 80 bp SE technology at the Danish National High-throughput Sequencing Centre of Copenhagen University.
Ancient samples NMR2273, NMR10326 and NMR10151
We processed the three additional samples (Supplementary Table 1) at a dedicated ancient DNA facility at the Centre for GeoGenetics, University of Copenhagen. We extracted DNA from around 50 mg of bone powder, twice, both without and with a 1% bleach treatment. We incubated the samples overnight under motion at 55 °C in 500 μl of extraction buffer (0.45 M EDTA, 0.1 M UREA, 150 μg proteinase K). After centrifugation of the samples at 1700 × g for 5 min, we collected the supernatant, concentrated and purified it using a Zymo-Spin V reservoir (Zymo Research Irvine, CA, USA) and Qiagen MinElute spin column (Qiagen, Inc., Valencia, CA, USA).
We built double-stranded libraries for these three samples following the blunt-end single-tube library building (BEST) protocol53. We index amplified the libraries with the following thermocycling conditions: 2 min at 95 °C, followed by 20 to 26 cycles of 30 s at 95 °C, 30 s at 60 °C, and 1 min 50 s at 72 °C, and a final, 10-min elongation step at 72 °C. We purified PCR products using Qiagen MinElute spin columns (Qiagen, Hilden, Germany) following the manufacturer’s instructions and eluted in 20 μl EB.
We performed two rounds of in-solution enrichment capture on pooled libraries from different PCR reactions to increase complexity using a custom capture-enrichment kit produced by MYbaits (Mybaits Whole-Genome Enrichment, MYcroarray, Ann Arbor)54. RNA bait libraries were constructed by transcribing a fragmented contemporary high molecular weight bottlenose dolphin DNA (sample was from ref. 30). We hybridised ancient DNA libraries to these RNA baits in a reaction containing adaptor blockers for 24 h. We used streptavidin-coated magnetic beads to retain hybridised fragments and discard unbound DNA55 following the manufacturer’s instruction (online version 3.02 – July 2016). We amplified captured libraries in two PCR reactions using KAPA HiFi polymerase (Kapa Biosystems) with the following thermocycling conditions: 2 min at 98 °C, followed by 14 cycles of 20 s at 98 °C, 30 s at 60 °C and 20 s at 72 °C, and a final, 5-min elongation step at 72 °C. We performed hybridisation capture a second time on the pool of PCR products. We purified, quantified and pooled the double-capture libraries at equimolarity. We sequenced the libraries on one lane of HiSeq 2500 with the 80 bp SE technology at the Danish National High-throughput Sequencing Centre of Copenhagen University.
Contemporary sample DNA labwork
We extracted DNA from epidermal tissue using standard phenol:chloroform extraction method56. We then sheared genomic DNA to an average size of ~500 bp using a Diagenode Bioruptor Pico sonication device. We built libraries on the DNA samples following the BEST protocol53. We pooled the libraries at equimolarity, together with samples from other projects. We sequenced the pool on an Illumina HiSeq 4000 with the 80 bp SE technology at the Danish National High-throughput Sequencing Centre of Copenhagen University.
Data processing
Read trimming
We used Illumina’s CASAVA 1.8.2 software for the four ancient samples and the three new contemporary samples to convert Illumina’s *.bcl files to fastq and perform demultiplexing allowing for no mismatch in the 6-nucleotide indices used for barcoding. For the ancient samples, we processed sequencing reads within the generated fastq files with ADAPTER-REMOVAL v.257 to trim residual adapter sequence contamination and to remove adapter dimer sequences as well as low-quality stretches at 3´ ends (i.e. consecutive stretches of N’s and of bases with a quality score of 2 or lower). We discarded sequence reads that were ≤30 bp following trimming. For the 57 contemporary samples from ref. 19 and the three new contemporary samples generated during this study, we trimmed sequencing reads using Trimmomatic v.0.3258 using default parameters, and we discarded sequence reads shorter than 75 bp- see details in ref. 19.
Mitochondrial genomes
We first mapped the filtered reads to a modified version of the bottlenose dolphin mitochondrial genome from Scotland (GenBank KF570351.1)59 as per ref. 60. For the contemporary samples, we used BWA (v. 0.7.15) mem and default parameters61. For the ancient samples, we used BWA aln with the seed disabled (-l 1024) to include reads with post-mortem damage at the read ends29. We also mapped the reads to two additional individuals, to ensure there was no reference bias, which can be an issue with ancient DNA, i.e. one individual from the Black Sea (GenBank KF570326.1) and one individual from the western North Atlantic (GenBank KF570378.1)59. We compared the sequences and checked they were the same. We discarded PCR duplicates using the rmdup function of samtools v. 1.262,63 for the ancient samples and using picard-tools v. 2.1.064 for the contemporary samples. We constructed a consensus sequence using the doFasta function in ANGSD v. 0.91325 setting the mapping quality to 25 and the phred score to 30. We changed nucleotide sites to “N” if a single nucleotide did not represent >75% of the reads. We aligned all sequences with ClustalW in MEGA X65 with 30 sequences from ref. 20 and 48 sequences from ref. 59 and we visually inspected them for indel and reading frames.
Nuclear genomes
We extracted reads that did not map to the mitochondrial genome from the bam file and converted them into a fastq file using samtools v. 1.2 and picard-tools v.2.1.0. For the contemporary samples, we processed the data as described in ref. 19. In short, it involved the same steps as described below for the ancient samples, without the tweaks to decrease reference bias. For the ancient samples, we then mapped these reads to the reference bottlenose dolphin genome assembly (GenBank: GCA_001922835.1, NIST Tur_tru v1) using BWA (v. 0.7.15) aln61 with the seed disabled (-l 1024) and both default and adjusted parameters which were shown to reduce reference bias and include a better representation of alternate alleles28. Those parameters were an edit distance (n) of 0.01 and a maximum number of gaps (o) of 2. To further evaluate reference bias, we also mapped the reads of the ancient and contemporary samples to the killer whale genome assembly (GenBank GCA_000331955.2)30 using default parameters in BWA. Then, we kept only the mapped reads with a mapping quality of at least 25. We removed repeated regions as identified using RepeatMasker66, regions of excessive coverage and mapping artefacts (sites which were out of Hardy Weinberg Equilibrium with negative inbreeding coefficients in all populations as they could be paralogues or other mapping artefacts), and the sex chromosomes using a combination of bedtools v. 2.25.0 and samtools (see details in ref. 19). We also removed all the scaffolds which were shorter than 10 Mbp.
We inferred endogenous content as the percentage of sequencing reads mapping to the reference genome after base quality filtering of q-score 30 and read duplicate removal (Supplementary Table 2).
Data analyses
Assessing post-mortem DNA damage and contamination
We used MapDamage 2.067 to characterise post-mortem DNA damage and thereby check the authenticity of the data. Analyses revealed that the sequencing reads showed post-mortem damage characteristic of ancient DNA (Supplementary Fig. 1) including an excess of C- > T transitions at both termini for SP1060 and an excess of C- > T transitions at the 5’ termini due to deamination and G- > A transitions at the 3’ termini for the three other samples on which double-stranded libraries were built. Therefore, we used MapDamage to rescale base quality scores according to damage patterns when phred quality score ≥30 both for the mitochondrial and nuclear data. We then refiltered the bam files to keep only bases with a phred quality score ≥30. We ran all downstream analyses both including all sites and only including transversions. Results were the same, unless otherwise specified.
We detected no heterozygous sites in the mitochondrial genomes of the ancient samples, indicating an absence of contamination from present-day DNA.
Mitochondrial DNA phylogeny
Time-calibrated models using the mtDNA
We first aligned mitochondrial genomes of T. truncatus, extracted from carbon-dated ancient subfossils (n = 4), contemporary samples (n = 70; including the 57 samples from ref. 19, three newly sequenced ones plus ten samples from Nykänen et al. (2019)20, see supplementary data on the DataSuds repository), or downloaded from GenBank (n = 68, see Supplementary Table 3 and Supplementary Data)59 in software MEGA X65 with a T. aduncus reference mitogenome (KF570335.1) downloaded from GenBank. Then, we built a topology tree in MrBayes v.3.2.7a68,69 with these sequences, using 2,000,000 Markov Chain Monte Carlo (MCMC) samples and 25% burn-in, after the initial model selection for the best substitution scheme, ref. 70 with a proportion of invariable sites and Gamma distributed rate variation among sites, in jModelTest271,72. The inspection of the resulting consensus tree revealed that one sample (sample 117696 collected from the coastal eastern North Pacific) was very differentiated from the rest of the samples forming its own monophyletic branch. Therefore, we also ran the topology tree without this sample using the same settings as before. We then inspected the consensus trees from the two analyses (with and without 117,696) to find the two most evolutionarily distant T. truncatus haplotypes from both trees using the R-function ‘cophenetic’ from ape R package v.5.073 in R v4.0.5. These sequences represented a sample obtained from coastal western North Atlantic (sample WNAC8) and sample 117696; and sample WNAC8 and a sample originating from the eastern Mediterranean Sea (sample EMED6).
To estimate time-calibrated phylogenies, we used a two-step methodology by ref. 60. The purpose of the first step (delphinid tree model) was to estimate a calibration point (divergence of the two most evolutionarily distant T. truncatus haplotypes) for the second step (T. truncatus tree model). The two steps are described below.
Delphinid phylogeny
We aligned thirteen protein-coding genes of the mitochondrial genome from the two most divergent T. truncatus haplotypes (WNAC8 and 117696, or WNAC8 and EMED6) with the gene sequences of 19 delphinid species downloaded from GenBank (Supplementary Table 3)59,60,74,75,76,77,78. We used the ‘greedy’ search in PartitionFinder (v1.1.0)79 to find the best partitioning scheme for each gene, and we built two time-calibrated phylogenetic trees (one with and one without the sample 117696) in BEAST2 v.2.680 with three different data partitions for nucleotide substitution models (Supplementary Table 4). We used all codon positions and only the third codon positions, both with a single strict clock model based on the results from a previous study20, and concatenated the individual gene trees into a single phylogeny to find the time to a most recent common ancestor (TMRCA) between the two most divergent T. truncatus haplotypes. We ran two independent models for each scenario with 40,000,000 MCMC steps, 10% pre-burn-in, and a sampling frequency of 4000. We used the divergence between Monodontidae and Delphinidae (mean = 10.08 Myr, SD = 1.41381) to calibrate the root of the tree and we used a Calibrated Yule prior for the branching rate. We checked the convergence of MCMC chains and the Effective Sample Size (ESS) values relating to the model parameters in Tracer v.1.7.182. After verifying convergence, we used LogCombiner v.2.6 and TreeAnnotator v.2.683 to combine and summarise the trees, respectively.
T. truncatus phylogeny
We extracted 13 protein-coding gene regions from the 142 common bottlenose dolphin samples in order to estimate the coalescence times of different T. truncatus clades. We removed duplicate haplotypes from this dataset, thus eliminating one ancient sample, NMR2273 (duplicate with another ancient haplotype SP1060), and 51 contemporary samples. We determined the best nucleotide partitioning scheme for the remaining 90 haplotypes using PartitionFinder (v1.1.0)79 (Supplementary Table 4). We used all codon positions and, as in ref. 60, only the third codon positions of the genes to minimise any possible effect of incomplete purifying selection on the coalescence times of the tree84,85. We built time-calibrated phylogenies in BEAST2 v.2.680 with two different data partitions for nucleotide substitution models (Supplementary Table 4). We applied a common strict clock model for all of the genes, following ref. 60 and ref. 20. For each tree built, we performed two independent runs with 100,000,000 MCMC steps, 10% pre-burn-in, and a sampling frequency of 10,000. We used the time to the most recent common ancestor (TMRCA) of all of the haplotypes, derived in the previous step when estimating the delphinid tree, to calibrate the root of the tree and a coalescent prior with constant population size was used for the tree branching rate. In addition, we applied tip calibrations with the carbon-dated subfossil samples. We ran altogether eight different scenarios for the trees, with and without the ancient sample NMR10151 which had low mitogenome coverage (2.3x) with 25% of the bases missing compared to the other subfossils, and with and without the contemporary sample 117696. We describe the different tree scenarios in Supplementary Table 5, and details of the priors can be found in the BEAST2 input.xml files on the DataSuds repository. For all tree models, we inspected the convergence of chains and model performance in Tracer82 after running each model twice, and we used LogCombiner and TreeAnnotator83 to combine the log- and tree-files and summarise the trees, respectively. We draw the resulting summary trees using FigTree v1.44 software (http://tree.bio.ed.ac.uk/).
Population structure
We used genotype likelihoods or pseudo-haploid data in ANGSD v.0.92125 to take into account differences in coverage between samples, where possible, or based our analyses on allele frequencies from called genotypes. We generated a vcf file including the contemporary individuals and SP1060 following the filtering steps as in ref. 19 and given in the supplementary scripts. In addition, in the vcf file, we kept only sites with data with 90% of the individuals and sites with no missing data in the ancient sample.
Projection of the ancient samples on the principal components (PC)
Note that we ran all the analyses mapping to the common bottlenose dolphin reference genome with relaxed BWA parameters and to the killer whale reference genome to check there was no effect of reference bias. Results were the same, unless otherwise specified.
We performed PCA projections of the ancient genomes on the principal components segregating the contemporary genome with smartpca from the eigensoft v.7.2.0 package24 using the option lsqproject. We ran smartpca both on (i) the contemporary samples and SP1060 and diploid genotype calls, including the contemporary individuals and (ii) SP1060 and the three other ancient samples using pseudo-haploid genotypes generated in ANGSD v.0.921.
We converted the vcf genotype file to the eigenstrat format using the utility vcf2eigenstrat.py from https://github.com/mathii/gdc/blob/master/vcf2eigenstrat.py.
For the pseudo-haploid genotype calls, we first used the following filters in ANGSD, including a MAF of 0.05 (minMinor 6) and data in 25% of the individuals (maxMis16): -dohaplocall 1 -doCounts 1 -minMapQ 25 -minQ 30 -C 50 -baq 1 -minMinor 6 -maxMis 16 -skipTriallelic 1 -remove_bads 1 -uniqueOnly 1 -ref ref.fa.
We then used the utility haplotoPlink from ANGSD to transform the haplo file to Plink tped and tfam. We used Plink v.1.9 to transform those into bed/bim/fam files which we converted into eigenstrat using convertf from eigensoft v.6.1.3. We used the options–allow-extra-chr–chr-set 85 (killer whale reference genome scaffolds >10 Mbp) or 56 (bottlenose dolphin reference genome scaffolds >10 Mbp) in Plink.
Factorial analysis (tfa)
We also ran the tfa method, which takes drift into account26 on the diploid genotypes only, including the contemporary individuals and SP1060 and sites with no missing data in SP1060, mapped to the bottlenose dolphin reference genome with relaxed parameters. We imputed missing values for the contemporary samples using the R package LEA v.3.2.0 for the run with the highest likelihood for K = 6 (this value was previously described as the best-fitted number of populations19) using the 'impute' function86. We also adjusted the genotypic data for coverage, as described in the method using the function “coverage_adjust” of the tfa R package, which uses a latent factor regression model. We used the function choose_lambda() to decide the lambda value for which the effect of sample age was removed from the fifth factor of the tfa analysis, similar values were obtained when choosing smaller factor values. We ran the tfa for K = 5 as the two pelagic Atlantic populations are very closely related. We also estimated ancestry coefficients for SP1060 including iteratively all populations. SP1060 did not show any ancestry relationship with the Pacific populations. Therefore, we estimated the ancestry coefficients for SP1060 including ENAc, WNAc and ENAp populations (including WNAp instead of ENAp gave the same results, likely due to the two populations being closely related).
ANGSD single-read sampling PCA approach
We also ran the single-read PCA method in ANGSD v.0.921, which involves random sampling of a single read for each sample at each site. In this analysis, the ancient samples were included in the PC computations and not projected onto PCs of contemporary samples. This provides a quality control measure, such as if the ancient samples presented sequencing or sequence data processing errors, they would appear as outliers in the PCA in comparison with the contemporary samples. We ran the analyses, with and without the transitions, and on the data mapped to the bottlenose dolphin reference genome and the killer whale reference genome, on (i) the 60 contemporary individuals and SP1060, allowing no missing data, (ii) the 60 contemporary individuals, SP1060 and NMR10151 allowing no missing data, (iii) the 60 contemporary individuals, SP1060, NMR2273 and NMR10326 allowing for missing data in one individual and (iv) the 60 contemporary individuals and all four ancient individuals allowing for missing data in two individuals. As the results were consistent, we only present those including all the samples.
We used the following options in ANGSD: samples.bamlist -nThreads 9 -doIBS 1 -doCounts 1 -doMajorMinor 1 -minFreq 0.05 -minInd 61 -maxMis 2 -rmTrans 1 -output01 0 -makeMatrix 1 -doCov 1 -minMapQ 25 -minQ 30 -out out_rmTrans -GL 1 -skipTriallelic 1 -remove_bads 1 -uniqueOnly 1.
Evolutionary relationships
D-statistics analysis
We used D-statistics to assess the relationships of ancient sample SP1060 to contemporary populations. We first performed D-statistics on all 15 possible combinations including two contemporary samples and the ancient sample, with the killer whale, Orcinus orca, as the outgroup. The D-statistic describes an excess of shared derived alleles between taxa which could be the result of introgression or ancestral population structure. It thus allows to detect departure from the ‘tree-ness’ of a given topology33,34,87. We considered that H1 and H2 are two contemporary dolphin populations. We used the D-statistics to evaluate if the data are consistent with the null hypothesis that the tree (((H1,H2), SP1060), Orca) is correct and that there has been no gene flow between the ancient sample and neither H1 nor H2. The definition of the D-statistics used here is the one of ref. 33.
where nABBA is the number of sites where only H2 and H3 share a derived allele (ABBA sites) and nBABA is the number of sites where only H1 and H3 share a derived allele (BABA sites). Under the null hypothesis that the given topology is the true topology, we expect an equal number of ABBA and BABA sites and thus D = 0. A statistic differing significantly from 0 indicates either gene flow between one population within the in-group and H3, or that the tree is incorrect. We implemented the tests in ANGSD v.0.921, and sampled a single base at each position of the genome to remove bias caused by differences in sequencing depth and only considering sites covered in all individuals. We calculated D-statistics using one individual per population and we repeated the test three times using a different individual each time. We ran the analyses on the data mapped to the common bottlenose dolphin reference genome with relaxed BWA parameters and to the killer whale reference genome to check for reference bias, and both without and keeping the transitions, to make sure there was no impact from DNA damage patterns. We used the same filters as described above for the PCAs. We assessed the significance of the deviation from 0 using a Z-score based on blocked jackknife estimates of the standard deviation of the D-statistics (block size was 5 Mb which should be higher than the LD in the populations). This Z-score relies on the assumption that the D-statistics, under the null hypothesis, is normally distributed with mean 0 and a standard deviation equal to a standard deviation estimate computed using the “delete-m jackknife for unequal m” method described in Busing et al.88. An example of a command is: angsd -out ${1}_XchrmHWE_minIndKWcov -doAbbababa 1 -rmTrans 1 -blockSize 5000000 -enhance 1 -bam ${1}_XchrmHWE.filelist -doCounts 1 -useLast 1 -minMapQ 25 -minQ 30 -minInd 4 -maxMis 0 -remove_bads 1 -uniqueOnly 1.
Admixture graph analysis
We next reconstructed relationships between SP1060 and the Atlantic contemporary populations (n = 37) using an admixture graph analysis34. We excluded the Pacific populations from this analysis as there are many other populations in between the regions. We only included the already published ENAc individuals (n = 10) and did not include the newly generated three individuals as they belong to different populations51.
We used a heuristic search algorithm, qpBrute (https://github.com/ekirving/qpbrute)35,36 to explore the space of all possible admixture graphs fitted to the four Atlantic populations using qpGraph. We built the admixture graph using two datasets: (i) pseudo-haploid data including SP1060 and the contemporary Atlantic populations mapped to the killer whale reference as it minimises reference bias, and (ii) called genotypes for the contemporary Atlantic populations mapped to the bottlenose dolphin reference. We used these two datasets to check results were similar, irrespective of the data type (pseudo-haploid and called genotypes) and the reference genome.
For the pseudo-haploid data, we used 580,589 SNPs and a killer whale genome as the outgroup. We kept the transitions as evolutionary relationships were similar with and without the transitions (Supplementary Fig. 9). We used the following command in ANGSD: angsd -b SP1060_Atlantic_KW_10MbpKW.bamlist -dohaplocall 1 -doCounts 1 -minMapQ 25 -minQ 30 -C 50 -baq 1 -minMinor 4 -maxMis 9 -skipTriallelic 1 -remove_bads 1 -uniqueOnly 1 -ref /path/unplaced.scaf.fa -out Atlantic_KW_SP1060_mapKW.
We then converted the data to the eigenstrat format as described above for smartpca.
For the contemporary called genotype data, we used one Indo-Pacific bottlenose dolphin, Tursiops aduncus, individual to root the graph. We filtered the dataset as follows: a mapping quality of 30, a genotype quality of 20, genotype depth of at least 3x, less than 25% of missing data overall, genotype data in at least five individuals in each population, a MAF of 0.05, bi-allelic SNPs only, no missing data in the T. aduncus individual constituting the outgroup, scaffolds of at least 10 Mbp, and a distance of at least 5 Kb between SNPs. We obtained 213,488 SNPs. We also set the missing data to less than 10% and got similar results.
We ran qpBrute using the following parameters: outpop: NULL, useallsnps: YES, blgsize: 0.05 (5 Mb which is the block size for Jackniffe), forcezmode: YES, lsqmode: YES, diag: .0001, bigiter: 6, hires: YES, lambdascale: 1, inbreed: YES (for the pseudo-haploid data only).
As we have only one individual to root the graph, we did not attempt to use the allele frequency of the outgroup to normalise the weighting of each SNP in the ingroup and we used the option output: NULL that is SNPs are flat-weighted. We used the option inbreed:YES for the pseudo-haploid data. This allows for each pseudo-haploid to contribute to only one haplotype when applying the low sample size correction for the f2-statistics. This does not work when including only a single pseudo-haploid sample as we did with SP1060, but it should not be an issue as using the default option inbreed:NO gave similar results.
In qpBrute, leaf nodes were added to the graph using a stepwise addition order algorithm. At each step, the insertion of a new node was tested at all branches of the graph, apart from the outgroup branch. All possible admixture combinations were tried where a node could not be added without producing f4-statistics outliers (i.e. |Z| ≥3). The sub.graph was discarded when a node could not be inserted via these approaches. Where a node was successfully added, then the remaining nodes were recursively inserted into the graph. For the pseudo-haploid dataset, the package tried all possible 120 starting graph orders. We found only one graph with no f4 outliers among a total of 3663 unique graphs. For the called genotypes, the package tries all possible 24 starting graph orders and fitted 234 unique admixture graphs to our dataset. We found three possible graphs with no f4 outliers left, although two of them were mirror graphs. We computed the mean log-likelihoods of the three models and their Bayes Factors using the MCMC algorithm implemented in the R package ADMIXTUREGRAPH v.1.0.289 using the default chain settings implemented in qpBrute (two chains, each with two million iterations, five heated chains, a burn-in of 50%, and no thinning). We assessed the convergence of the chains using the output from the R package CODA v.0.19-490, also generated in qpBrute. The Bayes factors showed non-significant support of the first model in comparison to the other “mirror” two, which have similar log-likelihoods (Bayes factor of 0.88). We increased the chains to four million iterations with a burn-in of three millions, but it did not help in discriminating between the three graphs.
Inferences on the SNPs under parallel linked selection
Patterns of variation
We investigated and compared patterns of variation in SP1060 and contemporary populations at sites previously identified as evolving under parallel linked selection19. We plotted a PCA for the 2122 SNPs under parallel linked selection using the packages adegenet v.2.1.5 (glPCA function) and scales v.1.2.191 and a neighbour-joining tree using the R package ape73. We performed 100 bootstraps of the tree using the boot.phylo option of the package ape v.5.6.2 and indicated for which nodes bootstrap support values were higher than 95%. We compared the tree for the SNPs under parallel linked selection with ten trees generated using the same number of randomly sampled neutral SNPs, also computing bootstrap support values as described earlier. We also plotted the raw genotypes of the 2122 SNPs using the R packages vcfR v.1.12.092 and adegenet93.
Heterozygosity estimation
We estimated heterozygosity for the sites under parallel linked selection to coastal habitat by computing individual site-frequency-spectrum using ANGSD v.0.921, both with and without transitions. The first command was as follows: angsd -i file.bam -anc ancestral_state.fa -dosaf 1 -P 6 -gl 1 -out file -minMapQ 30 -minQ 23 -remove_bads 1 -uniqueOnly 1 We then added '-noTrans 1' to remove transitions. Then we ran: realSFS filet.saf.idx >est.ml.
The heterozygosity is the product of the number of variants in the second column of the est.ml file, which corresponds to the heterozygotes, divided by the total number of sites. We tested whether mean heterozygosity was different between coastal and pelagic populations within a region using two-sided t-tests or Wilcoxon tests depending on whether the data satisfied normality and homogeneity of variances.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
Raw sequencing data generated as part of this project are part of Bioproject PRJNA955223. Biosample and SRA accession numbers are provided in Supplementary Table 1 and Supplementary data file 1. Mitochondrial genome sequences are accessible on GenBank at accession numbers OR120165-OR120228 as detailed in Supplementary Data File 1. Supplementary Data file 1 also indicates where the samples are housed and who should be contacted to access the material. We also used the whole-genome re-sequencing data from Bioproject PRJNA724031. We have also included mitogenome sequences from Genbank as indicated in Supplementary Table 3. Source data are provided with this paper in the Source Data file and the data, codes and related documentations that support the findings of this study are openly available in the DataSuds repository (IRD, France) at https://doi.org/10.23708/DABAWD. Data reuse is granted under a CC-BY licence. Source data are provided with this paper.
Code availability
The data, codes and related documentations that support the findings of this study are openly available in the DataSuds repository (IRD, France) at https://doi.org/10.23708/DABAWD. Data reuse is granted under a CC-BY licence.
References
Barrett, R. D. H. & Schluter, D. Adaptation from standing genetic variation. Trends Ecol. Evol. 23, 38–44 (2008).
Waters, J. M. & McCulloch, G. A. Reinventing the wheel? Reassessing the roles of gene flow, sorting and convergence in repeated evolution. Mol. Ecol. 30, 4162–4172 (2021).
Roesti, M., Gavrilets, S., Hendry, A. P., Salzburger, W. & Berner, D. The genomic signature of parallel adaptation from shared genetic variation. Mol. Ecol. 23, 3944–3956 (2014).
Lee, K. M. & Coop, G. Population genomics perspectives on convergent adaptation. Philos. Trans. R. Soc. Lond. B Biol. Sci. 374, 20180236 (2019).
Van Belleghem, S. M. et al. Evolution at two time frames: polymorphisms from an ancient singular divergence event fuel contemporary parallel evolution. PLoS Genet. 14, e1007796 (2018).
Montejo-Kovacevich, G. et al. Repeated genetic adaptation to altitude in two tropical butterflies. Nat. Commun. 13, 4676 (2022).
De-Kayne, R. et al. Genomic architecture of adaptive radiation and hybridization in Alpine whitefish. Nat. Commun. 13, 4479 (2022).
Le Moan, A., Gagnaire, P.-A. & Bonhomme, F. Parallel genetic divergence among coastal-marine ecotype pairs of European anchovy explained by differential introgression after secondary contact. Mol. Ecol. 25, 3187–3202 (2016).
Magalhaes, I. S. et al. Intercontinental genomic parallelism in multiple three-spined stickleback adaptive radiations. Nat. Ecol. Evol. 5, 251–261 (2021).
Rosel, P. E., Hansen, L. & Hohn, A. A. Restricted dispersal in a continuously distributed marine species: common bottlenose dolphins Tursiops truncatus in coastal waters of the western North Atlantic. Mol. Ecol. 18, 5030–5045 (2009).
Lowther-Thieleking, J. L., Archer, F. I., Lang, A. R. & Weller, D. W. Genetic differentiation among coastal and offshore common bottlenose dolphins, Tursiops truncatus, in the eastern North Pacific Ocean. Mar. Mamm. Sci. 31, 1–20 (2015).
Louis, M. et al. Ecological opportunities and specializations shaped genetic divergence in a highly mobile marine top predator. Proc. Biol. Sci. 281, 20141558 (2014).
Costa, A. P. B. et al. Ecological divergence and speciation in common bottlenose dolphins in the western South Atlantic. J. Evol. Biol. https://doi.org/10.1111/jeb.13575 (2019).
Natoli, A., Peddemors, V. M. & Hoelzel, A. R. Population structure and speciation in the genus Tursiops based on microsatellite and mitochondrial DNA analyses. J. Evol. Biol. 17, 363–375 (2004).
Hoelzel, A. R., Potter, C. W. & Best, P. B. Genetic differentiation between parapatric ‘nearshore’ and ‘offshore’ populations of the bottlenose dolphin. Proc. Biol. Sci. 265, 1177–1183 (1998).
Costa, A. P. B., Mcfee, W. & Wilcox, L. A. The common bottlenose dolphin (Tursiops truncatus) ecotypes of the western North Atlantic revisited: an integrative taxonomic investigation supports the presence of distinct species. Zool. J. Linn. Soc. https://doi.org/10.1093/zoolinnean/zlac025/6585199 (2022).
Perrin, W. F., Thieleking, J. L., Walker, W. A., Archer, F. I. & Robertson, K. M. Common bottlenose dolphins (Tursiops truncatus) in California waters: cranial differentiation of coastal and offshore ecotypes. Mar. Mamm. Sci. 27, 769–792 (2011).
Machado, A. M. S. et al. Homophily around specialized foraging underlies dolphin social preferences. Biol. Lett. 15, 20180909 (2019).
Louis, M. et al. Selection on ancestral genetic variation fuels repeated ecotype formation in bottlenose dolphins. Sci. Adv. 7, eabg1245 (2021).
Nykänen, M. et al. Postglacial colonization of northern coastal habitat by bottlenose dolphins: a marine leading-edge expansion? J. Hered. 110, 662–674 (2019).
Shennan, I. et al. Modelling western North Sea palaeogeographies and tidal changes during the Holocene. Geol. Soc. Lond. Spec. Publ. 166, 299–319 (2000).
Post, K. A Weichselian marine mammal assemblage from the southern North Sea. Deinsea 11, 21–28 (2005).
Aaris-Sørensen, K. Diversity and Dynamics of the Mammalian Fauna in Denmark throughout the Last Glacial-Interglacial Cycle, 115-0 kyr BP (John Wiley & Sons, 2010).
Patterson, N., Price, A. L. & Reich, D. Population structure and eigenanalysis. PLoS Genet. 2, e190 (2006).
Korneliussen, T. S., Albrechtsen, A. & Nielsen, R. ANGSD: analysis of next generation sequencing data. BMC Bioinforma. 15, 356 (2014).
François, O. & Jay, F. Factor analysis of ancient population genomic samples. Nat. Commun. 11, 4661 (2020).
Günther, T. & Nettelblad, C. The presence and impact of reference bias on population genomic studies of prehistoric human populations. PLoS Genet. 15, e1008302 (2019).
Martiniano, R., Garrison, E., Jones, E. R., Manica, A. & Durbin, R. Removing reference bias and improving indel calling in ancient DNA data analysis by mapping to a sequence variation graph. Genome Biol. 21, 250 (2020).
Schubert, M. et al. Improving ancient DNA read mapping against modern reference genomes. BMC Genomics 13, 178 (2012).
Foote, A. D. et al. Convergent evolution of the genomes of marine mammals. Nat. Genet. 47, 272–275 (2015).
McVean, G. A genealogical interpretation of principal components analysis. PLoS Genet. 5, e1000686 (2009).
Lawson, D. J., van Dorp, L. & Falush, D. A tutorial on how not to over-interpret STRUCTURE and ADMIXTURE bar plots. Nat. Commun. 9, 3258 (2018).
Durand, E. Y., Patterson, N., Reich, D. & Slatkin, M. Testing for ancient admixture between closely related populations. Mol. Biol. Evol. 28, 2239–2252 (2011).
Patterson, N. et al. Ancient admixture in human history. Genetics 192, 1065–1093 (2012).
Ní Leathlobhair, M. et al. The evolutionary history of dogs in the Americas. Science 361, 81–85 (2018).
Liu, L. et al. Genomic analysis on pygmy hog reveals extensive interbreeding during wild boar expansion. Nat. Commun. 10, 1992 (2019).
Peter, B. M. A geometric relationship of F2, F3 and F4 -statistics with principal component analysis. Philos. Trans. R. Soc. B Biol. Sci. https://doi.org/10.1098/rstb.2020.0413 (2022).
Marciniak, S. & Perry, G. H. Harnessing ancient genomes to study the history of human adaptation. Nat. Rev. Genet. 18, 659–674 (2017).
Frantz, L. A. F., Bradley, D. G., Larson, G. & Orlando, L. Animal domestication in the era of ancient genomics. Nat. Rev. Genet. 21, 449–460 (2020).
Star, B. et al. Ancient DNA reveals the Arctic origin of Viking Age cod from Haithabu, Germany. Proc. Natl Acad. Sci. USA 114, 9152–9157 (2017).
Barlow, A. et al. Middle Pleistocene genome calibrates a revised evolutionary history of extinct cave bears. Curr. Biol. 31, 1771–1779.e7 (2021).
Wang, M.-S. et al. A polar bear paleogenome reveals extensive ancient gene flow from polar bears into brown bears. Nat. Ecol. Evol. 6, 936–944 (2022).
Kirch, M., Romundset, A., Gilbert, M. T. P., Jones, F. C. & Foote, A. D. Ancient and modern stickleback genomes reveal the demographic constraints on adaptation. Curr. Biol. 31, 2027–2036.e8 (2021).
Guerrero, R. F. & Hahn, M. W. Speciation as a sieve for ancestral polymorphism. Mol. Ecol. 26, 5362–5368 (2017).
Bolnick, D. I., Barrett, R. D. H., Oke, K. B., Rennison, D. J. & Stuart, Y. E. (Non)Parallel evolution. Ann. Rev. Ecol. Evol. Syst. https://doi.org/10.1146/annurev-ecolsys-110617-062240 (2018).
Korlević, P., Talamo, S. & Meyer, M. A combined method for DNA analysis and radiocarbon dating from a single sample. Sci. Rep. 8, 4127 (2018).
Kompanje, E. & Post, K. Remarkable mandibular healing in an early Holocene bottlenose dolphin (Tursiops truncatus). Lutra 60, 61–66 (2017).
Heaton, T. J. et al. Marine20—The marine radiocarbon age calibration curve (0–55,000 cal BP). Radiocarbon https://doi.org/10.1017/rdc.2020.68 (2020).
Stuiver, M. & Reimer, P. J. Extended 14C data base and revised CALIB 3.0 14C age calibration program. Radiocarbon https://doi.org/10.1017/s0033822200013904 (1993).
Mangerud, J., Bondevik, S., Gulliksen, S., Hufthammer, A. K. & Høisæter, T. Marine 14C reservoir ages for 19th century whales and molluscs from the North Atlantic. Quat. Sci. Rev. https://doi.org/10.1016/j.quascirev.2006.03.010 (2006).
Nykänen, M. et al. Fine‐scale population structure and connectivity of bottlenose dolphins, Tursiops truncatus, in European waters and implications for conservation. Aquat. Conserv. 29, 197–211 (2019).
Korlević, P. et al. Reducing microbial and human contamination in DNA extractions from ancient bones and teeth. Biotechniques 59, 87–93 (2015).
Carøe, C. et al. Single‐tube library preparation for degraded DNA. Methods Ecol. Evol. 9, 410–419 (2018).
Gnirke, A. et al. Solution hybrid selection with ultra-long oligonucleotides for massively parallel targeted sequencing. Nat. Biotechnol. 27, 182–189 (2009).
Carpenter, M. L. et al. Pulling out the 1%: whole-genome capture for the targeted enrichment of ancient DNA sequencing libraries. Am. J. Hum. Genet. 93, 852–864 (2013).
Baker, C. S. et al. Hierarchical structure of mitochondrial DNA gene flow among humpback whales Megaptera novaeangliae, world-wide. Mol. Ecol. 3, 313–327 (1994).
Schubert, M., Lindgreen, S. & Orlando, L. AdapterRemoval v2: rapid adapter trimming, identification, and read merging. BMC Res. Notes 9, 88 (2016).
Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
Moura, A. E. et al. Recent diversification of a marine genus (Tursiops spp.) tracks habitat preference and environmental change. Syst. Biol. 62, 865–877 (2013).
Morin, P. A. et al. Geographic and temporal dynamics of a global radiation and diversification in the killer whale. Mol. Ecol. 24, 3964–3979 (2015).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics https://doi.org/10.1093/bioinformatics/btp324 (2009).
Li, H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics 27, 2987–2993 (2011).
Li, H. et al. The sequence alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Broad Institute. Picard tools, https://broadinstitute.github.io/picard (2016).
Kumar, S., Stecher, G., Li, M., Knyaz, C. & Tamura, K. MEGA X: molecular evolutionary genetics analysis across computing platforms. Mol. Biol. Evol. https://doi.org/10.1093/molbev/msy096 (2018).
Smit, A. F. A., Hubley, R. & Green, P. RepeatMasker open-4.0. 2013–2015. (2015).
Jónsson, H., Ginolhac, A., Schubert, M., Johnson, P. L. F. & Orlando, L. mapDamage2.0: fast approximate Bayesian estimates of ancient DNA damage parameters. Bioinformatics https://doi.org/10.1093/bioinformatics/btt193 (2013).
Huelsenbeck, J. P. & Ronquist, F. MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics 17, 754–755 (2001).
Ronquist, F. & Huelsenbeck, J. P. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19, 1572–1574 (2003).
Hasegawa, M., Kishino, H. & Yano, T. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol. 22, 160–174 (1985).
Posada, D. jModelTest: phylogenetic model averaging. Mol. Biol. Evol. 25, 1253–1256 (2008).
Darriba, D., Taboada, G. L., Doallo, R. & Posada, D. jModelTest 2: more models, new heuristics and parallel computing. Nat. Methods 9, 772 (2012).
Paradis, E. & Schliep, K. ape 5.0: an environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics 35, 526–528 (2019).
Hassanin, A. et al. Pattern and timing of diversification of Cetartiodactyla (Mammalia, Laurasiatheria), as revealed by a comprehensive analysis of mitochondrial genomes. C. R. Biol. 335, 32–50 (2012).
Vilstrup, J. T. et al. Mitogenomic phylogenetic analyses of the Delphinidae with an emphasis on the Globicephalinae. BMC Evol. Biol. 11, 65 (2011).
Xiong, Y., Brandley, M. C., Xu, S., Zhou, K. & Yang, G. Seven new dolphin mitochondrial genomes and a time-calibrated phylogeny of whales. BMC Evol. Biol. 9, 20 (2009).
Arnason, U., Gullberg, A. & Janke, A. Mitogenomic analyses provide new insights into cetacean origin and evolution. Gene 333, 27–34 (2004).
Moura, A. E. et al. Phylogenomics of the genus Tursiops and closely related Delphininae reveals extensive reticulation among lineages and provides inference about eco-evolutionary drivers. Mol. Phylogenet. Evol. 146, 106756 (2020).
Lanfear, R., Calcott, B., Ho, S. Y. W. & Guindon, S. Partitionfinder: combined selection of partitioning schemes and substitution models for phylogenetic analyses. Mol. Biol. Evol. 29, 1695–1701 (2012).
Drummond, A. J., Suchard, M. A., Xie, D. & Rambaut, A. Bayesian phylogenetics with BEAUti and the BEAST 1.7. Mol. Biol. Evol. 29, 1969–1973 (2012).
McGowen, M. R., Spaulding, M. & Gatesy, J. Divergence date estimation and a comprehensive molecular tree of extant cetaceans. Mol. Phylogenet. Evol. 53, 891–906 (2009).
Rambaut, A., Drummond, A. J., Xie, D., Baele, G. & Suchard, M. A. Posterior summarization in Bayesian phylogenetics using Tracer 1.7. Syst. Biol. 67, 901–904 (2018).
Drummond, A. J. & Rambaut, A. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol. Biol. 7, 214 (2007).
Ho, S. Y. W., Phillips, M. J., Cooper, A. & Drummond, A. J. Time dependency of molecular rate estimates and systematic overestimation of recent divergence times. Mol. Biol. Evol. https://doi.org/10.1093/molbev/msi145 (2005).
Ho, S. Y. W., Duchêne, S., Molak, M. & Shapiro, B. Time-dependent estimates of molecular evolutionary rates: evidence and causes. Mol. Ecol. 24, 6007–6012 (2015).
Frichot, E. & François, O. LEA: an R package for landscape and ecological association studies. Methods Ecol. Evol. https://doi.org/10.1111/2041-210x.12382 (2015).
Green, R. E. et al. A draft sequence of the Neandertal genome. Science 328, 710–722 (2010).
Busing, F. M., Meijer, E. & Van Der Leeden, R. Delete-m jackknife for unequal m. Stat. Comput. 9, 3–8 (1999).
Rajora, O. P. Population Genomics: Concepts, Approaches and Applications (Springer, 2019).
Plummer, M., Best, N., Cowles, K. & Vines, K. CODA: convergence diagnosis and output analysis for MCMC. R. N. 6, 7–11 (2006).
Wickham, H. Scales: scale functions for visualization. R package version 0. 4. 0 https://CRAN.R-project.org/package=scales (2016).
Knaus, B. J. & Grünwald, N. J. vcfr: a package to manipulate and visualize variant call format data in R. Mol. Ecol. Resour. 17, 44–53 (2017).
Jombart, T. & Ahmed, I. adegenet 1.3-1: new tools for the analysis of genome-wide SNP data. Bioinformatics 27, 3070–3071 (2011).
Acknowledgements
We especially thank Matthias Meyer for providing the SP1060 library, which allowed the generation of the ancient genome that was central to this study. We thank Ramon Fallon, Joseph Ward and Peter Thorpe from the St Andrews Bioinformatics Unit for help with bioinformatics. Bioinformatics and computational analyses were supported by the University of St Andrews Bioinformatics Unit which is funded by a Wellcome Trust ISSF award [grant 105621/Z/14/Z] and ran on cluster marvin and Crop Diversity HPC. We thank Kelly Roberston for the laboratory work for the contemporary samples from the USA and for arranging shipment. We thank M. Thomas P. Gilbert for our use of the ancient DNA facilities in Copenhagen. We thank Jazmín Ramos Madrigal for advice on data format conversion. We thank everyone involved in sample collection including Conor Ryan for sampling some of the stranded individuals in Ireland, the stranding networks in Ireland and Scotland and the Groupe d’Etude des Cétacés du Cotentin, and the NOAA Southeast and Southwest Fisheries Science Centre field crews and Simon Ingram for biopsy sample collection. Funding for ancient DNA labwork and sequencing was provided by the Total Foundation awarded to M.L. Funding for sample collection in Ireland was provided by Science Foundation Ireland. Funding for labwork and visits to the University of Copenhagen was provided by the Systematics Research Fund, a Godfrey Hewitt mobility award from the European Society for Evolutionary Biology (ESEB), Lerner-Gray Grants for Marine Research. M.L. was supported by a Fyssen Fellowship, Total Foundation, the University of St Andrews and the Greenland Research Council. Contemporary sample DNA extractions were supported by People’s Trust for Endangered Species and the sequencing costs were supported by the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 663830 awarded to A.D.F., by the Total Foundation awarded to M.L., the University of Groningen awarded to M.C.F., and the Marine Alliance for Science and Technology for Scotland and The Russell Trust awarded to O.E.G. M.N. was funded by MASTS and the Crawford Hayes fund. A.D.F. was funded by Marie Skłodowska-Curie grant agreement No. 663830 and the European Research Council grant agreement No. ERC-COG-101045346.
Funding
Open access funding provided by Norwegian University of Science and Technology.
Author information
Authors and Affiliations
Contributions
M.L., M.N. and A.D.F. conceived the study; F.A., S.B., A.B., E.R., J.O.B., K.P., P.E.R., H.v.d.E. collected or curated samples/specimens, M.L., P.K., M.N., M.-H.S.S., N.W. and A.D.F. ran the laboratory work; M.-H.S.S. and NW provided guidance in the ancient lab; M.L., M.N. and F.R. analysed the data; all authors interpreted the data; M.C.F., O.E.G. and A.D.F. supervised the work, M.L. and A.D.F. wrote the manuscript, M.N., P.K., F.A., S.B., A.B., E.D.L., J.O.B., K.P., F.R., E.R., P.E.R., M.-H.S.S., H.v.d.E., N.W., M.C.F. and OE.G. commented and proof-read the manuscript. All authors read and approved the final manuscript.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks Fernando Diaz-Aguirre and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Source data
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Louis, M., Korlević, P., Nykänen, M. et al. Ancient dolphin genomes reveal rapid repeated adaptation to coastal waters. Nat Commun 14, 4020 (2023). https://doi.org/10.1038/s41467-023-39532-z
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41467-023-39532-z
Comments
By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.