Global phylogeography and genetic diversity of the long-finned pilot whale Globicephala melas, with new data from the southeastern Pacific

The matrilineal long-finned pilot whale presents an antitropical distribution and is divided into two subspecies, one in the temperate seas of the Southern Hemisphere and the other restricted to the North Atlantic and Mediterranean. Until now, population genetic and phylogeographic studies have included localities of most of its Northern Hemisphere distribution, while only the southwestern Pacific has been sampled in the Southern Hemisphere. We add new genetic data from the southeastern Pacific to the published sequences. Low mitochondrial and nuclear diversity was encountered in this new area, as previously reported for other localities. Four haplotypes were found with only one new for the species. Fifteen haplotypes were detected in the global dataset, underlining the species’ low diversity. As previously reported, the subspecies shared two haplotypes and presented a strong phylogeographic structure. The extant distribution of this species has been related to dispersal events during the Last Glacial Maximum. Using the genetic data and Approximate Bayesian Calculations, this study supports this historical biogeographic scenario. From a taxonomic perspective, even if genetic analyses do not support the subspecies category, this study endorses the incipient divergence process between hemispheres, thus maintaining their status and addressing them as Demographically Independent Populations is recommended.

Global and local haplotype networks. Adding previously published sequences from other oceanic basins, we detected that (1) two haplotypes were shared between G. m. edwardii (SP) and G. m. melas (NA and MED), and (2) one haplotype was shared by all SP localities but was absent from NA and MED (Table 1, Fig. 1).  Table 1. Summary of number of haplotypes and genetic indices for each included locality and per subspecies (abbreviations are detailed in the methods section). The first row indicates the names of each haplotype according to the original authors. Numbers 60 and 62 represent the two last digits of the GenBank codes provided by Miralles et al., (2016) for identification. Number of haplotypes (h), haplotype diversity (Hd), pairwise differences between sequences (Π) and nucleotide diversity (π) are also detailed.
Haplotypes P + U and S + R were the only ones detected within the distribution range of both G. melas subspecies (Table 1, Fig. 2a): haplotype P + U is present in the SP and NA, while S + R was found in all three basins (SP, NA and MED), and was pivotal in the development of local diversity. Haplotype Q + Y was shared by all SP localities, but absent from NA and MED. The remaining 12 haplotypes represent in situ diversification within each corresponding basin: eight for SP, four for NA and one for MED. The haplotypes found in the Mediterranean area are represented either by S + R or by D, which derives directly from it. The same occurs with half (2) of haplotypes encountered in the NA and half (4) of those exclusive to the SP.
In the haplotype network by locality, haplotype S + R is present in all ten studied areas and originates much of the local diversity within them (Table 1, Fig. 2b). Haplotype P + U plays a similar, but downscaled role, in the South Pacific; it originates half (4) of the diversity that is exclusive to this area. This haplotype is also much more abundant in the localities corresponding to G. m. edwardii than in the range of G. m. melas, where it was detected  in lower numbers in three localities (17 samples from FI, one from NWA and one from UK). One of the haplotypes that originates from P + U, haplotype Q + Y, is exclusive to the South Pacific and is present in all three localities. Ten of the total fifteen haplotypes (67%) were exclusively present in one of the following five localities: TAS (3), NZ (3), CL (1), IB (2) and NWA (1). In the South Pacific, 70% of haplotypes were private to one of the three localities; it was the only basin with at least one private haplotype in each. Three private haplotypes were encountered in the North Atlantic: one in the NWA and two in the IB.
Global and local genetic structure. Significant Table 3). All ϕ ST comparisons were statistically significant within the South Pacific. In the North Atlantic, only the Faroe Islands showed a statistically significant phylogeographic structure with all other localities, while NWA did with FI and UK. All comparisons among localities from different basins showed statistically significant differences. The AMOVA showed significant differentiation among the two distribution ranges, with greater variation among groups than within them (Table 4).
In the Correspondence Analysis, the first axis separated the localities according to their respective hemisphere of origin. The three South Pacific localities (TAS, NZ and CL) were clustered separately from those of the Northern Hemisphere. The second axis divided the Strait of Gibraltar and the Mediterranean Sea from the remaining localities in the Northern Hemisphere (Fig. 3). The calculation of Nei's (1987) net nucleotide divergence (d A ) among the subspecies G. m. edwardii and G. m. melas resulted in d A = 0.00158 (SD = 0.00019).

Historical biogeography.
The origin of the current disjunct distribution of Globicephala melas was explored with DIYABC, using two historical biogeographic models associated to the Last Glacial Maximum (LGM). The first model, scenario 1, represented a colonization event from the Southern Hemisphere to the Northern Hemisphere, followed by isolation and population growth in the Northern Hemisphere. The second model, scenario 2, tested a possible event of vicariance among both distributions.
Based on the genetic data provided, both scenarios were realistic (Fig. 4a), even though the simulations performed were more supportive of the founder effect population history scenario (Fig. 4b). Parameter estimations fell within the proposed ranges and the prior and posterior values from our dataset were not statistically different from simulations. A range expansion would have occurred around 12 900 years ago (t2), followed by a distribution split and population growth, 9 380 years ago (t1) (Fig. 4c). The mutation rate was estimated at u = 4.44 e −8 .

Discussion
Sampling cetacean species for population genetic studies has proven to be logistically challenging. A large number of individuals need to be sampled for this kind of studies, but they are generally widespread geographically and low in density. Mass strandings of cetaceans represent unique opportunities to collect large numbers of samples, as in the case of the franciscana Pontoporia blainvillei 29 , bottlenose dolphin Tursiops truncatus 30 , Cuvier's beaked whale Ziphius cavirostris 31 and the short-beaked common dolphin Delphinus delphis 32 . The long-finned pilot whale is a species for which almost all genetic data has been obtained from sampling mass strandings. A large part of its distribution range has been already sampled in the northwestern and northeastern Atlantic 23-26 , but within their broader distribution in the Southern Hemisphere, only the southwestern Pacific has been covered 20 . This study presents new data from two recent mass standings in southern Chile, in order to produce the first genetic information for Globicephala melas in the southeastern Pacific and to integrate this new information into the global phylogeography of the species.
Contrasting tissue quality was found between both strandings, which we attribute to the time of sampling of each event. Our sampling of the Isla Clemente stranding took place an estimated two to three months after occurring 33 , while the Isla Navarino individuals were sampled five days after stranding 34 , precluding tissue the deterioration encountered in the former event. This highlights the importance of swift responses to stranding events, in order to secure the best tissue quality possible.
Prior to conducting mtDNA analyses, we first decided to remove one polymorphic site that exhibited a strong homoplasy signal. This problem in mtDNA sequences of G. melas was already detected 20 , but not taken into account for the genetic analyses. Homoplasic sites have been described to hinder the resolution of mtDNA gene trees 35 and have also been pointed out as potential confounders of evolutionary analysis in the mtDNA control region of humpback whales 36 and human mtDNA coding regions 37 , among others.
Despite the addition of 90 new sequences from the southeastern Pacific to the set of available sequences in the literature -over 9000 km from New Zealand, the nearest sampled locality-the global haplotype network was expanded by just one private low-frequency haplotype, further confirming the worldwide low mitochondrial diversity of G. melas. Such low mtDNA diversity has been reported for other matrilineal odontocetes, including orcas (π = 0.52%) 38 , false killer whales (π = 0.01-0.30%) 39 and sperm whales (π = 0.131-0.407%) 40 . In contrast, cetaceans with more labile social cohesion such as mysticetes and non-matrilineal odontocetes, generally present much higher nucleotide diversity, a feature that appears to be common among these species 41 . Cultural hitchhiking has been regarded as a driver of their low mitochondrial diversity 41,42 . An analogous effect of behaviour on mitochondrial diversity was described in a resident coastal bottlenose dolphin population in central Chile 43 . This particular population operates akin to pilot whales as it is also composed of adult females and their descendants, both male and female. That study found that the genetic diversity of this matrilineal population (Hd = 0.63; π = 0.8%) was lower than that of the non-matrilineal transient-pelagic group adjacent to them (Hd = 0.95; www.nature.com/scientificreports www.nature.com/scientificreports/ π = 1.4%), suggesting the importance of social structure in shaping the pattern of genetic diversity in cetaceans (at least in odontocetes).
As previously reported for other areas 20,23,24,26 , our results show that southeastern Pacific long-finned pilot whales (n = 90) have low genetic diversity, in particular for haplotype richness (h = 4) and nucleotide diversity (π = 0.23%, Table 1). Within the South Pacific, the diversity indices of southeastern Pacific (i.e. Chilean) samples were similar to the values obtained for Tasmania. Nevertheless, genetic diversity in these two localities was very different from that of New Zealand. Despite accounting for 54% of the samples in this basin, the genetic and nucleotide diversity of pilot whales from New Zealand were the lowest (Table 1), particularly because of the high predominance of one haplotype (93%, haplotype P + U). Such overrepresentation of a single haplotype in a matrilineal species may derive from sampling bias, for example, by sampling a single, large mass stranding event. However, in that study, the sampling period spanned from 1993 to 2007 and included numerous single and mass strandings that took place in various localities 20 . Therefore, we can assume that this diversity accurately reflects what is present in this area. The distinctiveness of New Zealand from Tasmania and Chile is further supported by high ϕ ST values (ϕ ST = 0.421 and 0.342 respectively) ( Table 3), which are four and five times higher than between Chile and Tasmania (ϕ ST = 0.086). The absence of obvious geographic or oceanographic barriers between New Zealand and the other localities in the South Pacific does not allow a simple interpretation of the pattern of genetic structure found here. This genetic differentiation could have been attained through ecological specialization, as previously pointed out 20 . Similarly, sea surface temperature and its influence on prey distribution has been regarded as a possible ecological factor underlying genetic differentiation in extant G. melas in North Atlantic waters 28 and similar trends have been observed on the Scotian shelf 44 .
Differentiation over relatively short distances without any conspicuous geographical barriers has also been detected in other odontocetes. For example, genetic differentiation was detected in the Chilean dolphin Cephalorhynchus eutropia between two differing coastal habitats along the uninterrupted Chilean coastline, attributed to habitat adaptation and specific hunting strategies 45 . Also, the Eastern Pacific Barrier has been proposed as a driver behind the genetic differentiation of the short-finned pilot whale Globicephala macrorhynchus into two subspecies 46 .
The haplotype network of the specimens from New Zealand presents a typical star-like shape (Fig. 5a), suggesting that long-finned pilot whales around New Zealand represent a young population, perhaps tracing back to the Last Glacial Maximum, as suggested by DIYABC analyses. During this period, a 6-10 °C cooling occurred in superficial waters of southeast New Zealand, the strongest temperature drop reported in this area of the southwestern Pacific 47 . Such changes in environmental conditions, probably associated to a shift in the distribution  Table 1.
of marine biota, may have provoked a typical population contraction-expansion in the long-finned pilot whale population in this area, as described for cold-temperate and polar marine species 48 , including cetaceans 49,50 .
A similar case of strong genetic differentiation among populations of long-finned pilot whales is observed between the North Atlantic and Mediterranean populations, where the latter exhibits high phylogeographic and genetic differences with the North Atlantic localities 25 . In this case, geographic and oceanographic discontinuities between the Mediterranean Sea and the Atlantic Ocean provide a robust explanation for the observed genetic structure. The separation of Mediterranean populations from North Atlantic ones has been previously reported in various marine species, such as shallow water crustaceans 51 , sea stars 52 , white sharks 53 and other odontocetes, like in sperm whales 54 , striped dolphins Stenella coeruleoalba 55 as well as Cuvier's beaked whales Ziphius cavirostris 31 and Risso's dolphins Grampus griseus 56 .
Widespread cetacean taxa occurring in both the Northern and Southern Hemispheres generally exhibit strong phylogeographic structure and genetic divergence between regions. Such genetic differentiation has been generally exemplified by fixed substitutions in mtDNA control region sequences. This is the case of the fin whale Balaenoptera physalus, with one fixed difference between South and North Atlantic samples, thus presenting no shared haplotypes 57 , the harbour porpoise Phocoena phocoena, with high divergence among ocean basins 58 and the more closely related false killer whale Pseudorca crassidens 39 . In the latter species, although the study had a low sample size in the North Atlantic, no shared haplotypes were found and at least 10 substitutions separated the Atlantic populations from those in the Indo-Pacific. In the case of G. melas, despite its antitropical distribution and the large geographic discontinuity between northern and southern distribution areas, the subspecies shared their two most abundant haplotypes. This genetic pattern may reflect (1) contemporary gene flow between hemispheres, or alternatively (2) an ancestral polymorphism resulting from an incipient divergence process. The strong phylogeographic structure detected among the subspecies supports the second hypothesis.
The South Pacific network holds much of the species' genetic diversity and is therefore very similar to the global network in overall shape (Fig. 5b). In contrast, the star-like haplotype network of North Atlantic and Mediterranean samples is typically presented by recently expanding populations. A biogeographic scenario of a dispersal event over the equator during previous glacial period has been proposed 22 , with a split in distribution occurred after the Last Glacial Maximum, 10 000-15 000 years ago. This hypothesis was further expanded 20 , mentioning that a trans-equatorial dispersal event rather than vicariance might have taken place, on account of the lower diversity in the Northern Hemisphere subspecies. Thus, the hypothesis is supported by (1) the contrasting haplotype networks (Fig. 5c, d), (2) the higher mitochondrial and microsatellite diversity indices of South Pacific long-finned pilot whales compared to their counterparts in the North Atlantic and Mediterranean (Tables 1  and 2) and (3) the clear separation of areas by hemisphere in the haplotype presence/absence Correspondence Analysis (Fig. 3). The validity of the previously proposed scenario of population history was further explored here with DIYABC population history simulations, comparing a founder effect scenario with a vicariance scenario. The patterns of population genetic diversity and structure observed in G. melas were consistent with the data simulated under the former scenario, a trans-equatorial dispersal event followed by divergence (Fig. 4). Posterior estimation of scenario parameters allowed estimating the time at which the different events occurred, setting the dispersal from the Southern Hemisphere to the North Atlantic at around 13 000 years ago, followed by a population demographic expansion around 9 380 years ago. However, even considering this scenario of divergence, the absence of reciprocal monophyly does not qualify northern and southern G. melas as different Evolutionary Significant Units (ESU) following the criterion of Moritz (1994) 21 . It is likely that not enough time has passed to sort lineages, or some level of gene flow still occurs 59 , as they still share haplotypes, but in differing frequencies 20 . Additionally, net nucleotide divergence d A and Percent Diagnosability (PD) had been previously calculated for the long-finned pilot whale subspecies 60 , obtaining values of d A = 0.00128 and PD = 0.84286. Our calculation of net nucleotide divergence, which included new sequences from the southeastern Pacific and integrated other published sequences from the Northern Hemisphere, resulted in a slightly higher value of d A = 0.00158, which still is below the proposed subspecies interval 61 . Therefore, addressing them as Demographically Independent Populations (DIP), a transitional state between a panmictic population and separate ESUs 59 might be more accurate.
Finally, from a conservation perspective, even if genetic analyses do not support the subspecies category, we recommend maintaining their current taxonomic status, since these DIP might be undergoing a recent divergence process which has yet to mature into fully sorted lineages.

concluding Remarks
The results here presented should be considered as preliminary evidence, as the use of a single mtDNA marker for phylogeographic and demographic inferences has been deemed problematic before 62 . As previously stated, such molecular studies should include nuclear markers together with mitochondrial DNA, even more when delimitating species and subspecies 18 . However, to date, the mtDNA control region is the only molecular marker for which sequences are available with sufficient sample size from different ocean basins to perform a worldwide phylogeographic study in G. melas 26 . To incorporate new genetic markers to a global phylogeographic study would require tremendous sampling effort and expenses, and may also strongly depend on the occurrence of mass strandings in all these regions of the world.
Finally, we believe that collaborative studies surveying uncharted areas, especially within the greater distribution range of long-finned pilot whales in the Southern Hemisphere, are fundamental to obtain complete data on the worldwide phylogeography and taxonomy of G. melas and to lay the groundwork for future research on these topics. Modern genetic tools, such as complete mitogenome and SNPs, have been already used to revise the taxonomic status of short-finned pilot whales 46 , which could be replicated in long-finned pilot whales.

DnA extraction, mitochondrial control region sequencing and microsatellite genotyping.
DNA extractions were performed following a modified salt-extraction protocol 63 , adding a second step of digestion with proteinase K one hour after the first one.
With an exploratory examination of the global haplotype network, it was noted that site 156 of the alignment generated three loops in the network. This hypervariable site was considered to be interfering with the phylogeographic signal of the data and was consequently removed, in order to eliminate a potential homoplasy signal. Additionally, a repeated TA motif starting at position 90 was identified as a possible microsatellite. We modified the sequences by deleting one of the nucleotide positions within each repeat, so each motif was considered as a single mutational step, instead of each nucleotide separately. Thus the final fragment length was of 345 bp.
Genetic diversity and structure. The genetic diversity indices number of haplotypes (h), number of polymorphic sites (S), haplotype diversity (Hd), nucleotide diversity (π) and pairwise differences between sequences (Π) were estimated in Arlequin v3.5.2 67 . Analyses of genetic structure (F ST ), phylogeographic structure (ϕ ST ) and analysis of molecular variance (AMOVA) were conducted in Arlequin v3.5.2 with 1000 permutations and a significance level of 0.05. Phylogeographic structure was also explored with Snn tests of genetic differentiation 68 , performed in DnaSP 5.10.01 69 . For the AMOVA, the ten localities were grouped according to the distribution of each subspecies. A Correspondence Analysis (CA) was performed on all localities with the software Past 3.19 70 , using the matrix of Table 1 in the form of presence/absence of haplotypes.
Additionally, as suggested by the guidelines for the delimitation of cetacean subspecies using genetic data of Taylor et al. 61 , Nei's (1987) net nucleotide divergence (d A , equation 10.21) 71 was calculated among the two putative subspecies in DnaSP. According to these guidelines, the net nucleotide divergence among two subspecies should be within the range of d A = 0.004-0.04.

Historical biogeography.
The population history of the species was tested on the program DIYABC v2.1.0 72 . This software evaluates population histories using Approximate Bayesian Computation (ABC) with genetic data, by testing scenarios built through the combination of population divergence, admixture and population size changes. Two models were evaluated. The first model was defined based on a scenario previously proposed 22 , together with evidence from the genetic results provided in the present study. The model considers www.nature.com/scientificreports www.nature.com/scientificreports/ a trans-equatorial, Last Glacial Maximum (LGM)-associated dispersal event from the Southern Hemisphere to the Northern Hemisphere, followed by a split in distribution and instantaneous population growth (Fig. 6). The alternative model differs from the previous one in that it considers a vicariance event, rather than a founder effect. The program was used to evaluate the accordance of these two scenarios with our genetic data. Priors were set as follows: Effective population size (N e ) of ancient population = 1 000-100 000; Effective population size of founder effect (N f ) = 10-1 000; time of dispersal event t 2 = 10 000-35 000; time of instantaneous population growth t 1 = 2 000-15 000 (with t 2 > t 1 ) and mutation rate u = 1.5 e −7 -1.5 e −8 . In accordance with the recommendations of the authors of the software, we performed 6 000 000 simulations.
Microsatellite data. A total of 19 loci were amplified: 409/470, 464/465 73 , DlrFCB1, DlrFCB6 74 , Ev1, Ev14, EV37 75 , GATA53 76 , GT6, GT51 77 , GT23, GT211, GT509, GT575 78 , MK5, MK9 79 and PPHO131 58 . PCR reactions were done with a Multiplex PCR kit (Qiagen), each reaction containing: 12.5 µL of water, 5 µL of MM2x Buffer and 1 µL of each primer at 10 pM. Between two and four loci with different fluorescent dyes were combined in each reaction. Allele scoring was done with the software GeneMarker v2.6.0 (www.softgenetics.com) with a 500 liz standard. The dataset was tested for scoring errors, allele dropout and null alleles in Micro-Checker v.2.2.3 80 . Observed heterozygosity (H o ), expected heterozygosity (H e ), average number of alleles per locus (nA) and genetic structure (F ST ) were estimated in Genetix v4.05.2 81 . Published microsatellite data on this species was available from Tasmania and New Zealand 27 , NWA, UK 28 , Faroe Islands 23,28 , Iberian Peninsula 23 , northeastern Atlantic, Strait of Gibraltar and the Mediterranean 25 . However, because of the differences in the loci used in these studies, only a partial comparison of genetic diversity could be performed between populations of G. m. melas and G. m. edwardii. Only comparisons of allelic richness could be done, which were carried out in Rundom Pro 1.1 82 with 10 000 randomizations. Comparisons were intra-subspecies among southwestern and southeastern Pacific samples, and inter-subspecies among southwestern Pacific and North Atlantic samples.

Approval.
We confirm that all methods were carried out in accordance with relevant guidelines and regulations. Samples were taken from stranded, deceased animals with permission from the National Fisheries Service (SERNAPESCA, document ID 2016-11-13). All experimental protocols were approved by the Postgraduate Evaluation Committee at the Faculty of Science of the Universidad de Chile. Accession codes. Haplotype R2 (GenBank accession number pending. Submission number #2305260).