Wandering behaviour prevents inter and intra oceanic speciation in a coastal pelagic fish

Small pelagic fishes have the ability to disperse over long distances and may present complex evolutionary histories. Here, Old World Anchovies (OWA) were used as a model system to understand genetic patterns and connectivity of fish between the Atlantic and Pacific basins. We surveyed 16 locations worldwide using mtDNA and 8 microsatellite loci for genetic parameters, and mtDNA (cyt b; 16S) and nuclear (RAG1; RAG2) regions for dating major lineage-splitting events within Engraulidae family. The OWA genetic divergences (0–0.4%) are compatible with intra-specific divergence, showing evidence of both ancient and contemporary admixture between the Pacific and Atlantic populations, enhanced by high asymmetrical migration from the Pacific to the Atlantic. The estimated divergence between Atlantic and Pacific anchovies (0.67 [0.53–0.80] Ma) matches a severe drop of sea temperature during the Günz glacial stage of the Pleistocene. Our results support an alternative evolutionary scenario for the OWA, suggesting a coastal migration along south Asia, Middle East and eastern Africa continental platforms, followed by the colonization of the Atlantic via the Cape of the Good Hope.


Results
Population structure, differentiation and connectivity. Multilocus genotypes from the 16 locations from the Atlantic Ocean and Pacific Ocean were obtained for 462 anchovies (Supplementary Fig. S1). The number of alleles per locus varied from 19 (locus Ee2-91b) to 85 (locus Ee10) over all locations (Supplementary Table S1). Mean allelic richness, standardized for comparison across a minimum common sample size of nine individuals, ranged from 6.9 (Senegal) to 9.3 (USA) in the Atlantic Ocean and from 7.4 to 9.4 in the Pacific Ocean (Table 1). POWSIM analysis indicated at least a 95% probability (based on the proportion of significant chi-squared tests) of detecting an F ST value of as low as 0.001 using the microsatellite data set ( Supplementary Fig. S2). Expected heterozygosity (H E ) varied between 0.797 (Senegal) and 0.894 (USA) in the Atlantic Ocean and between 0.849 and 0.899 in the Pacific Ocean, while the observed heterozygosity (H O ) varied between 0.653 (Guinea-Bissau) and 0.825 (Portugal north) in the Atlantic Ocean and between 0.699 and 0.742 in the Pacific Ocean (Table 1).
The DAPC results show that E. encrasicolus, E. eurystole and E. capensis are closer to each other than to the E. australis and E. japonicus clusters (Fig. 2a). The probability of assignment of individuals to their nominal species (Fig. 2b) shows that E. eurystole and E. capensis individuals were mostly assigned to the E. encrasicolus clusters with probabilities close to 0.8, and E. australis and E. japonicus were mostly assigned to their own clusters also with high probabilities. The a posteriori DAPC analysis (Fig. 2c) shows two clusters, one comprising 167 E. encrasicolus individuals, and all E. eurystole, E. capensis, E. australis and E. japonicus, and another with 157 E. encrasicolus.
Estimates with microsatellite data of the mutation scaled population size parameter were higher in the Pacific Ocean than in the Atlantic Ocean (Θ Atlantic = 11.6 and Θ Pacific = 98.4). Our comparison of candidate models of gene flow between populations, clearly rejects panmixia, and showed that migration from the Pacific to the Atlantic (model 2) fitted our microsatellite data best ( Table 2), but with extremely low gene flow (M = 1.5).
A total of 373 individuals from five OWA putative species and 16 locations were analysed for mitochondrial cyt b gene, yielding 269 haplotypes. Haplotype diversity (h) was generally high, ranging from 0.917 to 1.000 in the eastern Atlantic Ocean (from Norway to South Africa), 0.909 in the western Atlantic, and from 0.709 to 1.000 in the Pacific Ocean (Table 1). Nucleotide diversity (π) was low, ranging from 0.4% (USA) to 1.9% (South Africa) in   Fig. 1 and Table 1). Horizontal and vertical axes are the first two principal components, respectively; (b) scatter plot of a priori-defined nominal species; (c) individuals assigned to their genetic cluster without forcing them into pre-determined groups.
the Atlantic Ocean and from 0.1% to 1.1% in the Pacific Ocean (Table 1). Evolutionary net divergence between OWA putative species varied between 0-0.4% (Table 3). The haplotype network revealed four main clades, two in Atlantic Ocean and two in the Pacific Ocean (Fig. 3). Clades from the two oceanic basins were separated by a minimum of 30-32 mutations, with two individuals from South Africa clustering within the northern Pacific clade. Within the Atlantic Ocean, putative species E. encrasicolus, E. capensis and E. eurystole were not reciprocally monophyletic, but alternatively fit on European anchovy clades A and B. The southern Pacific clade includes haplotypes mostly from E. australis, but also two haplotypes from E. japonicus that are separated nine mutations from the most frequent haplotype of this clade. These likely represent intermediate haplotypes between the two clades. The four clades exhibited different haplotype patterns: E. encrasicolus clade A and E. australis were characterized by multiple star-like radiations with relatively shallow genetic divergences; E. encrasicolus clade B and E. japonicus lacked distinct star patterns and exhibited possibly unsampled or extinct haplotypes (Fig. 3). Pairwise-taxa and pairwise-location F ST , G ST and D EST indicated that genetic differentiation among putative species was low, with the Atlantic putative species (E. encrasicolus, E. capensis and E. eurystole) and the japanese anchovies (E. japonicus) showing the lowest genetic differentiation between sea basins ( Supplementary Fig. S3).
Estimates with mtDNA data of the mutation scaled population size parameter were the same between ocean basins (Θ = 0.1). Our comparison of candidate models of gene flow between populations, clearly rejects panmixia and showed that the full gene flow model (model 1) fitted our mtDNA data best (Table 2), with highly asymmetrical immigration between ocean basins, with the Pacific population providing five times as many immigrants into the Atlantic Ocean than vice-versa (M = 14.5 vs. 2.9, respectively).

Phylogenetic analyses and Divergence Time Estimates.
The resulting topologies inferred from the BI and ML analyses based on the two data sets (with and without the mutation in the codon 368 of the analysed portion of cyt b) were identical ( Fig. 4 and Supplementary Figs S4 and S5). Bayesian (−ln L = 21,088.60) and ML (−ln L = 20,986.77) analyses arrived at similar topologies ( Fig. 4 and Supplementary Fig. S4, respectively) and therefore BI is shown in Supplementary Information (Supplementary Fig. S4). Potential Scale Reduction Factors in the BI analysis were about 1.00 and the average ESS value for all parameters (1385.01) was significantly larger than 200, which indicates convergence of the runs. In ML analysis (Fig. 4) Table 3. Estimates of net evolutionary divergence between putative species of Old World Anchovies (below diagonal) and standard error values (above the diagonal).

Discussion
Genetic divergences detected between Atlantic and Pacific anchovies are more compatible with intraspecific divergence (net divergence = 0.2%; Table 3) and indicate ongoing gene flow between oceanic basins. These results suggest that reproductive isolation between OWA putative species is not complete, and revealed the existence of regional variants or incipient species. The reconstructed genetic patterns within the OWA species complex found in the present study are in agreement with the findings of Whitehead and his colleagues 10 based on morphological characters that consider the OWA as a single species. Regardless of the increased sequence length and the addition of more taxa, the precise origin of the OWA remains uncertain. The colonisation of the Atlantic Ocean likely occurred via South Africa, with anchovies dispersing across the northern Indian Ocean along the continental platforms of South Asia, Middle East and eastern Africa. The Atlantic and Pacific OWA revealed independent demographic histories but with contemporary gene flow. Before addressing the main interpretations and conclusions of these results, two main caveats must be addressed. First, data from the Pacific Ocean rely on only three sampling sites. Second, despite the rare records in the eastern African coast, no intermediate locations were sampled between the Atlantic and the Pacific oceans, thus we may have a restricted representation of the genetic diversity from the Indo-Pacific region. Nevertheless, the results presented in this study provide compelling evidence to set a biogeographical scenario for the OWA. Previous phylogenetic analyses based on mitochondrial data (cyt b: 521 bp) recovered two lineages (clades A and B) within E. encrasicolus that did not group together 13,15,16 . The assumption of paraphyly for this species was uniquely based on a neighbour joining analysis that grouped the clade B with E. japonicus with relatively low statistical support (BP 53%). A more recent study showed that some individuals belonging to the E. encrasicolus mitochondrial clade B were under selection 23 and therefore it is pivotal to understand if selection plays an important role in shaping the evolutionary and demographic history of this complex. All phylogenetic analyses (ML and BI) presented here that included more sequences (55 Engraulidae species) and a larger fragment (1044 bp) than previous studies, returned very similar results, regardless the effect of selection over the analysed portion of the cyt b. Moreover, unlike previous results, our ML analyses retrieved E. encrasicolus as monophyletic (Fig. 4), while the BI results do not reject it ( Supplementary Fig. S4).
Our Bayesian dating analysis estimated that the divergence between the Pacific and the Atlantic OWA occurred during the Pleistocene at 0.67 Ma (Fig. 5). This divergence is more recent than previously estimated 13,16 . Grant and his colleagues 13 used a fixed mutation rate of 1.9%/Ma based on the separation between Cetengraulis edentulus and C. mysticetus determined by the closure of the Panama seaway. We also used the closure of the Panama Isthmus as a calibration event but added two fossil calibrations that were modelled with a lognormal distribution, which may explain the differences between age estimates from both studies.
Our results indicated that the OWA coalesced during the late Pleistocene (Fig. 5). Despite the few observed exceptions, such as those observed e.g. in the genus Gobiodon 25 (Fig. 5), earlier than previously estimated [0.11-0.42 Ma] 24 .
To the best of our knowledge, values of genetic divergence between OWA putative species based on cyt b (0.0-0.4%; Table 3) constitute the lowest percentage of sequence divergence reported for intrageneric fish species 26 . Microsatellite data also showed a high degree of admixture between putative species divided into two clusters (Fig. 2c) that largely coincide with the northern part of the Atlantic Ocean (E. encrasicolus and E. eurystole) and the southeastern Atlantic and Pacific Oceans (E. encrasicolus, E. capensis, E. japonicus and E. australis). The DAPC (Fig. 2) also shows the existence of contemporary gene flow between anchovies from the Atlantic and Pacific basins. Low intraspecific genetic differentiation between individuals from distant locations was also detected on other coastal fish species [27][28][29] . The climate oscillation on relatively short time scales from decades to hundreds of thousands of years promotes shifts in distribution ranges, abundance (ref. 4 and references therein) and cyclical population extinctions and recolonizations 30 that prevent the formation of deep lineages in small pelagics such as sardines and anchovies 31 .
Our results do not unequivocally support the origin of the OWA in the Pacific Ocean as no clear sister group of the OWA emerges from the ML and BI phylogenetic analyses of the cyt b region (Fig. 4 and Supplementary  Fig. S4). Also, we cannot conclude if the Atlantic OWA resulted from a single colonization event or if they are the product of more than one invasion on different timescales 16 given the low statistical support of the node corresponding to the Atlantic MRCA in all phylogenetic trees retrieved with different methods (Fig. 4 and Supplementary Fig. S4). Regardless the number of colonisations, the Pacific anchovies may have reached the Atlantic Ocean by three possible colonization pathways: (1) via Cape Horn in South America, (2) across the Bering strait in the Arctic Ocean, and/or (3) coastal dispersal throughout the Indian Ocean via the Cape of Good Hope in South Africa.
Dispersal through the Cape Horn was unlikely because the OWA do not occur on the coastline of South America and there are no paleontological records in the area of species belonging to this group. Colonization of the Atlantic Ocean through the Bering Strait would only have been possible during an interglacial period when the ice sheets melted and temperatures raised up between 0 and 2 °C. However, the estimated origin of their MRCA occurred at 0.67 Ma (Fig. 5), which coincided with a glacial period. Moreover, OWA from the north Atlantic and north Pacific are not genetically close (Figs 2 and 3; Supplementary Fig. S3). The third hypothetical dispersal route from the Pacific to the Atlantic is via the Cape of the Good Hope in South Africa. There is evidence supporting this alternative route around the South Africa gateway: (1) morphological similarities between Pacific and Atlantic Ocean OWA and (2) the predominant direction of OWA migration is from the Pacific to the Atlantic Ocean (Table 2). This dispersal most likely occurred throughout an interglacial period. The estimated divergence that occurred at 0.67 Ma (OWA MRCA; Fig. 5) between anchovies from the Indian and the Atlantic waters was probably promoted by the Günz glacial period that prevented dispersal between the two ocean basins. Studies on the evolution of the OWA performed thus far already suggested this route as the most likely for the Atlantic colonisation 15,16,31 . Grant and his colleagues 15,16 postulated that anchovies used this migration path through open-ocean, via temperate Indian Ocean and South Africa. These authors considered that this route would be the only possible colonisation pathway for temperate-water species as occurs with other small pelagic fish (e.g. sardines) 31 , given the high sea surface temperatures (SST's) in the coastlines of eastern Africa and southern Asia. Until now, few records of OWA were reported in warmer habitats 10 , but recent studies indicated that higher SST's are not a limiting factor for the OWA, as they can be found in tropical waters both in the Atlantic and Pacific oceans 10,14,23 . The colonisation pathway proposed by Grant and Bowen 16 would imply more than 8000 km of open-ocean migration between the western Australia and South Africa without any stepping-stones. Anchovies are coastal pelagic fishes that live in average up to three years and it would be extremely unlikely that anchovies survived to a trans-oceanic migration of this magnitude in a single generation with no stopovers 32 . Open-ocean areas usually exhibit low productivity and scarce food resources 33 , which would create additional difficulties to large-scale migrations. Moreover, anchovies from Australia and southeastern Atlantic are not closely related in any of the phylogenetic analyses performed thus far (Figs 2 and 3) 15,16 .
Alternatively, we propose a dispersal route of the OWA from the Pacific to the Atlantic across the continental platforms of the northern Indian Ocean, eastern African coast and South Africa, impelled by the North Equatorial Scientific RepoRts | 7: 2893 | DOI:10.1038/s41598-017-02945-0 and Agulhas Currents (Supplementary Fig. S6). The existence of shared microsatellite alleles (Fig. 2) and haplotypes (Fig. 3) between Atlantic and Pacific OWA revealed by our data, points to the existence of recurrent migration events and contemporary gene flow between ocean basins. We propose that the Atlantic colonisers were likely seeded from an Indo-Pacific pool that could have included both north and south Pacific anchovy ancestors, departing from Philippines and Indonesia, using Somalia, Mauritius and Seychelles as stepping-stones 10 . Pleistocene colonisations from the Pacific to the Atlantic through the Cape of the Good Hope were inferred for other coastal fish, mostly for tropical species (e.g. ref. 32). Trans-oceanic dispersals of tropical fish between the Indian and the Atlantic oceans are generally conditioned by the cold Benguela upwelling in the western South Africa, but fluctuations on the intensity of this current along the Pleistocene allowed punctuated episodes of dispersal 34 . Apparently, trans-oceanic dispersal of OWA from the Indian to the Atlantic Oceans is not limited by the Agulhas and Angola warm currents, nor the Benguela cold current due to the apparent wide temperature range tolerance of anchovy species. Climatic fluctuations of mid/late Quaternary are likely to have contributed to anchovies demographic expansions/contractions as seen in other species (ref. 4 and references therein) leading to extinction-colonisation cycles at the extreme of the distribution areas and to population connectivity/ differentiation.

Conclusions
Comprehensive information on coastal fish genetic diversity and dispersal provides a wide-scale perspective of marine species connectivity and evolution. The results of our survey of the OWA complex Engraulis spp. here highlight that (i) coastal fish may disperse along large distances and frequently cross biogeographic barriers while maintaining high levels of genetic diversity and low genetic differentiation among populations; (ii) the dispersal route from the Pacific to the Atlantic Ocean is shared by other coastal fish species (e.g. ref. 32), increasing the support for a common model of intra-specific long distance dispersal; (iii) the permeability of biogeographical barriers depends on the relaxation of environmental conditions of ocean fronts; (iv) differentiation patterns depend on an intricate relationship between common geographic distances among populations, main circulation patterns, punctuated effectiveness of the biogeographic soft barriers promoted by climate oscillations and on ecological and biological traits at intraspecific level.

Material and Methods
Ethics statement. No specific permits were required for the field studies described here, since fish were purchased at fish markets or were collected on scientific cruises with fishing procedures. We confirm that the study locations were not privately owned or protected, and the field sampling activities did not involve endangered or protected species beyond the focal species.
Sample collection, DNA extraction and PCR amplification. Samples of OWA likely representing five putative species were collected from 16 locations from both Atlantic and Pacific oceans ( Fig. 1 and Table 1). The identification of the sampled specimens was based on their geographical origin, given the lack of morphological differentiation and diagnosing characters between putative species 10 . Fish were purchased at small coastal fish markets, as artisanal fishermen do not venture far, or were collected on scientific cruises (see acknowledgements). A small portion of white muscle or fin was preserved in 96% ethanol and stored at −20 °C. Total DNA extraction, polymerase chain reaction (PCR), purification of the PCR product, sequencing for a fragment of the mitochondrial cyt b (1044 bp) and microsatellite genotyping of eight loci were performed as described in Silva et al. 14 . Sequences of the nuclear recombination-activating genes RAG1 (1480 bp) and RAG2 (1221 bp) and of the mitochondrial 16S rRNA (800 bp) from OWA putative species used in the Bayesian dating analysis were obtained as described in Bloom & Lovejoy 35 . Sequence alignment, population structure, differentiation and connectivity. To determine contemporary genetic structuring and individual assignments based on the autosomal microsatellite data set, we used discriminate analysis of principal components (DAPC) implemented in the adegenet package 36 of R 2.15.3 37 and following Silva et al. 14 . DAPC was chosen over Bayesian clustering methods because this method is model free does not assuming Hardy-Weinberg equilibrium or linkage disequilibrium being more appropriate for situations where such assumptions are not met, as is often the case with anchovies 38 .
The program POWSIM 4.0 39 was used to evaluate statistical power for detecting pairwise genetic differentiation at F ST levels ranging from 0.00 to 0.10. We simulated the divergence of three subpopulations, corresponding to (1) European anchovy (E. encrasicolus, E. capensis and E. eurystole), (2) E. japonicus and (3) E. australis, from a single ancestral population through genetic drift to a given overall F ST value defined by controlling effective population size (Ne) and number of generations (t). To best reflect the assumingly large Ne of anchovy, we let Ne = 10 000 and varied t from 0 to 2078 for simulating different levels of differentiation. After the simulation, each subpopulation was sampled at n = 381 and divergence from genetic homogeneity was tested with χ-exact test. This procedure was repeated 100 times and the proportion of significant outcomes was used to estimate statistical power for detecting pairwise genetic differentiation.
Cyt b sequences were aligned using ClustalX 2.0.3 with default settings, implemented in Geneious 5.4 40 , checked and trimmed manually. Sequences were reduced to haplotypes using Collapse 1. Composite Likelihood model. The rate variation among sites was modelled with a gamma distribution (shape parameter = 1.48).
To examine the relationship between mitochondrial haplotypes, a minimum spanning network was constructed with Arlequin 3.5.1.2 42 and visualized with Hapstar 45 . Pairwise genetic differentiation was estimated with G st_est 46 and Jost's D est value 47 , both within and between putative species, following Pennings et al. 48 for mtDNA and using the R package Diversity 49 for microsatellites.
Estimation of migration rates. We used the coalescent-based program Migrate-N 50, 51 to compare different biogeographic hypotheses for the past and present migration of OWA between the Atlantic and the Pacific oceans. We conducted the analyses with two sets of data, mitochondrial DNA and microsatellites, structured into two groups according to geographical regions: southern Atlantic Ocean (pooled southern locations Senegal, Guinea-Bissau, Ghana, Namibia, Angola and South Africa) and Pacific Ocean (Japan, Australia and New Zealand). We tested four variations of the two-population (Atlantic-Pacific) migration model: bidirectional migration (full model, four parameters), strict Atlantic to Pacific migration (three parameters), strict Pacific to Atlantic migration (three parameters) and panmictic model that assumes the Atlantic and Pacific are part of a panmictic population (one parameter). Testing the directionality of gene flow is justified because the dominant ocean current between the ocean basins, the Agulhas flow, runs westerly from the Indian to the Atlantic Ocean and is thought to play a limiting role in marine dispersal in the opposite direction 52,53 . Initial values were calculated using F ST . Mutation rates were set to be constant among loci. The Migrate-N run parameters were calibrated on the full model for convergence of the Markov chain Monte Carlo sampling method. The prior distributions were uniform for mutation-scaled population size parameters theta (θ), that are four times the product of the effective population size and the mutation rate, and mutation-scaled migration rates M, that is, immigration rate scaled by the mutation rate, over the range of θ = 0.05-0.5 and prior migration rate M = 0-100 for mtDNA, and θ = 10-100 and M = 0-50 for microsatellites. These settings resulted in converged posterior distributions with a clear maximum for each estimate. The Bayesian run for mtDNA consisted of one long chain with a total of 15 million states visited and 50,000 states recorded for the generation of posterior distribution histograms for each locus after discarding the first 10,000 genealogies as burn-in; for all loci, a total of 48 million states were visited and 160,000 samples were recorded. For all the analyses we used an adaptive heating scheme with four simultaneous chains using different acceptance ratios (temperature settings were 1.0; 1.5; 3.0; 1 × 10 6 ); the analyses were run on a cluster computer using 4 compute nodes per run. The Bayesian run for microsatellites consisted of one long chain with a total of 6 million states were visited and 20,000 states were recorded for the generation of posterior distribution histograms for each locus after discarding the first 10,000 genealogies as burn-in; for all loci, a total of 48 million states were visited and 160,000 samples were recorded. For all the analyses we used an adaptive heating scheme with four simultaneous chains using different acceptance ratios (temperature settings were 1.0; 1.5; 3.0; 1,000,000.0); the analyses were run on a cluster computer using four compute nodes per run. Overall loci information was combined into a single estimate by Bézier approximation of the thermodynamic scores as described by Beerli & Palczewski 54 . We averaged the Bézier score over three different runs and used as input to evaluate multiple models using Bayes factors 55 . Phylogenetic and dating analyses. Phylogenetic relationships within Engraulidae were based on a fragment of the mitochondrial cyt b gene (121 taxa, corresponding to 55 Engraulidae species; 1044 bp). At least one representative species of Engraulidae per genus (except Papuengraulis) and nine randomly chosen specimens from each of the five currently recognized OWA species were included in the phylogenetic analyses, with the exception of E. capensis from which only four specimens were available (accession numbers in Supplementary  Table S2). According to Lavoué et al. 56 , we selected the following outgroup species: Chirocentrus dorab, Clupea harengus, Denticeps clupeoides, Ilisha africana, Sardina pilchardus, Sundasalanx mekongensis. The Akaike Information Criterion (AIC) 57 implemented in Modeltest 3.7 58 , selected the GTR+I+Γ as the evolutionary model that best-fitted the data set. The inferred parameters were used in maximum likelihood (ML) and Bayesian Inference (BI) analyses. BI analyses were conducted with MrBayes 3.2.1 59 . Metropolis-coupled Markov chain Monte Carlo (MCMC) analyses were ran for 20,000,000 generations with sample frequency of 2000. Final trees were calculated after a burnin of 1,000 generations. PhyML 3.0 60 was used to estimate the ML tree and to test by non-parametric bootstrapping the robustness of the inferred trees using 1,000 pseudoreplicates.
Previous work 15, 16 recovered E. encrasicolus as paraphyletic (specimens assigned to the species grouped into two clades that did not group together). To test if natural selection could interfere with phylogenetic inference, we performed all phylogenetic analyses using the above data set that only included individuals that do not show any evidence of being under selection and repeated the procedures using another data set (116 taxa; 1044 bp) that included individuals presenting a mutation in codon 368 of the cyt b as identified in Silva et al. 23 .
To estimate the OWA origin and date lineage-splitting events within Engraulidae we used a Bayesian relaxed molecular-clock approach as implemented in Beast 2.1.3 61 based on a concatenated dataset of four partial fragments of mtDNA (cyt b: 1131 bp; 16S: 800 bp) and nuclear (RAG1: 1480 bp; RAG2: 1221 bp) genes. We included sequence data of 49 Engraulidae lineages/taxa 35 , likely representing 14 genera (out of 16: the monospecific Lycothrissa and Papuengraulis genera are not represented), from which 5 are OWA (accession numbers in Supplementary Table S3). According to our ML and BI analyses, E. capensis and E. eurystole are conspecific with E. encrasicolus, and two clades (hereafter clade A and clade B) were recovered within the latter. Hence, to perform the dating analysis we only selected a single representative of E. encrasicolus from clade A and two from clade B, both without the mutation at codon 368. We used the BirthDeath model for the tree prior that assumes that at any point in time, every lineage undergoes speciation at rate λ or goes extinct at rate μ 62 , and three calibration points. One refers to the earliest record of Engraulidae [6][7][8][9][10][11][12] million years (Ma) from the Miocene -Lower Pliocene of Cyprus 63 . The second calibration corresponds to age estimated for the divergence between Anchovia clupeoides and A. macrolepidota [2.8-3.1] Ma due to the closure of the Panama seaway 13,64 . The third calibration corresponds to E. japonicus  Ma from Kokubu group, Japan 65 . Calibrations using the two fossils were modelled with a lognormal distribution, where 95% of the prior weight fell within the geological interval in which each fossil was discovered. For the Engraulidae  Ma, the parameters of the lognormal calibration prior were: 95% interval: mean in real space: 1.4, offset: 6.0 and log stdev: 1.0. For E. japonicus  Ma, the parameters of the lognormal calibration prior were: 95% interval: mean in real space: 0.465, offset: 0 and log stdev: 1.0. For the divergence between Anchovia clupeoides and A. macrolepidota we used a calibration according to Lessios et al. 64 where the closure of the isthmus of Panama occurred between 3.1-2.8 Ma. Lognormal calibration was set to: 95% interval: mean in real space: 0.071, offset: 2.8 and log stdev: 1.0. MCMC analyses were run for 20,000,000 generations with a sample frequency of 20,000, following a discarded burn-in of 2,000,000 steps. The convergence to the stationary distributions was confirmed by inspection of the MCMC samples using Tracer 1.6 66 .
ML, BI and dating analyses were performed on the R2C2 research group cluster facility provided by the IT Department of the University of Algarve.
Data Accessibility. Sequences of the mitochondrial cyt b (1044 bp) were deposited in GenBank (accession numbers: JQ716609-JQ716731, JQ716748-JQ716756, JX683020-JX683113, KF601435-KF601478 and KJ007642-KJ007734). Sequences of the nuclear recombination-activating genes RAG1 and RAG2 and of the mitochondrial 16S rRNA used in the Bayesian dating analysis were deposited in GenBank (accession numbers: KX824115-KX824124). The accession numbers of the sequences retrieved from GenBank corresponding to the remaining Engraulidae specimens used in all phylogenetic analyses are indicated in Tables S2 and S3 (SI). Genotypes of individuals obtained from microsatellites are deposited in Dryad repository (http://dx.doi. org/10.5061/dryad.r4f8v).