Against all odds: a tale of marine range expansion with maintenance of extremely high genetic diversity

The displacement of species from equatorial latitudes to temperate locations following the increase in sea surface temperatures is among the significant reported consequences of climate change. Shifts in the distributional ranges of species result in fish communities tropicalisation, i.e., high latitude colonisations by typically low latitude distribution species. These movements create new interactions between species and new trophic assemblages. The Senegal seabream, Diplodus bellottii, may be used as a model to understand the population genetics of these invasions. In the last decades, this species has undergone an outstanding range expansion from its African area of origin to the Atlantic coast of the Iberian Peninsula, where now occurs abundantly. Mitochondrial and nuclear markers revealed a striking high haplotypic nucleotide and genetic diversity values, along with significant population differentiation throughout the present-day geographical range of the Senegal seabream. These results are not consistent with the central-marginal hypothesis, nor with the expectations of a leptokurtic distribution of individuals, as D. bellottii seems to be able to retain exceptional levels of diversity in marginal and recently colonised areas. We discuss possible causes for hyperdiversity and lack of geographical structure and subsequent implications for fisheries.

www.nature.com/scientificreports/ Among fish species of the northeast Atlantic, we can find the co-existence of: (1) panmictic populations without latitudinal variation in genetic diversity (e.g. Lipophrys pholis 15 ); (2) significant population structure with similar levels of genetic diversity throughout the entire species range (e.g. Taurulus bubalis 16 ); and (3) sharp decline of genetic diversity from south (west European) to northern (Scandinavian) populations (e.g. Pomatoschistus microps 17 ; Symphodus melops 18 ; Labrus bergylta 19 ). It is less common the observation of geographical range expansion of marine species without loss of genetic diversity but, for examples, see 16,20,21 . Contemporary and increasingly visible tropicalization will contribute to more records of fast range expansion events. The Senegal seabream, Diplodus bellottii Steindachner, 1882 (Sparidae), displayed an outstanding range expansion from its endemic origin (Senegal to Cape Blanco in Mauritania 22 ) to the Atlantic coast of the Iberian Peninsula within a decade (see Fig. 1 for specific dates). There are no records from the archipelagos of Madeira, Canaries or Cape Verde 23 . However, its presence is reported in Mauritania 24 and Morocco 25 during the late 60's. In the 1970's, the species was consistently recorded in the Iberian Peninsula, first in Cadiz 26  Color-coded sampled locations over-layed on the consensus current modeled probability of occurrence distribution (see "Material and methods" section). Note: The original estimated suitability value was divided by 1000 to convert the suitability value into a probability of occurrence. As a rule of thumb, sites with suitability higher than 0.5 predict presence, while sites with suitability lower than 0.5 indicate absence. Figure generated using the Biomod2 package (https ://cran.r-proje ct.org/web/packa ges/biomo d2) implemented in the R 32  www.nature.com/scientificreports/ moving progressively towards North to Sado and Tagus (1995) estuaries 28 , and Ria de Aveiro in the north Atlantic coast of Portugal 29 (Fig. 1). These movements are thought to result from the sea temperature rise 30 . The Tagus estuary experienced a major increase in the abundance of the Senegal seabream between 1979-1981 and 1995-1997 28 . Presently, the species is a common bycatch species of the artisanal fisheries around the Lisbon area 31 .
The species is therefore a good candidate to illustrate the effect of tropicalisation on the genetic make-up of the species, and on the relationship between colonisation success and genetic diversity, without the contamination of past recolonisation events. Diplodus bellottii breeding is synchronous and occurs mostly from April to June and sexual maturity is reached at 1 year old for females and two for males 31 . The same study analysed the otoliths of this species and found that the oldest specimens were 9 years old, corresponding to females with at least seven reproductive seasons 31 .
In this study, we used the documented expansion of D. bellottii as an opportunity to assess putative founder effects by looking for evidence of reductions in the number of mitochondrial or nuclear singleton haplotypes (haplotypes seen only once in a sample, i.e. an unshared haplotype) coherent with a predictable loss of overall genetic diversity. We hypothesise that similarly to several other marine fish species in this area of the Northeastern Atlantic such as Halobatrachus didactylus 33 , Symphodus melops 34 or Labrus bergylta 19 , the recent observed range expansion will lead to a substantial loss of diversity. Also, the expanding low-density leading edge is by definition constituted by edge patches, which are most typically affected by founder events 35 , where demographic and allelic composition stochasticity 36 is predictably stronger. If no significant reduction in singleton haplotypes is observed, and no differences among haplotype frequencies are found, there would be no evidence for a founder event or genetic bottleneck and the species may be panmictic. Alternatively, if no significant reduction in singleton haplotypes is observed but there are differences among haplotype frequencies, the species may be genetically structured. This study is the first appraisal on the genetic variability of this species across its range, a key information to understand its astonishing northward expansion. Additionally, we used species distribution modelling with present-day conditions to evaluate suitable areas for potential colonization, and explored whether the sampling coverage allowed an effective representation of the genetic diversity of the species.

Results
A total of 357 base pairs of the mitochondrial control region were sequenced for 124 individuals, defining 118 haplotypes and 548 base pairs of the S7 first intron were resolved for 96 individuals, defining 69 haplotypes, from five sampling locations (Fig. 1). Haplotype and nucleotide diversities were high in all locations (Table 1), except in Cadiz where diversities were low. Haplotype diversities were close to 1.000 (h CR = 0.997 to 1.000 and h S7 = 0.960 to 1.000) and nucleotide diversities were high, ranging from πCR = 0.4% (Cadiz) to 10.4% (Mauritania) and πS7 = 0.0% (Cadiz) to 2.7% (Mauritania) with πCR = 7.9 and πS7 = 0.1% for the entire data set (Table 1).
In the entire data set the three estimates of fixation G ST , G′ ST (Table 2), with G ST and θ revealing largely non-significant comparisons, and G′ ST and D showing all comparisons statistically significant. This means that haplotypes are usually distinct among the five sampling sites, with complete haplotypic differentiation (D = 1) involving four pairs of sampling sites in CR and six in S7, including locations that are not the most distant ones, such as Algarve and Cadiz.
The haplotype accumulation curves failed to reach the asymptote for any of the markers (Fig. 2), indicating that only part of the actual genetic diversity was captured. Both markers had largely overlapping haplotype rarefaction curves and confidence intervals, although the Chao 1 estimated total diversity for the CR region of 690 haplotypes and 206 for the S7.
The main characteristic exhibited by the two markers' haplotype networks is their bush-like complexity, with no central or otherwise extremely abundant haplotype. The networks displayed a large number of singleton haplotypes and very few shared haplotypes among sampling sites (coloured circles), a pattern characteristic of DNA hyperdiversity (Fig. 3). The CR region had only five shared haplotypes, with the most frequent haplotype Table 1. Sample locations, sample abbreviations, collection dates, sample sizes and summary statistics for one mitochondrial fragment of mtDNA control region (CR) and first intron of the S7 ribosomal protein (S7) of Diplodus bellottii. Code map and figure location codes, Lon longitude, Lat latitude, N number of individuals, nh number of haplotypes, np number of private haplotypes, ns number of singletons, np/h the proportion of private haplotypes, h haplotype diversity, π nucleotide diversity. www.nature.com/scientificreports/ shared among Mauritania, Cadiz and Lisbon, and the others between Lisbon-Sesimbra (the closest locations), Sesimbra-Algarve, Sesimbra-Cadiz and Lisbon-Cadiz. The S7 intron has seven shared haplotypes, four between Lisbon and Sesimbra (the closest locations), and one between Lisbon and Algarve, Lisbon and Mauritania, and Sesimbra and Mauritania. The neighbour-net network ( Fig. 4) showed two lineages diverging from an unresolved polytomy, as represented by the box-like appearance of the internal branches, evident in the CR less so in the S7 network. The two lineages occurred at all sampling sites, but the χ2 test of the total frequencies of individual membership per sampling site yielded significant results only for the CR (χ2 = 10.31, p = 0.036) and not for the S7 (χ2 = 5.70, p = 0.222).
One of the main features of the BIOMOD2 software is the ability to combine predictions made in single models in an ensemble. The species distribution modelling produced six models (ANN, FDA, GBM, GLM, MARS, and RF) that can be considered good to excellent while CTA, Maxent and SRE have a poor fit with TSS and ROC values under 0.7 (Fig. 5). Among the best performance models were RF (ROC = 0.950, TSS = 0.826) and GBM (ROC = 0.951, TSS = 0.822).
The percentage of the correctly predicted presence (sensitivity) ranged from 65 to 92% considering all models individually, with the highest value for the GBM model and the lowest for SRE (Fig. 5). Specificity (i.e., the % of absences correctly predicted) ranged between 79 and 90%, with the highest value for the MAXENT.Phillips model and the lowest for ANN. The visual assessment showed that the predictions of the consensus model reflected realistic spatial patterns of distribution of D. bellottii without apparent anomalies (Fig. 1). The highest probabilities of occurrence were in the Mauritania southern coast and Morocco, with discontinuous distribution patches, in contrast with south Iberia, that is associated with areas of high habitat suitability, and the Lisbon area. The overall model did not predict suitable habitats north of the Lisbon region, although the species is present in Aveiro, and predicted the area corresponding to the west Alboran Sea as a suitable habitat.

Discussion
Remarkably high haplotype and nucleotide genetic diversities, and significant population differentiation were consistently found throughout the present-day geographical range of the Senegal seabream. These results are not consistent with the central-marginal hypothesis nor with the expectations of a leptokurtic distribution, and the Senegal seabream seems to be able to maintain very high levels of diversity in marginal and recently-colonised areas. Before dissecting these results, it is appropriate to address three main caveats regarding this work. Firstly, sample sizes are small, considering the observed hypervariability of the mitochondrial control region and even smaller in the nuclear S7, as many individuals failed to amplify. This limitation constrains the breadth of the discussion, and results are therefore interpreted with caution. Moreover, no inferences on dispersal patterns can be made from the genetic data here presented. Secondly, there is a geographic sampling gap between Mauritania and Cadiz, where the species may be present, albeit in small numbers. We could not obtain any samples from that region, and species distribution modelling may be affected by that gap. Thirdly, the small number of existing    www.nature.com/scientificreports/ geo-referenced presence records limited the use of more than four environmental predictors to be included in the consensus prediction D. bellottii potential distribution model.

Hyperdiversity in Diplodus bellottii.
Benthopelagic fish such as D. bellottii usually display genetic diversity levels typically lower than pelagic species, due to differences in population sizes (e.g. 40 ). Nevertheless, the diversity levels recorded in D. bellottii were similar to values observed in widely distributed pelagic species such as sardines (e.g. 41 ) or other fish species with larger population sizes (see Table 1 in 42 ). Interestingly, in the congeneric D. vulgaris 43   www.nature.com/scientificreports/ diversity (1.8% and 4.1%) are also very high. Contrariwise, in another species of Diplodus, D. sargus, diversity measures were much lower (h = 0.700 and π = 0.8%) 13 . The differences between D. puntazzo and D. sargus genetic diversity values were attributed to a suite of fortuitous occurrences such as extinction and colonisation rather than to ecological differences between the two species 13 . The results of the four statistics (G ST , G' ST , θ, and D) were very different with G ST ; and θ, showing few pairwise significant differences, and G' ST , and D returning all pairwise comparisons significant. It is important to refer that both θ and G ST suffer from the fact that this dataset is hypervariable, therefore restricting these estimators to values close to zero and overestimating gene flow (for a review see 44 ). On the other hand, both G' ST and D, because they measure differentiation as the fraction of a deme's population that consists of private or near-private alleles 45 , will always be significantly different from zero. The hypervariable mtDNA in D. bellottii leads to a lack of shared haplotypes (with rare exceptions) among locations (Supplementary Table S1), but this does not necessarily reflect population genetic differentiation by fixation of alternative haplotypes as previously reported by 46 . This result may be explained by a single or a combination of small sampling sizes and very high mutation rates. Also, effective population size and demographic stability can be contributing factors, not in the recent colonised locations which may still be locally expanding in numbers, but in the putative source, the most southern location. We have shown that with the number of singletons present, to capture all the haplotypes in the species we would need to elevate the sampling to almost 700 individuals (see Fig. 2) which makes the first explanation plausible but does not rule out the other ones.
The high diversity recorded in D. bellottii is mainly due to a large proportion of singletons (91% and 74%, for CR and S7, respectively) and very few among-location shared haplotypes, 4% and 13%, for CR and S7, respectively. This fact makes all locations virtually significantly different from one another, with statistically significant values of D ranging between 0.847-1.000 and 0.849-1.000 for CR and S7, respectively. The expanding low-density leading edge is by definition constituted by edge patches, which are typically affected by founder events 35 , where demographic 36 and allelic composition stochasticity are predictably stronger. However, alleles present in the leading edge patches, even if absent from other marginal locations are frequently shared with sites at the centre of the distribution 47 , which is not observed in our results.
The extremely high level of genetic diversity found across the entire distribution of D. bellottii and the paucity of shared haplotypes throughout all the sampled locations, including the native southern site, place our findings in apparent conflict with "southern richness and northern purity" 11 and the peripheral populations ("centralmarginal hypothesis" 12 ) hypotheses. Non-expected high diversity was also found in the leading edge population of other species (e.g. Taurulus bubalis) suggesting that marine organisms with a long planktonic stage and high numbers of colonisers can travel long distances and export their diversity to the newly colonised areas 16 . In these cases, bottlenecks may not occur because populations are not close to the carrying capacity of the habitats. Diplodus bellottii is a prolific species at least in its southern distribution limit 48 and it may have sufficiently large numbers of individuals to colonise in significantly large waves of individuals.
possible causes of hyperdiversity. The hyperdiversity displayed by both CR and S7 markers does not automatically suggest restricted gene flow and/or population genetic differentiation. The sampling size did not capture the entire genetic diversity of the species as shown by the rarefaction curve, and did not provide a reasonable number of shared haplotypes. Moreover, abnormally high mutation rates may obscure the signal of gene flow 49 . The haplotype network (Fig. 2) with many unsampled haplotypes and haplotype accumulation curves (Fig. 4) indicate an acute subsampling of haplotypes, despite sampling sizes ranging from 11 to 55 per site, in a total of 124 specimens. As the expected diversity is ca. 690 haplotypes for the CR region and ca. 206 for the S7, many more individuals would be needed to capture a representative level of the genetic diversity of the species. It is therefore plausible that with increased sample sizes, one could detect more haplotypes in more southern locations, such as Mauritania, and shared haplotypes between these locations and the more recent colonised northern sites.
High genetic diversity in CR may have several explanations. The most plausible hypotheses involve: (1) the mutation rate of the fragment, (2) the evolutionary rates hypothesis, or (3) the metabolic rate theory (e.g. 50 ). The mitochondrial control region is a high-mutation region organized in hypervariable regions flanking a central conserved region and is considered the most variant non-coding gene in fishes (e.g. for groupers 51 ). This mitochondrial region is a marker of choice in phylogeographic and connectivity studies (but on the use of hyperdiverse mtDNA for population differentiation see 46 ). The gastropod Melarhaphe neritoides revealed hypervariable mitochondrial DNA markers (16S, COI and Cytb) as a consequence of high mutation rates 49 . The second hypothesis used to explain high mutation rates in tropical areas (D. bellottii is considered a subtropical fish species), suggests that temperature and UV tend to generally increase mutation rates in the mtDNA of vertebrates 52 . The third hypothesis, the metabolic rate theory, posits that smaller animals accumulate mutations quicker than large animals 53,54 . However, it has been shown in the cold-water blue crab Callinectes sapidus, that in each recruitment event, the genetic diversity could be just as high as the one found in the adult population, although with different composition 55 . This would be exactly the opposite of a 'sweepstakes effect, ' in which a small proportion of the gene pool contributes to the recruitment of the population 56 , therefore promoting reduced genetic diversity of populations. This way, in each seasonal recruitment, a new combination of alleles may constitute the new generation, which could help to explain the high genetic diversity in this blue crab species, even in populations near the limits of the species range 55 . Indeed, a similar pattern was found in two fish species, Lipophrys pholis and Atherina presbyter, with contrasting life-history patterns in a nursery rocky coastal area in Portugal 57 . In L. pholis only two out of 171 haplotypes were shared between three sampling periods, and in A. presbyter 25 haplotypes out of 155 were shared, which reveals an increase of private alleles in the gene pool over the years, even without a significant corresponding increase in diversity indices. www.nature.com/scientificreports/ Geographic structure. Despite the results of this study in which all sampling sites are significantly different from each other, no geographical associated structure could be inferred from the networks. Lack of geographic genetic structure can be rooted in large effective population sizes, high potential for dispersal and weak physical barriers to gene flow 58,59 , all of which possible in D. bellottii. Regarding population sizes, some sparids present low effective size (e.g. Pagrus auratus, 60 ), while other species present contrasting high values (as the congeneric D. sargus 61 ). While the dispersal distance is often used as a function of the pelagic larval duration (PLD) 62,63 , there is ample controversy on the subject, with studies finding evidence for uncorrelated individuals' PLDs and net distance travelled 64,65 . Although the exact PLD for our species is unknown, it is expected to be around 17 days, value reported for other Diplodus species (e.g. 66 ); this value may however vary with latitude (a proxy of sea temperature). Finally, there are no apparent physical barriers between sampled locations, which may promote considerable admixture among individuals from different origins. However, the species distribution modelling evidenced stretches of coast along northwestern Africa that seem to be an unsuitable habitat for the species, which could theoretically foster some level of geographical isolation. Apparently, D. bellottii overreaches those soft barriers keeping its genetic diversity undisturbed. The concept of "spatial selection" includes the idea that more available resources coupled with less competition may lead to spatial allele sorting and increased growth at the invasion front, with the corresponding increase in reproduction 67 . Spatial selection would favour dispersal, and since both dispersal and reproductive rate limit invasion (colonisation) speed, high reproductive rates promote faster range expansions. Because D. bellottii is under thermal stress at 30 °C 68 , a scenario of spatial selection in the most southern range is rather probable. The reduced tolerance to higher temperatures, coupled with the species preference for estuaries with large areas and high volumes but low depth nursery grounds, means that an increase in sea surface temperatures may result in habitat loss for juveniles, as the estuaries of southern Europe may get too warm. During the summer of 2006, D. bellottii was first detected at 40° N (north of Portugal; 69 ) showing the quick continued northward expansion of this species. Although the results from the species distribution modelling do not show a particular affinity of this species for areas north to Lisbon, the continued raising of sea surface temperature will make these areas suitable for continual range expansion of D. bellottii. The impacts of this species in the recently colonised ecosystems, mainly in what concerns interactions with other species occupying the same ecological niche, has not been fully evaluated. The spatial overlap of D. bellottii with other sparid species is not consistent, across the literature, ranging from 47% 69 to 94% 30 . These large differences may be explained by the dynamics of the estuaries, which suffer major changes in their hydrological features across the years. A noteworthy observation is that the spatial niche overlap between D. bellottii and D. vulgaris has been shown to increase with latitude 69 . origin of the two groups. The origin of the two groups (Fig. 4) is debatable and somewhat challenging to explain. When an isolation scenario is plausible, the existence of sister groups is easy to envisage. However, it is difficult to propose an allopatric (or peripatric) origin for the geographically co-distributed groups. Their distribution encompasses a continuous coastline with an apparent lack of geographical hard or soft barriers. Moreover, the observed northwards expansion of the two groups makes it difficult to conclude that they have dispersed independently along the Atlantic coastline. Without isolation, different lineages in a continuously distributed coast can arise in a large panmictic population with a long history 70 . Sometimes it is hard to realize that any particular genealogy is the single outcome of an infinite number of equally possible genealogies resulting from stochastic outcome 71 . future work and management implications. This species is an important commercial species in Mauritania, but its smaller length in the Lisbon region 31 , and the existence of more valuable commercial species, has not made the Senegal seabream a main fisheries target. In Lisbon, the species is mostly a bycatch of a traditional shallow artisanal fishery 31 and for official statistics, D. bellottii is merely recorded as Diplodus sp. The fact of this species may constitute a future target fishery should prompt an adequate biological survey to assist its proper management. The unexpected pattern of very high diversity in recently-colonised areas within a climate change scenario, deserves further investigation and this species can serve as a useful model for studying the impact of a changing environment on the genetic diversity of a coastal species. Surveys need to be implemented, ecological implications evaluated, and a genomic temporal approach assessed to understand the causes for a successful colonisation without the loss of genetic diversity levels.

Materials and methods
Sampling. Adult specimens of D. bellotti were collected from five locations along its distributional range in the northeastern Atlantic ( Fig. 1 and Table 1). These sites included: Mauritania, MAU; Cadiz, CAD in South Spain; Algarve, ALG in South Portugal, and Sesimbra, SES and Lisbon, LIS in Central Portugal. After asserting the species identification, fin clips were provided by fishermen. Samples were preserved in 96% ethanol. DNA extraction, amplification and sequencing. Total genomic DNA was extracted with the REDExtract-N-Amp Kit (Sigma-Aldrich) following the manufacturer's instructions. The mitochondrial control region (CR) and the first intron of the nuclear S7 ribosomal protein gene (S7) were amplified, with the same kit, in a Bio-Rad Mycycler thermal cycler, using L-pro1 and H-DL1 72 , and S7RPEX1F and S7RPEX2R 73 , using the following PCR conditions: initial denaturation at 94 °C for 7′, followed by 35/30 cycles (denaturation at 94 °C for 30/45ʺ, annealing at 55 °C for 30/45ʺ, and extension at 72 °C for 1′; values CR/S7, respectively) and a final extension at 72 °C for 7′. The same primers were used for the sequencing reaction, and the PCR products were purified and sequenced in STABVIDA (Sanger sequencing, https ://www.stabv ida.net/).  74 . For the S7 gene forward and reverse strands were obtained and some samples were sequenced twice for error checking. All sequences were deposited in GenBank (Accession Numbers MF120288-MF120372; and MF120408-MF120464, respectively to CR and S7).
Genetic diversity and geographic structure. The standard measures of genetic diversity were computed for the five location sites separately and after pooling (referred to as "total population"). We used a variety of packages developed for R v.3.6.1 32 to estimate population genetic statistics. We used pegas 75 R-package 32 to estimate standard descriptive measures of genetic diversity, including number of haplotypes and private haplotypes, haplotype diversity (h) 76,77 and nucleotide diversity (π) 76 and respective standard deviations.
To assess population structure, we estimated F ST -like statistics 78 , such as: (1) (4) Jost's D 81 using diveRsity R-package 82 . While the first three estimators measure nearness to fixation, the last quantifies the relative degree of allelic differentiation 45 . We report these four estimates to provide complementary information on the genetic structure among the Senegal seabream sampled locations. The diveRsity R-package was used to estimate 95% confidence intervals for all estimators using 10,000 bootstrap iterations.
Haplotype accumulation curves. The spider R-package 83 was used to obtain haplotype accumulation curves by 1000 random permutation subsampling using the functions haploAccum() and plot.haploAccum() supplied in the spider R-package 83 and the Chao 1 estimator 84 , with chaoHaplo() function. Haplotype accumulation curves provide a graphical way to assess the extent of haplotype sampling by representing the extent of saturation as a function of the number of individuals sampled and the number of haplotypes accumulated. Curves displaying weak asymptotic behaviour suggest further sampling is necessary to capture the genetic variation of the species, while curves with rapid saturation indicate that much of the intraspecific haplotype diversity was sampled. The Chao1 estimates the appropriate minimum sample size to account for all haplotype diversity based on the sample size and number of detected haplotypes as well as the number of singleton and doubleton sequences (those occurring once and those appearing twice) in the dataset.
Genetic connectivity. Population genetic connectivity in D. bellottii was investigated by reconstructing haplotype networks using a statistical parsimony algorithm 85 , implemented in TCS v.1.21 86 . The raw output was visualized in the web implementation of tcsBU 37 . Additionally, an unrooted neighbour-net network was constructed using SplitsTree 4 87 to investigate the genealogical relationships among haplotypes. The network was built using HKY + I + G nucleotide substitution model for both markers, as the best selected with the modelTest function of the phangorn 88 R-package according to the Akaike Information Criterion 89 and compatible with the distance algorithms available in SplitsTree 4, and equal-angle split transformation 90 with 1000 bootstrap replicates.
Species distribution data: occurrence data. The georeferenced locations of the observed distribution of Diplodus bellottii was obtained from occurrence records from two databases: OBIS (http:// www.iobis .org/) and GBIF (https ://data.gbif.org/). We used 94 records of Diplodus bellottii compiled from the Global Biodiversity Information Facility (GBIF, https ://www.gbif.org/) and 72 points lifted from the Ocean Biogeographic Information System (OBIS, http:// www.iobis .org/) and added five own records. Spurious geo-reference errors and duplications were deleted from the initial data. Presence points were limited to within 200 m depth to consider the recorded depth range for this species (between 10 and 90 m depth) 91 . Finally, the remaining 39 presence cells were georeferenced to a 0.083° × 0.083° (= 5 arc min ~ 9.2 km) grid resolution.
Species distribution data: environmental data. The occurrence data was linked to 43 environmental variables available from Bio-ORACLE 92 and MARSPEC 93 with a spatial resolution of 5 arcmin using the R package sdmpredictors 94 . All predictors were delimited up to 200 m depth, given the species known bathymetric preference 95 , which enhances modelling performances by limiting the extent of extrapolation. www.nature.com/scientificreports/ Species distribution data: variable selection. The first level of predictor selection, was done by reducing collinearity between predictors, with variance inflation factor (VIF) 96 implemented in R package uSDM 97,98 . We used a stringent VIF value (< 2) to retain predictors, as recommended for ecological studies see 99 . The final set of variables was reduced to four (chlorophyll a range, cloud fraction maximum, North-South aspect and sea surface temperature range) complying with two criteria: being conceptually meaningful variables 100 and proportion of occurrences to predictors obeying to the general rule of thumb of 10 to 1 101 (Table 3).
Species distribution data: modelling approach. We produced an ensemble approach to obtain predictions for the current distribution implemented in biomod2 package 102 (v. 3.3-7) for R 32 . One of the main features of the BIOMOD2 software is the ability of producing an ensemble model by combining predictions from single models nine different methods: maximum entropy (MAXENT), general linear models (GLM), general boosted models (GBM, also referred to as boosted regression trees), classification tree analysis (CTA), artificial neural networks (ANN), surface range envelope (SRE), flexible discriminant analysis (FDA), multiple adaptive regression splines (MARS), random forests (RF), with default settings 39 . Explicit D. bellottii absence records were not available so we used two sets of pseudo-absences (background data) extracted at random in a proportion of ten pseudo-absences per presence cell 103 . Data (presences and pseudo-absences) were split at random into a calibration (70%) and a validation set (30%). We evaluated 140 models: 2 pseudo-absence sets × 10 iterations × 7 models. Models' performance was evaluated using two measures: the true skill statistic (TSS 104 ) and the area under the receiver operating characteristic (ROC) curve (AUC) 105 . For the ensemble modeling, only those models with a TSS and ROC scores above 0.8, and 0.9, respectively (i.e., with minimum 'good' rating) were considered 104,106 . All calculations and analyses were performed with R version 3.0.3, with the following R packages raster 107 , rgdal 108 , dismo 109 , vegan 110 , sp 111 , rJava 112 , performed in R 32 using the R2C2 computational facility (https ://rcast ilho.pt/R2C2/R2C2_clust er.html).