Uncovering marine connectivity through sea surface temperature

A foundational paradigm in marine ecology is that Oceans are divided into distinct ecoregions demarking unique assemblages of species where the characteristics of water masses, and quantity and quality of environmental resources are generally similar. In most of the world Ocean, defining these ecoregions is complicated by data sparseness away of coastal areas and by the large-scale dispersal potential of ocean currents. Furthermore, ocean currents and water characteristics change in space and time on scales pertinent to the transitions of biological communities, and predictions of community susceptibility to these changes remain elusive. Given recent advances in data availability from satellite observations that are indirectly related to ocean currents, we are now poised to define ecoregions that meaningfully delimit marine biological communities based on their connectivity and to follow their evolution over time. Through a time-dependent complex network framework applied to a thirty-year long dataset of sea surface temperatures over the Mediterranean Sea, we provide compelling evidence that ocean ecoregionalization based on connectivity can be achieved at spatial and time scales relevant to conservation management and planning.

In marine realms, the identification of areas with relatively homogeneous ecosystems (ecoregionalization) and their connectivity is key to environmental management and species conservation 1 . These ecoregions provide a framework for investigating complex problems, from monitoring pollutant or invasive species dispersal to designating effective marine protected areas 2,3 . The potential of a marine protected area to retain biodiversity and restock species abundance beyond its border, for example, is inherently linked to some form of regional discretization and population connectivity, such as larval and juvenile dispersal or seeding across ecoregions 4 .
The intrinsic complexity of the processes and scales involved makes marine ecoregionalization challenging. Different methodologies have been proposed so far: (1) the taxonomic approach 5-7 ,(2) the ecological approach 8-10 , (3) the integrated taxo-ecological approach 11 , and (4) the connectivity-based approach 2,12-16 . Among these, (1) relies on species distributions and classifies regions according to similar aggregations of species; (2) identifies areas by common cycles of biogeochemical or physical properties; and (3) adopts a combination in which both the environment and the species are accounted for altogether. These three methodologies share a limiting factor: they all omit the effects of oceanic currents on larval dispersal and on connectivity or isolation between ecoregions. This problem is overcome by the connectivity-based approach (4), where dispersal processes are identified through Lagrangian methods, at times coupled to network approaches 14 . The need to account for dispersal is motivated by the metapopulation concept 17,18 : the distribution of marine species is not solely driven by local conditions, but also by the spreading of propagules, larvae, juveniles, as well as adult organisms.
As ocean circulation models and reanalysis datasets have become more readily available and reliable, the connectivity-based ecoregionalization has been attempted for some basins 2,[19][20][21] . However, simulating larval dispersal using Lagrangian particle tracking is a computationally intensive task. For example, in the work of 2 Lagrangian particles were regularly seeded every 3 days and 10 km at three different depths (0.5 m, 50 m, 100 m) over the entire Mediterranean Sea. The particles were advected by the velocity field from an ocean model run at 1/12° horizontal resolution and the ecoregions were then calculated through a hierarchical clustering applied to a distance matrix. Computational requirements constrained this work to a period of 4 years. Eco-provinces from Lagrangian trajectories have been determined also using flow networks 14 adapting the infomap community detection algorithm 16 . In these brute-force, Lagrangian-based approaches, the need for large spatial coverage dramatically reduces the maximum time span and vice versa.
To date, no ecoregionalization method informs scientists and stakeholders about ecosystem connectivity at the basin scales and interannual to decadal times toward which management measures and marine protected area designations should aspire. We propose to this end a complex networks-based framework ("δ-MAPS" 22,23 ) applied to monthly sea surface temperature (SST) anomalies for ecoregionalization and connectivity inference. www.nature.com/scientificreports/ δ-MAPS has the advantage, compared to most other network methods, of identifying spatially contiguous, functional domains. The use of SST anomalies is justified by the dynamic link that relates them to sea surface height on small spatial scales (order of 100 km, the so-called mesoscales) and low temporal frequencies (monthly and longer) 24 , which are the ones of interest to basin-scale marine ecosystem connectivity. In the search for a variable that is easily observable with a uniform coverage, linked to the flow advective properties, and containing as much useful information about habitability for a given species as possible, the SST field satisfies all criteria: It is observable through satellite platforms; at equatorial, tropical and mid-latitudes its anomalies are strongly coupled with the layers underlying the ocean surface through horizontal oceanic currents and mixing; and it modulates habitability directly, as well as indirectly, through the solubility of oxygen and carbon dioxide.
The modulation of SST anomalies by ocean surface currents and horizontal advection is verified as we test the new framework using a recently developed reanalysis for the Mediterranean Sea that spans 30 years at spatial resolution of 6-7 km (CMEMS MED-Physics 25 ) but in future application satellite products could be used instead. The choice of the Mediterranean Sea is motivated by the existence of previous works that we can use for validation purposes, and by the urgency of establishing better management protocols in this basin.
The Mediterranean Sea is a critical biodiversity hotspot threatened by climate change. It covers a small fraction of the global ocean surface area (0.82%) and volume (0.32%) 26 but hosts roughly 17,000 species, retaining a large endemism [26][27][28] . The general circulation in the Mediterranean Sea, together with its latitudinal extent, seasonal cycle, and complex bathymetry, allows wide environmental and biological diversity across scales 26 . The near surface circulation consists of separate anticyclonic gyres in the western and eastern portions, connected through the Strait of Sicily (Fig. 1). The complexity of its coastlines, especially on the northern side, and the presence of numerous islands promote the formation of eddies, fronts and other local persistent currents. The overall hydrography [28][29][30] , as well as its general oligotrophy 2,28 , creates heterogeneous ecosystem conditions and functional habitats. In the past decades, the area of increased sea water temperatures have expanded from the southern and eastern portion of the basin into the northward and westward direction, threatening local habitats and productivity. Together with warmer temperatures, changes in biodiversity have been recorded following the invasion of alien species especially in the eastern basin where it has been fostered by the opening of the Suez Canal 28,31 . In light of these trends and of the overall role of the Mediterranean basin in the global marine biodiversity, it is worthwhile to test the proposed methodology to explore long-term changes in its ecoregionalization. Methods δ-MAPS. δ-MAPS aims at reducing a generic spatiotemporal field X(t) dimensionality by identifying spatially contiguous regions (referred to as domains) and their connectivity patterns. A domain is a spatially contiguous region in a given field that participates in the same dynamic functions, such that grid cells belonging to that domain share a highly correlated temporal activity. Links between domains define a functional network with a weight assigned to each edge to reflect the magnitude of interaction between any two domains. The strength of a domain is then defined as the sum of the absolute weights of all edges. It is therefore an ideal tool to identify ecoregions if applied to a field that informs us of the connectivity properties of the flow. www.nature.com/scientificreports/ δ-MAPS works through two steps, namely domain identification and network inference, and we refer the reader to Ref. 22 for a detailed description. Domain identification. Given a field X(t), its dimensionality is reduced by identifying sets of grid cells (i.e., domains) with highly correlated activity. To do so it is hypothesized that domains have epicenters or cores, where their local homogeneity is greatest, and these cores are identified. Formally, each grid cell i is associated with a time series x i (t), after removing a linear trend and seasonality, and for each i we define a K-neighborhood γ K (i) including the grid cell i and its K closest neighbors. The local homogeneity of i is then computed as the average pairwise correlation in its γ K (i) . A grid cell is a core if its local homogeneity is a local maximum and greater than a threshold δ. Cores are then expanded and merged to identify domains.
Network inference. For each domain A, we compute its signal X A (t) as the weighted cumulative anomaly of all time series within it: where x i (t) is a time series of length T associated to grid cell i with latitude φ i and |A| is the number of grid cells in A.
Given D domains, the network is inferred by considering each possible pair of domains A and B and computing their Pearson correlation r A,B (τ ) for a lag range τ ∈ [−τ max , τ max ] . Given a significance level, the significance of each pairwise correlation is tested excluding autocorrelations using the Bartlett's formula 42 .
Two domains A and B are connected if there exists at least one significant correlation between the two at any lag in the range τ ∈ [−τ max , τ max ] , denoted as R A,B (τ). If R A,B (τ) includes the lag τ = 0 , the link is left undirected. If R A,B (τ) is strictly positive (negative) the link will be directed from A to B (B to A). A weight w A,B is assigned to each link based on the covariance between the two signals X A (t) and X B (t) at the lag τ * at which their significant correlation r A,B (τ ) is maximized. Finally, a (non-dimensional) strength value is assigned at each domain, defined as the sum of the absolute weights of all the links connected to that domain.
In this work, δ-MAPS is applied to the Mediterranean Sea Physical Reanalysis (CMEMS MED-Physics 25 ), which is the result of the Mediterranean Forecasting System run with horizontal grid resolution of 1/16° (ca. 6-7 km) complemented by a variational data assimilation scheme (OceanVAR) for temperature and salinity vertical profiles and satellite sea level anomaly along track data. The significance level for the network inference is set to 0.03 and tested using a t-test. We used τ max = 12 months and a K-neighborhood of 4 grid cells. The δ threshold (see Ref. 22 ) is inferred using a statistical significance level α. In the validation test over 2007-2010 (see "Results") α = 10 -3 , in the networks built using 7-year time slots α = 2 × 10 -5 , and in those using 6 and 8 years α = 6 × 10 -5 and α = 3 × 10 -6 , respectively (see "Results" for timeslots definition). Different values of α are set in order to maintain the minimum significant correlation for the domain identification within a fixed range (0.54-0.55) for the time slots containing the validation period, independently of the time slots considered. This guarantees that ecoregions are consistent with those obtained over the validation period.

Results
The δ-MAPS analysis is performed onto monthly mean SST anomalies from the Mediterranean Sea Physical Reanalysis (CMEMS MED-Physics 25 ) over the period 1987-2017. The advantage of using a reanalysis resides in the availability of a velocity field consistent with the SSTs that allows us to confirm the coupling between network domains and ocean currents within the euphotic layer. It is worthwhile remarking that this work and the one of Ref. 2 not only use very different methods to define connectivity, but also different data sources. Our study uses velocity and SST output fields from CMEMS MED Physics reanalysis, while Ref. 2 uses the configuration PSY2V3 of the operational system MERCATOR OCEAN with a resolution of 8 km in the horizontal downscaled to a connectivity grid of 50 × 50 km. The data assimilation and clustering algorithms are different and Ref. 2 employs a cut-off in addition to the clustering grid downscaling. These differences unavoidably translate into slightly different shapes and patterns of the domains inferred. For example, the D + V area in panel (a) of Figure S1 is effectively two separate ecoregions in Ref. 2 , in which the Messina Strait is not resolved at the connectivity grid level. However, this separation appears inconsistent with the surface kinetic energy (K.E.) of panel (b) in Figure S1, computed from the horizontal currents, e.g. zonal (u) and meridional (v) velocity components, as K.E. = 1/2 |V| 2 where |V|= (u 2 + v 2 ) 0.5 . Indeed, there is no clear separation between the regions north and south at Messina Strait in our dataset. Having detailed this example and acknowledged that some differences should be expected, the overall basin eco-regionalization using δ-MAPS is consistent with that in Ref. 2 . The spatial accuracy is enough to well separate the main ecological areas, despite small-scale differences (i.e. some km, due to resolution choices).
By and large, the SST anomaly domains in Figure S1 are bounded by ocean currents, in agreement with Ref. 2 . This is due to the dominance of advective forcing by ocean currents on the SSTs at equatorial and mid latitudes, The dynamical changes associated with the strengthening and weakening of ocean currents are hypothesized to coincide with a reshaping of the sub-basin ecoregions and reciprocal connectivity. The ecoregionalization inference is therefore performed considering time slots of varying length, so that yr end = yr ini + Δ with yr ini = y 0 + n, n = 0,1,…,N, where y 0 is the initial year of the dataset (1987) and N is the total number of time slots, each of duration Δ years, between 6 and 8. Time slots overlapping by more than one year among different trends periods are excluded. The choice of Δ = 7 years represents the best trade-off for having enough time slots to quantify the evolution of ecoregions and a sufficiently large number of data points in each time slot for statistical inference. We will focus on this case, but results are verified also for the other Δ values (see Supplementary Information).
Strength maps for three representative time slots are presented in Fig. 3a,c,e while maps of domain strengths for all Δ = 7 time slots can be found in Figure S4. The mean surface kinetic energy averaged within each timeslot is next compared to the number of ecoregions in corresponding timeslots. The fragmentation level, or the total number of ecoregions, and the mean surface kinetic energy content are highly correlated ( Figure S5b in Supplementary Information), with a Pearson's coefficient of 0.79 for the whole Mediterranean Sea, and 0.8 (0.65) for the eastern (western) basin. The fact that time slots are not independent does not invalidate the analysis, and a large correlation (c.c = 0.73) is retained even when using four non-overlapping time slots. A higher fragmentation occurs whenever the upper ocean layer is more energetic, and this relationship is robust to changes of Δ (see Supplementary Information). The domain strength is next compared to the mean K.E. content. For each timeslot, the domain strength is spatially averaged over the eastern and western basin separately. The correlations between the averaged strengths and the corresponding time slot mean surface K.E. values, both varying as the time slots change, are then calculated for eastern and western basins separately. No linkage is found in the western basin, but a strong anticorrelation describes the relationship in the eastern Mediterranean (c.c. − 0.74). This anticorrelation remains high (− 0.73) also when the eastern basin strengths are related to the whole basin surface K.E. averaged over each timeslot. www.nature.com/scientificreports/ We hypothesize that the amount of K.E. associated with semi-permanent jets, currents or large mesoscale eddies, grouped here together and named KE fronts, can be used as an indicator of their role as connectivity modulators. We identify KE fronts applying a pattern recognition algorithm on the K.E. fields for each time slot. The resulting pictures are processed by an image segmentation technique, based on K-means clustering, to separate the K.E. in four clusters of increasing energy content. The maximum-intensity group is selected as indicators for KE fronts and the number of pixels contained in each cluster is counted and used to estimate the size or abundance of each one. The maximum-intensity cluster well represents the energy-containing structures as measured by the correlation between the mean surface K.E. content in each time slot and the pixels within the corresponding cluster (c.c. > 0.99). The more pixels reside within each cluster, the larger the KE fronts-populated areas that this cluster approximates. This estimation is carried out for the whole basin, and separately in the eastern and western parts. The number of pixels is then correlated to the number of inferred ecoregions for the whole Mediterranean (c.c. = 0.81), and for eastern (c.c. = 0.81) and western (c.c. = 0.69) basins. Figure S6 in the  panel (b)), for the whole Mediterranean Sea for the maximum cluster. The number of ecoregions is highly correlated with the KE fronts everywhere and especially in the eastern Mediterranean Sea. The higher level of fragmentation found in the MAX period is thus associated with more abundant and/or larger surface KE fronts, acting as eco-dynamical barriers.
To further strengthen this assessment, we consider that energy fronts can act as modulators for SSTa-derived domains. The ecoregionalization over a certain time slot characterizes that time range in one single ecoregionmap but stems from data known at several time points (i.e. monthly SSTa in our case). The resulting domains account therefore for the inherent physical variability of the system over time. A higher (lower) ecoregions fragmentation may therefore by associated with dynamical fronts occurring at different times and not necessarily in the same place, over a certain time range. If this is plausible, we expect to count more (less) occurrences of higher energy in broad areas where the domains are more (less) fragmented. For each time slot, the number of occurrences of a front in each pixel is therefore counted. Specifically, having defined a front as a K.E. realization above the 50th percentile of the overall (1987-2017) time varying surface K.E., we count how many times a front appears in the considered time slot at each pixel. In Figure S7 (Fig. 3b,d,f for the eastern basin and Figure S8 in the Supplementary Information for the western basin).
In 1987-1993 the western basin was characterized by a high mean positive correlation of 0.73, with a strong, non-directional connectivity among the Tyrrhenian and Ligurian-Algero Provençal domains. In 2004-2010 the connectivity was overall weaker, and in particular reduced among Tyrrhenian waters. The connectivity between the Balearic domain (Bal) and the Tyrrhenian ones was also reduced. In 2011-2017 the connectivity was mostly recovered, especially in Tyrrhenian waters. In this period, the Algero-Provençal domain separated from the Ligurian Sea (Lig), enforcing its connectivity with the Balearic and the Alboran ecoregions.
In the eastern basin we focus our attention on the ecoregion immediately offshore the Suez Canal (Fig. 3), the major anthropogenic corridor for the introduction of non-indigenous marine species in the Mediterranean Sea, the so-called Lessepsian immigrants 32 . According to δ-MAPS, connectivity from the domain surrounding Suez was high in the first decade, decreased approaching MAX, remained small until about 2010-2011 with fewer statistically significant links, and increased again in the more recent time slot considered. During the UP and DOWN periods, the strongest connections were with the eastern Levantine (domain N), followed by that with the Aegean, Ionian and Tunisian Seas. During UP the connectivity extended to the Provençal and Algerian Seas, in the western basin, while in DOWN these links were absent and replaced by a connection with the Adriatic Sea.
The 1987-1993 and 2011-2017 periods, while not too dissimilar in energy levels, differed indeed for the phase of the Ionian-Adriatic Bimodal Oscillating System or BiOS 33,34 . The BiOS is a mode of variability characterized by a decadal reversal of the Northern Ionian Gyre (NIG) from cyclonic to anticyclonic, and vice versa. In its anticyclonic spinning the NIG deviates the inflowing Modified Atlantic Water (MAW) from the Sicily Channel towards the northern Ionian, entering the Adriatic Sea and decreasing its salinity and temperature. This prevents a portion of the MAW from reaching the Levantine basin, and enhances the outflow of Levantine waters into the western basin, along a pathway that follows the African coastline. The anticyclonic NIG co-occurs with higher concentrations of Atlantic and Western Mediterranean organisms in the Adriatic Sea. When the NIG is cyclonic, on the other hand, Levantine waters enter the Adriatic Sea, whereas the MAW preferably flows toward the Levantine 35 and Lessepsian migrations influence the Adriatic Sea at various latitudes, affecting also phytoplankton phenology 33,36,37 . The corresponding regions and connectivity networks in the two opposite NIG periods are detailed in Figure S9 in the Supplementary Information.

Discussion
The Lessepsian invasion in the Mediterranean Sea. The investigation of the Mediterranean Sea ecoregions in the past 30 years revealed connectivity pathways of Lessepsian invasion (Fig. 3) that match, in the time averaged sense, the biodiversity patterns identified through the analysis of available coastal data, irrespective of time (Ref. 31 , their Fig. 2). Our analysis adds to the sparse coastal observations both space distribution and time variability of connectivity pathways, and helps contextualizing the timeline of the most recent, invasive, immigration into the Mediterranean, that of the lionfish or Pterois miles 32 . A first specimen was observed in 1991 off the coast of Israel 38 but no other individuals were observed until 2012, when two lionfish were caught along the Lebanon coast. Spreading to other parts of the eastern Mediterranean then followed quickly, with specimens collected around Cyprus, Greece, Syria and Turkey by 2014, followed by Tunisia in 2015 and Italy in 2016 (Ref. 39 and refs therein). The first immigration followed the pathway indicated by the strongest link in the 1987-1993 period from the Suez domain: Lw to N. It is likely that SST conditions limited spreading around 1990, because spawning in lionfish occurs preferentially when near surface water temperatures are at or above 28.4 °C. In the following decade the SSTs reached the favorable range during summer in most of the east Mediterranean, but the isolation of the Suez domain halted the invasion. With the shift to higher connectivity and the re-establishing of currents favoring transport across the east Mediterranean, lionfish easily spread after 2011, reaching as far as the coastline of Italy, with a timeline in agreement with the strength ordering identified by the network analysis. www.nature.com/scientificreports/ In reference to the BiOS, we have shown that the anticyclonic to cyclonic transition is characterized by a shift towards positive correlations along Levantine waters and the Adriatic Sea. Periods of cyclonic NIG therefore may favor Lessepsian migrations into the Adriatic as outcome of the established connectivity patterns, while the anticyclonic phase favors Atlantic migration events due to the MAW intrusion. Species invasion in the Adriatic Sea has been explained through circulation changes, temperature rise and by the northward displacement or enlargement of thermophilic organisms' habitat, from the southernmost areas of the Mediterranean basin towards more temperate regions. While some species spawning and settling may be favored moving through the time considered by the increasing seawater temperatures, our framework highlights how periodic circulation changes have been modulating the connectivity of the Ionian-Adriatic system all along. This cyclic behavior in invasion likelihood could and should be exploited by restoration efforts.
Biological invasions are a pervasive environmental problem in the Mediterranean Sea as they relate to the conservation of its biodiversity. They have been the focus of intense research over the past two decades, yet accurate predictions of spreading pathways and regional susceptibility remain elusive. Here we have shown that key information about the spatial extent, directionality and time variability of these invasions can be added through complex network analysis applied to sea surface temperature fields routinely available at low computational cost, opening new possibilities for their management and containment.
Significance of proposed framework for marine ecoregionalization. A sustainable use of the oceans and their resources requires, among other actions, the protection and restoration of portions of the world coastlines and their ecosystems, and the conservation and restoration of marine biodiversity 40,41 . To achieve this goal, the identification of ecoregions and their connectivity in time and space is key to developing effective strategies for targeted assessments, environmental management, and implementation of marine protected areas. By exploiting the relationship between sea surface temperature anomalies and ocean currents that spans time and space scales of relevance to the sustainable development of marine ecosystems, we introduced an ecoregionalization framework based on connectivity. The framework builds on δ-MAPS, a complex network analysis tool related to multivariate statistics, clustering and community detection. δ-MAPS identifies the semi-autonomous components of the field under investigation, in our case sea surface temperature anomalies, and their (potentially) lagged and weighted interrelationships. It has been applied to the Mediterranean Sea for validation purposes, but is applicable to all oceans at mid-, tropical and equatorial latitudes. It provides key information regarding physical barriers to dispersion of larvae and juveniles, and the likelihood of spread among ecoregions over time. δ-MAPS is fast and automated, and it can be combined and augmented with the analysis of surface temperature and chlorophyll trends, both fields being available through satellite observations and constantly upgraded, quality controlled, data assimilation products. In the ocean, surface temperatures are directly linked to oxygen and CO 2 content, while chlorophyll connects to nutrient availability, planktonic distributions and eutrophication episodes. δ-MAPS can also be applied to model simulations and future ocean projections whenever the resolution is high enough to resolve mesoscale circulations.
Ecoregions are essential units of comparative analysis in the assessment, management and solution of ecosystems problems. In the oceans, ecoregionalization is an interdisciplinary endeavor that involves physical, biological and ecological oceanographers, complicated, in comparison to land systems, by the presence of ocean currents connecting far away water bodies. δ-MAPS, through complex network analysis of a routinely observed quantity, SST, introduces a new, powerful tool to evaluate the physical contribution and its changes over time in most of the global Ocean.

Data availability
The datasets analyzed are from the Mediterranean Sea Physical Reanalysis CMEMS MED-Physics 25 (