## Background & Summary

The Good Environmental Status of European Seas (GES)1 is the European goal of reaching the sustainably of stock and environment exploitation and no loss of biodiversity and ecosystem services. It is the primary goal of several European strategic frameworks such as the Marine Strategy Framework Directive (MSFD), the Maritime Spatial Planning Directive (MSP), the Green Deal and Blue Growth strategies, and the EU Biodiversity Strategy for 20302,3,4,5,6,7. This goal is challenging in the current context of increasing energy, food demand, and climate change. Scientific approaches that address GES require processing marine data of ecosystems to assess ecosystem services, biodiversity, and stock status. They also require multi-disciplinary modelling approaches to extract valuable knowledge from the data6. Recently, international projects such as EcoScope8, have been fostering the shift from traditional “vertical” modelling approaches - focussing on one species, stock, or ecosystem service independently of the other - to “horizontal” approaches, which combine multi-species, environmental, and social dynamics9,10. However, these approaches require huge amounts of high-quality data to produce meaningful knowledge11,12. In particular, environmental, geophysical, world-population, and marine-region data are crucial to model species habitats13,14, understand the response and resilience of marine areas to climate change15,16,17, assess stock status and fisheries pressure on stocks18,19,20, and build ecosystem models21,22,23,24.

This paper describes an extensive data collection of harmonised and standardised global-scale parameters, with associated long-term forecasts under different greenhouse gas emission and societal development scenarios. The collection aims at supporting ecological, ecosystem, and ecological-niche models within horizontal approaches to marine resource management.

Figure 1 summarises our workflow. We harmonised and standardised geospatial data from our own heterogeneous resources and publications that had newly produced or re-processed these data. Some data were previously available in custom formats (e.g., CSV or text files), which meant they were not as accessible as they could be. Additionally, we specifically produced other data to complement the collection. The primary sources involved were (i) environmental data produced for the AquaMaps ecological niche models, (ii) data from the Italian National Research Council (CNR) studies on marine science, Earth science, and epidemics that re-processed or newly produced open-access data based on other sources, and (iii) data produced by the Quantitative Aquatics (Q-quatics) non-governmental organisation for ecosystem and ecological models.

The complete list of environmental data with their primary and secondary sources is reported in Tables 14, grouped by resolution and parameter type. Data harmonisation consisted of correcting errors and aligning the data to the same coordinate grids, with either 0.1° or 0.5° resolutions. The format of the published data is ESRI-GRID ASCII. All data have a global-scale range but are also cut out on 8 European marine areas of particular economic or ecosystem importance (focus regions), identified by the EcoScope European Project community of practice8. This specialisation aims to make the collection highly valuable for European and global-scale ecological niche models, ecological models, ecosystem models, environmental similarity analyses, and climate change studies, as also documented in the rest of the paper.

The earliest year involved in our collection is 1950. Forecasts are available for 2050 and 2100 under the Representative Concentration Pathway25 (RCP) scenarios 2.6 (63 data), 4.5 (162 data), and 8.5 (162 data), and the A2 Special Report on Emissions Scenarios26 (SRES) defined by the Intergovernmental Panel on Climate Change (IPCC) (180 data). These scenarios represent future greenhouse gas emission conditions and future societal development hypotheses. Temporal aggregation is annual for 196 data and monthly for 369 data (from January 2017 to March or May 2021, depending on the parameter). With a temporal coverage of ~10 years, between 1950 and 2100, our data are unsuited for running long-term continuous time series analyses. However, they are suited for creating long-term snapshots of ecological, environmental, and ecosystem models. Moreover, they allow for continuous time series analyses between 2016 and 2020 yearly (over 5 years, as demonstrated in this paper) and between 2017 and 2021 monthly (over 53 months), which are suited for finding evidence of inter-annual and inter-month variations and climate change-related variations14,27.

We checked the data against their primary sources for consistency. Moreover, we used a subset of annual data between 2016 and 2020, specifically created for this publication, to conduct a spatiotemporal analysis. This analysis confirmed similarities and discrepancies between the focus regions highlighted by independent studies (as indicated in the section “Technical Validation”), along with the parameters primarily responsible for the similarities.

## Methods

This section explains all workflow steps depicted in Fig. 1.

### Data

As the first workflow step, we collected data from the primary sources listed in Tables 14, which included:

1. 1.

Historical annual environmental data used by the AquaMaps ecological niche models and additional information attached to the AquaMaps authority files,

2. 2.

Re-processed or novel data attached to Italian National Research Council publications on marine science, Earth science, and epidemics,

3. 3.

Annual and monthly environmental data for the AquaMaps environmental parameters produced by the Quantitative Aquatics (Q-quatics) non-governmental organisation.

The data specifically produced for the present publication are the sea parameters reported in Table 1 with an asterisk. The re-distribution of the data was compliant with the primary and secondary source policies for the type of data re-processing we undertook. All data were globally distributed geospatial rasters; some were defined on marine areas only as that was appropriate for the ecological models of GES and EcoScope the datasets were used for. The data were defined on squared areas, with sides equal to the spatial resolution. Overall, the parameters involved were:

1. 1.

Sea-bottom and sea-surface dissolved oxygen, salinity, and temperature

2. 2.

Sea net primary production

3. 3.

Sea ice concentration

4. 4.

Average, minimum, maximum sea depth

5. 5.

Average, minimum, maximum elevation

6. 6.

Distance of a square marine area from land and its fraction covered by water

7. 7.

The characterization of each data cell in terms of which Large Marine Ecosystem (LME), Exclusive Economic Zone (EEZ), Marine Ecoregions of the World (MEOW), and Major Ocean Basins they belong to, and whether or not it sits in a Marine Protected Area (MPA)

8. 8.

Number of islands

9. 9.

Water area that lies within the shelf, slope, and abyssal zones

10. 10.

Tidal range extension

11. 11.

Coral density

12. 12.

Estuary and seamount presence

13. 13.

Carbon dioxide flux at soil surface

14. 14.

Air surface temperature

15. 15.

Precipitation

16. 16.

Difference between air surface temperature and sea surface temperature

17. 17.

World population density

18. 18.

Sediment thickness

19. 19.

Atmospheric concentration of methane and nitrous oxide

20. 20.

Earth heat flow

21. 21.

Distance from crust plates

22. 22.

Earthquake density, depth, magnitude

23. 23.

Groundwater resources

After the data collection phase, we harmonised all global-scale data from their primary sources’ geodetic systems to the same global-scale grid and projection, i.e., the WGS 84-EPSG:4326 geodetic system with equirectangular projection. We set two square grids for the data, at 0.5° and 0.1° depending on the original resolutions. The original files had heterogeneous formats, from raw text (CSV, XYZ) to more structured formats (NetCDF, ESRI-GRID). All files were first aligned to the same grid and checked for inconsistency and offset by comparing each grid point with the expected original data value. Eventually, they were converted to the ESRI-GRID ASCII format (ASC)28. ESRI-GRID is a standard format approved by the Open Geospatial Consortium (OGC), a worldwide community that assesses standards and protocols to improve access to geospatial data. This format allows for inspecting the data with text processing software as well as visualising them with commonly used Geographic Information System (GIS) software (e.g., QGIS29, and ArcGIS30). The format is also the most frequently accepted by ecological niche modelling and ecosystem modelling software (e.g, MaxEnt31 and Ecopath with Ecosim22,32,33,34) and most programming languages have libraries for parsing it35,36. In our harmonisation and standardisation workflow, one ASC file corresponds to one parameter in a specific year (or month) and location. This correspondence makes the files easily convertible into other formats (e.g., in NetCDF format through the GDAL software37). Overall, the ESRI-GRID ASCII format was optimal for our collection’s scope of supporting ecological and ecosystem models and climatic analyses.

All data were also cut out on 8 European marine areas of particular economic or ecosystem importance, identified by the EcoScope European Project community of practice. These areas (hereafter named focus regions) were:

1. 1.

The global-scale

2. 2.

3. 3.

The Aegean Sea

4. 4.

The Baltic Sea

5. 5.

The Bay of Biscay

6. 6.

The Black Sea

7. 7.

The Levantine Sea

8. 8.

The North Sea

9. 9.

The Western Mediterranean Sea

The areas were geographically identified according to the corresponding marine eco-region (Adriatic, Aegean, Baltic, Levantine, the North Sea) or International Hydrographic Organization region (Bay of Biscay, the Black Sea, Western Mediterranean Sea)38,39.

The temporal coverage of our data collection is of ~10 years within the period 1950–2100. Forecasts for 2050 and 2100 are available under different greenhouse gas emission scenarios, i.e., RCP 2.6 (low emission), 4.5 (medium emission), and 8.5 (high emission), although the RCP 2.6 scenario was not available for 2050. Moreover, some forecasts for the IPCC SRES A2 scenario (which hypothesises a future of independent, self-reliant nations with constantly increasing population and regionally diversified economic development, slow technological change, and worldwide use of nuclear energy) were also available and included in the collection.

Data harmonisation for text files was conducted through a dedicated Java process40 that managed the different formats, aligned the data to a resolution-specific grid, and finally produced one ASC file. As for primary sources with NetCDF and ESRI-GRID formats, we performed manual checking, alignment, and band extraction through QGIS. Conversion to ASC format was done through GDAL. No-data locations were all assigned a default −9999 value, specified in the ASC file header through the NODATA attribute, which makes it automatically interpreted and used by GIS software for consumption and visualisation. Data with non-homogeneous resolution over longitude and latitude were homogenised through nearest-neighbour and bilinear interpolation separately, via QGIS. Earthquake and high-resolution temperature and precipitation data were left to their original aggregated temporal range to represent an aggregated reference of a recent past.

### Newly produced data and time series analysis

We conducted a spatiotemporal analysis on the focus regions to find evidence of similarities between parameter trends over the years. Then we checked for agreement with outputs of other studies as a further data validation. We focussed this analysis on a data collection subset containing newly produced data at 0.1° resolution, annually aggregated from 2016 to 2020 (Table 1). We selected these data because they were not previously explicitly validated in other publications, and were thus differentiated from the other data whose content was instead validated in other publications12,16,17,41,42,43,44.

The selected data were the following (Fig. 2):

1. 1.

Sea-surface temperature

2. 2.

Sea-bottom temperature

3. 3.

Sea-ice concentration

4. 4.

Sea-surface salinity

5. 5.

Sea-bottom salinity

6. 6.

Sea net primary production

7. 7.

Sea-bottom dissolved oxygen

These parameters are generally used by the AquaMaps ecological niche models41 that assume they include sufficient information to assess global species presence16. It is important to note that 2016 data were not available for sea-bottom salinity, sea net primary production, and sea-bottom dissolved oxygen.

We used ocean products from the Copernicus Marine Service45 to produce the new data for the seven environmental parameters above. NetCDF data for mean monthly sea surface and bottom temperature, sea surface and sea bottom salinity, and sea ice concentration were re-processed based on the Global Ocean 1/12° Physics Analysis and Forecast 001–024 monthly dataset46, that natively used the WGS 84-EPSG:4326 geodetic system and equirectangular projection and had 0.083° spatial resolution. Mean monthly data for net primary production and dissolved oxygen were obtained from two temporally complementary datasets: the Global Ocean Biogeochemistry Analysis and Forecast 001–028 monthly dataset47 (in WGS 84-EPSG:4326 geodetic system and projection) and the Global Ocean Biogeochemistry Hindcast 001–029 monthly dataset47 (in ETRS 89-EPSG:4258 geodetic system and projection), both having a spatial resolution of 0.25°. Global monthly data for sea surface and bottom temperature, sea surface and bottom salinity, sea ice concentration were rasterized and resampled using the R-Terra package48, upscaling to 0.1° spatial resolution using bilinear interpolation. Net primary production and dissolved oxygen data were all reprojected to WGS 84-EPSG:4326 prior to rasterization and resampled by downscaling to 0.1° spatial resolution using bilinear interpolation. We carried out this process only on one depth layer (either surface or bottom) for all parameters, except for sea bottom salinity and bottom dissolved oxygen. These two parameters required the resampling of up to 72 depth levels (0.5 m to 5902 m) per month, then extracting data at the maximum depth layer per 0.1° grid cell before compiling them into corresponding monthly sea bottom salinity and sea bottom dissolved oxygen raster layers. We then used the resampled monthly rasters to compute the annual means for each of the seven parameters. The annual mean data were saved as GeoTIFF and CSV formats and were manually inspected for exact correspondence through coordinate mapping in ArcGIS30. Cases where precedent resampling to 0.1° spatial resolution had yielded marginal rows (along 89.95°N or 76.95°S) or a marginal column (along 179.95°E) with missing data were resolved by copying parameter values directly from the neighboring row or column. This approach was considered reasonable in view of the spatial resolution of the data. The final outputs were exported as CSV files and underwent the data harmonisation and standardisation process depicted in Fig. 1.

To validate the data, annual average values per region were first extracted and visualised for each parameter to compare trends across all regions (section “Technical Validation”). Moreover, each region was characterised through its associated parameter time series. Average time series 0-lag cross-correlation was used for numerical comparison. Specifically, it was calculated per parameter across all focus regions, and per region across all parameters. These analyses highlighted general and regional parameter time series similarities. Confirmation of these similarities with that seen in other scientific studies was used to assess the reliability of the data in representing valid ecological macro-patterns.

### Habitat representativeness score

Parameter time series cross-correlation might indicate that two regions were subject to similar average parameter variations. This condition might correspond to similar habitats over time in geographically connected regions if the parameters have similar ranges and distributions. A species’ ecological niche is, mathematically, the space within a hyper-volume in a vector space of environmental parameters associated with the species’ proliferation. Understanding general habitat similarity between two regions is equivalent to assessing the similarity between the parameter hyper-volumes over the two regions, independently of the species. This assumption is reasonable if the involved parameter set is complete enough for ecological niche modelling. Correlated region-specific parameter time series do not necessarily indicate similar habitats, because parameter distributions’ similarity and geographic reachability are also required. Habitat similarity, which depends on annual parameters’ distributions, can also change over the years and is thus complementary information with respect to time series correlations. Habitat dissimilarity after a specific year, unnecessarily corresponding to lower time series cross-correlation, likely indicates that an abrupt event made two regions different.

We conducted habitat similarity analysis over the years between our focus regions to study these variations and search for confirmations in other studies. Specifically, we used the Habitat Representativeness Score (HRS)49 to measure habitat similarity. HRS is an algorithm based on Principal Component Analysis (PCA)50 that measures the overall difference of the data distributions between two regions, across the largest data variance directions (principal components). HRS has been used to understand the principal environmental drivers of species presence in distant regions51 and to assess ecological survey completeness49. The algorithm works with two inputs: a reference region A and a test region B. As the output, it calculates a score interpretable as the representativeness of habitat B by habitat A (HRS(A, B)). Each region is characterised through vectors of environmental parameters. PCA is conducted on the reference region (A) vectors to extract major data variance axes (i.e., the principal components). An optional threshold, set on the principal components’ eigenvalues, can restrict the comparison to the largest variance axes. In our validation experiment, we selected components covering up to 95% of the total data variance. Then, the normalised data frequency distribution of the vectors on each axis is calculated and subdivided into equal-frequency bins. The region B vectors are then projected onto the principal components of region A. The B parameter frequencies over the A principal components are calculated across the same bins estimated for A. Finally, the pairwise differences between the bin frequencies are calculated for all principal components. The HRS is the sum of these pairwise differences. Since bin frequencies sum to 1 on each axis, the HRS ranges from 0 to the number of principal components (N), with N representing completely different habitats and 0 perfect habitat similarity.

We calculated a pairwise HRS matrix to discover significant habitat similarities between the focus regions. However, HRS is an asymmetric function by construction because PCA conducted on region A and projected on B likely gives different results than PCA conducted on region B and projected on A. One possible estimation of the overall HRS between A and B is the mean between HRS(A,B) and HRS(B,A)52. This choice also makes the HRS matrix symmetric and facilitates the similarity analysis. Therefore, we used average HRS as the region-pair score. For each region, we standardised the scores by dividing the value by the total HRS range. We finally assessed as “similar” those region pairs emerging from the standardisation by more than 10%. We repeated this analysis for all annual data between 2016 and 2020 to study habitat similarity stability over the years between the focus regions. Finally, we verified evidence of the detected similarity stability and instability in other studies.

### Detecting major habitat similarity drivers

Using PCA over the focus regions’ parameters allowed for consideration of the variables that primarily contributed to the largest principal components and thus to the HRS50,53. In particular, the PCA principal axes (eigenvectors) and their eigenvalues allow for defining loading vectors as $$loading=eigenvector\cdot \sqrt{eigenvalue}$$. Each loading is a vector containing as many elements as the number of original environmental parameters, and there is one loading for every PCA axis. The loading vector elements represent the original environmental parameters’ contributions to the corresponding PCA axis (weights). Keeping only the major PCA axes (i.e., those with the largest eigenvectors) allows for the analysis to be focussed on the largest data variance and exclude noise. Average parameter weight across the loadings measures the average contribution to the principal components by each parameter and thus the contribution to the HRS. Therefore, the parameters with the largest average weights are the major drivers of the estimated HRSs and thus of the detected similarities. For the present loadings analysis, we selected the principal components covering up to 95% of the total data variance and the environmental parameters with a non-zero positive average weight.

## Data Records

We made the data available on a public-access Figshare repository54. The collection is composed of 6 datasets. The principal datasets are “Environmental Geophysical Marine Socioeconomic parameters at 0.1° and 0.5° resolutions” and “Monthly data at 0.1° resolution”. Internally, they are structured with a folder hierarchy that optimises search time for an ecological niche modelling expert (Fig. 3). The first dataset separates 0.5° and 0.1° spatial resolution files in two main folders. The 0.5° resolution folder contains one sub-folder each for RCP 2.6, 4.5, and 8.5, the IPCC SRES A2 forecast scenario, and historical data (named HISTORICAL). Each sub-folder is organised by year. For example, the RCP 4.5, RCP 8.5, and IPCC SRES A2 folders contain the 2050 and 2100 sub-folders. The RCP 2.6 folder contains only the 2100 folder. The HISTORICAL data folder contains year-specific sub-folders from 1950 to 2019 and two additional folders for the 1900–2008 and 2000–2014 temporal aggregations. Each annual sub-folder contains one sub-folder for each focus region (9 total), which in turn contains the ESRI-GRID parameter files with the specific resolution, scenario, time reference, and region corresponding to the file path and the metadata. Each file name contains information to reconstruct the path. For instance, Sea-surface_temperature_res_05_annual_years_2019_Clim_scen_historical_regional_Adriatic_Sea.asc indicates a file containing annual-aggregated sea-surface temperature data, at 0.5° resolution, in 2019, within the HISTORICAL data sub-set, and cut out on the Adriatic Sea. The 0.1° annual data root folder has the same structure as the 0.5° root folder but contains only historical data. The years involved in our data collection are: 1950, 1997, 1999, 2011, 2013, 2016, 2017, 2018, 2019, 2050, and 2100. Some files, e.g., those of gas concentration of methane and nitrous oxide, have a variant file with the “bilinear” attribute in the file name to indicate that bilinear interpolation was used instead of nearest neighbour to homogenise coordinate resolutions.

The adopted folder structure allows an ecological niche modelling expert to find aligned files in one folder and directly use them in modelling software like MaxEnt31, e.g., to quickly model a species’ distribution at a 0.5° spatial resolution in 2019 in the Adriatic.

The monthly dataset has a folder structure organised by parameter name. Each parameter folder contains one sub-folder for each year, which in turn contains monthly ESRI-GRID files. This structure is conceived to facilitate monthly parameter analyses.

The complete file collection contains 2132 files. Global-scale data are 565; 154 have 0.5°resolution and 411 have 0.1° resolution. Among the 0.1° resolution data, 369 have a monthly aggregation and 196 an annual aggregation. Forecasts are available for 2050 and 2100, and overall include 63 files for RCP 2.6 (only in 2100), 162 for RCP 4.5, 162 for RCP 8.5, and 180 for IPCC SRES A2.

Additional datasets in the collection (in the “Statistics, trends, HRS, PCA-loadings, and charts” and “File list and statistical properties” datasets) contain summary tables and charts with standard statistics (mean, standard deviation, geometric mean, log-normal standard deviation), cross-correlations, HRS estimates, and PCA loadings that we used for the technical validation. The Figshare repository also contains all R scripts, Java software links, and references to the programs used to conduct the technical validation (in the “Scripts and related software” dataset).

## Technical Validation

### Consistency with respect to the original data

Each produced ESRI-GRID file was defined on a regular spatial grid. Therefore, as a first consistency check, we exhaustively verified that all grid data corresponded to the expected original data. In particular, we systematically sampled from each ESRI-GRID file and pairwise checked if the samples corresponded, through coordinate mapping, to the expected values in the original dataset. As for interpolated coordinates, the nearest neighbour value in the original file was taken as the validation reference. This operation allowed us to detect conversion and misalignment errors, which we later adjusted for exact correspondence with the original files. We conducted this operation with a specific Java-based program for text files40, and with QGIS and GDAL for NetCDF and ASC files. General content validation was also conducted by manually checking if the means, standard deviations, geometric means, and log-normal standard deviations (for positive-defined variables) of all files fell in the expected ranges. A summary table of statistics for all files is available in our repository55. The script for calculating this table is available in the “Scripts and related software” dataset54.

The quality of the data from our previous studies was already verified in the original publications (referred in Tables 14), and in other additional publications12,16,17,41,42,43,44. Therefore, we technically validated these files by checking their ESRI-GRID version consistency with the original files.

As for the newly generated data, we assessed their quality by searching for evidence of the inferred trends and similarities in other studies (explained in the following sections).

### Time series cross-correlation analysis

We produced two charts to visually summarise (i) the parameter time series over the focus regions and (ii) the focus regions’ characterisation in terms of parameter variations (Figs. 4, 5). Moreover, in Table 5, we summarised the parameter trends confirmed by other studies. For each parameter, we also reported the percentage of regions (over the total nine regions) for which we found studies confirming or explaining the trends. The parameter charts highlight an inconstant trend of sea-surface and -bottom temperature across the regions. Moreover, sea temperature had a general increasing trend at the global scale (more than linearly for sea-bottom temperature), which several other studies have confirmed in the last decades56,57. Net primary production presented a globally increasing trend, in agreement with other studies58,59, and an overall decreasing trend in the Adriatic, Aegean, and the Black Sea also highlighted by other studies14,60,61. Sea-bottom dissolved oxygen presented a non-linear global-scale decrease in 2020 in all regions, also confirmed by other studies62,63,64. Sea-surface and -bottom salinity had a globally decreasing trend in several regions, probably because of ice melting and climate change-related freshwater fluxes65,66,67. A salinity increasing trend occurred for the Black Sea, also observed by another study68. A significant sea ice concentration variation occurred in the Baltic Sea, with an increasing trend up to 2019 followed by a lower value in 2020, reflecting the global trend. A decrease occurred in the Black Sea and the North Sea. These observations agree with other studies69,70,71,72.

Averaging time series cross-correlations across the areas revealed overall similarities between the parameter trends. We highlighted direct and inverse correlations in the comparison matrix to study the significant similarities. Specifically, we studied the direct and inverse correlations being at least moderate73, i.e., higher than 30% or lower than −30% (Table 6).

This analysis revealed the following time series similarities in the analysed time frame:

• Net primary production, on average, was inversely correlated with sea-bottom and -surface salinity, especially at the global scale and in the Adriatic, Aegean, Bay of Biscay, Black Sea, Levantine, and Western Mediterranean. These observations agree with those of other studies;14,74,75,76,77

• Sea-bottom salinity was generally positively correlated with sea-surface salinity in most seas except for the Black Sea, due to peculiar deep and shallow thermohaline dynamics78. It was also positively correlated with sea-bottom temperature in the Adriatic, Aegean, Bay of Biscay, and Levantine, as can also be inferred by other studies79,80,81,82;

• Sea-bottom dissolved oxygen was inverse-correlated with sea-surface temperature at the global scale, and in the Baltic, Black Sea, North Sea, and Western Mediterranean, as inferable also by other studies;63,64,78,83,84,85

• Sea-ice concentration had no significant correlation with the other parameters.

Repeating the same analysis by focus region highlighted the following moderate73 correlations (Table 7):

• The global scale had similar trends to those of the Baltic Sea (because of a similar ice concentration trend), Bay of Biscay, and Western Mediterranean because they shared averagely increasing net primary production and temperature trends56,57,58,59. Conversely, the global scale had different trends with respect to those of the Adriatic due to different parameter signal-phases;14,86

• The Adriatic Sea time series were correlated with those of the Aegean Sea through similar trends of all parameters87;

• The Aegean Sea time series were correlated with those of the Levantine Sea through similar sea-bottom dissolved oxygen and sea-bottom and -surface temperature trends87;

• The Baltic time series were correlated with those of the North Sea through all parameters except sea-ice concentration88,89;

• The Bay of Biscay had similar trends to those of the North Sea and the Western Mediterranean through net primary production, sea-bottom dissolved oxygen, temperature (only for the North Sea), and sea-bottom salinity (only for the Western Mediterranean) because of constant inter-connected water mass flow exchange90,91;

• The Black Sea had a standalone characterisation and non-significant cross-correlation with the other regions78;

The regions with higher cross-correlation were geographically connected regions that share sea-currents and partially overlap. Although the correlations generally do not correspond to habitat similarity, they might indicate similar area responses to inter-annual parameter variations and climate change17.

### Habitat similarities

To further explore if time-series cross-correlations were accompanied by habitat similarities, we calculated HRSs between the focus regions. We used a re-implementation of the HRS algorithm92, also available as a Web tool93 on the D4Science e-Infrastructure94,95,96,97,98. The HRSs were calculated on annual parameters from 2017 to 2020 (Table 8). The comparison was not reported for 2016 because HRS could not be calculated for all parameters. The global scale was excluded from the focus regions because calculating the HRS against much smaller areas would not have been meaningful due to the incommensurable data variabilities. HRSs were categorised as similar/dissimilar based on the threshold described in the Methods section. Numerical details are reported in the “Statistics, trends, HRS, PCA-loadings, and charts” dataset on our Fisgshare repository54.

The HRS table indicates that habitat similarity occurred between the Aegean and Adriatic Seas only in 2019 and 2020 and between the North Sea and Baltic Sea from 2017 to 2020. These regions also have generally similar parameter trends and are geographically connected. Habitat dissimilarity between the Aegean and Adriatic seas in 2018 and 2017 corresponds to a known effect of an anticyclonic Bimodal Oscillating System regime that prevented eastern waters from entering the Adriatic in those years86,99,100,101. General habitat similarity between the North Sea and the Baltic Sea has also been highlighted by other studies, unless regime shifts occur102,103,104. This overall similarity is also demonstrated by the many fishery-targeted species living in both the areas, e.g., Gadus morhua, Limanda limanda, Platichthys flesus, Pleuronectes platessa, Scophthalmus maximus, Scophthalmus rhombus, and Solea solea.

As for the other focus regions, the similar time series trends in the previous section did not correspond to habitat similarity. Thus, these regions can present similar inter-annual parameter changes but dissimilar parameter distributions.

### Habitat similarity drivers

The extracted PCA loadings (Table 9) shed light on the parameters’ variability over the years and their contributions to HRSs. This analysis highlighted that the similarity between Adriatic and Aegean seas was mainly driven by the sea-bottom temperature distribution. In the Aegean Sea, this parameter had a higher weight in 2020 and 2019 than in 2018 and 2017, and its distribution resembled the one of the Adriatic Sea in 2020 and 2019. The parameter contribution rankings in 2018 and 2017 in the Aegean Sea changed with respect to 2020 and 2019, in correspondence of the anticyclonic Bimodal Oscillating System regime effect86,99,100,101.

Habitat similarity between the North Sea and the Baltic Sea over the years mainly depended on the net primary production and sea-bottom dissolved oxygen distributions. These two were the only shared parameters between the regions that contributed to the PCA loadings. Although the parameter contribution ranking over the years in the Baltic Sea was variable, the similarity was overall good because of similar net primary production and sea-bottom dissolved oxygen distributions.

The parameter contribution rankings over the years across the other regions were variable. An abrupt change occurred in the Levantine Sea, where sea-bottom dissolved oxygen and temperature weights decreased from 2018 to 2019 and the net primary production and sea-surface temperature weights increased contextually. These variations likely corresponded to a dissolved oxygen reduction (and variance reduction) in the region caused by the peculiar Levantine Sea thermohaline flux105. This flux is indeed characterised by dissolved oxygen being inversely correlated with sea-surface temperature and directly correlated with deep-layer temperature increase106.

## Usage Notes

ESRI-GRID ASCII files can be visualised with GIS software, e.g., QGIS29, or ArcGIS30, by dragging and dropping files to the software interface.