Global-scale parameters for ecological models

This paper presents a collection of environmental, geophysical, and other marine-related data for marine ecological models and ecological-niche models. It consists of 2132 raster data for 58 distinct parameters at regional and global scales in the ESRI-GRID ASCII format. Most data originally belonged to open data owned by the authors of this article but residing on heterogeneous repositories with different formats and resolutions. Other data were specifically created for the present publication. The collection includes 565 data with global scale range; 154 at 0.5° resolution and 411 at 0.1° resolution; 196 data with annual temporal aggregation over ~10 key years between 1950 and 2100; 369 data with monthly aggregation at 0.1° resolution from January 2017 to ~May 2021 continuously. Data were also cut out on 8 European marine regions. The collection also includes forecasts for different future scenarios such as the Representative Concentration Pathways 2.6 (63 data), 4.5 (162 data), and 8.5 (162 data), and the A2 scenario of the Intergovernmental Panel on Climate Change (180 data).


Background & Summary
The Good Environmental Status of European Seas (GES) 1 is the European goal of reaching the sustainably of stock and environment exploitation and no loss of biodiversity and ecosystem services. It is the primary goal of several European strategic frameworks such as the Marine Strategy Framework Directive (MSFD), the Maritime Spatial Planning Directive (MSP), the Green Deal and Blue Growth strategies, and the EU Biodiversity Strategy for 2030 [2][3][4][5][6][7] . This goal is challenging in the current context of increasing energy, food demand, and climate change. Scientific approaches that address GES require processing marine data of ecosystems to assess ecosystem services, biodiversity, and stock status. They also require multi-disciplinary modelling approaches to extract valuable knowledge from the data 6 . Recently, international projects such as EcoScope 8 , have been fostering the shift from traditional "vertical" modelling approaches -focussing on one species, stock, or ecosystem service independently of the other -to "horizontal" approaches, which combine multi-species, environmental, and social dynamics 9,10 . However, these approaches require huge amounts of high-quality data to produce meaningful knowledge 11,12 . In particular, environmental, geophysical, world-population, and marine-region data are crucial to model species habitats 13,14 , understand the response and resilience of marine areas to climate change [15][16][17] , assess stock status and fisheries pressure on stocks [18][19][20] , and build ecosystem models [21][22][23][24] .
This paper describes an extensive data collection of harmonised and standardised global-scale parameters, with associated long-term forecasts under different greenhouse gas emission and societal development scenarios. The collection aims at supporting ecological, ecosystem, and ecological-niche models within horizontal approaches to marine resource management. Figure 1 summarises our workflow. We harmonised and standardised geospatial data from our own heterogeneous resources and publications that had newly produced or re-processed these data. Some data were previously available in custom formats (e.g., CSV or text files), which meant they were not as accessible as they could be. Additionally, we specifically produced other data to complement the collection. The primary sources involved were (i) environmental data produced for the AquaMaps ecological niche models, (ii) data from the Italian National Research Council (CNR) studies on marine science, Earth science, and epidemics that re-processed or newly produced open-access data based on other sources, and (iii) data produced by the Quantitative Aquatics (Q-quatics) non-governmental organisation for ecosystem and ecological models.
The complete list of environmental data with their primary and secondary sources is reported in Tables 1-4, grouped by resolution and parameter type. Data harmonisation consisted of correcting errors and aligning the data to the same coordinate grids, with either 0.1° or 0.5° resolutions. The format of the published data is ESRI-GRID ASCII. All data have a global-scale range but are also cut out on 8 European marine areas of 1 Institute of Information Science and Technologies, Italian National Research Council, Pisa, 56124, Italy. 2 Quantitative Aquatics, Inc., Los Baños, 4031, Laguna, Philippines. ✉ e-mail: gianpaolo.coro@cnr.it DATA DESCRIPTOR OPEN Table 1. Data at 0.1° resolution at the global scale available in our repository, with indication of the related primary and secondary sources. The asterisks (*) indicate the data that were specifically produced for this article.
1. Sea-bottom and sea-surface dissolved oxygen, salinity, and temperature 2. Sea net primary production 3. Sea ice concentration 4. Average, minimum, maximum sea depth 5. Average, minimum, maximum elevation 6. Distance of a square marine area from land and its fraction covered by water 7. The characterization of each data cell in terms of which Large Marine Ecosystem (LME), Exclusive Economic Zone (EEZ), Marine Ecoregions of the World (MEOW), and Major Ocean Basins they belong to, and whether or not it sits in a Marine Protected Area (MPA) 8. Number of islands 9. Water area that lies within the shelf, slope, and abyssal zones 10  After the data collection phase, we harmonised all global-scale data from their primary sources' geodetic systems to the same global-scale grid and projection, i.e., the WGS 84-EPSG:4326 geodetic system with equirectangular projection. We set two square grids for the data, at 0.5° and 0.1° depending on the original resolutions. The original files had heterogeneous formats, from raw text (CSV, XYZ) to more structured formats (NetCDF,  Table 3. Data of geophysical parameters at 0.5° resolution at the global scale available in our repository, with indication of the related primary and secondary sources. www.nature.com/scientificdata www.nature.com/scientificdata/ ESRI-GRID). All files were first aligned to the same grid and checked for inconsistency and offset by comparing each grid point with the expected original data value. Eventually, they were converted to the ESRI-GRID ASCII format (ASC) 28 . ESRI-GRID is a standard format approved by the Open Geospatial Consortium (OGC), a worldwide community that assesses standards and protocols to improve access to geospatial data. This format allows for inspecting the data with text processing software as well as visualising them with commonly used Geographic Information System (GIS) software (e.g., QGIS 29 , and ArcGIS 30 ). The format is also the most frequently accepted by ecological niche modelling and ecosystem modelling software (e.g, MaxEnt 31 and Ecopath with Ecosim 22,32-34 ) and most programming languages have libraries for parsing it 35,36 . In our harmonisation and standardisation workflow, one ASC file corresponds to one parameter in a specific year (or month) and location. This correspondence makes the files easily convertible into other formats (e.g., in NetCDF format through the GDAL software 37 ). Overall, the ESRI-GRID ASCII format was optimal for our collection's scope of supporting ecological and ecosystem models and climatic analyses.
All data were also cut out on 8 European marine areas of particular economic or ecosystem importance, identified by the EcoScope European Project community of practice. These areas (hereafter named focus regions) were: 1. The global-scale 2. The Adriatic Sea 3. The Aegean Sea 4. The Baltic Sea 5. The Bay of Biscay 6. The Black Sea 7. The Levantine Sea 8. The North Sea 9. The Western Mediterranean Sea The areas were geographically identified according to the corresponding marine eco-region (Adriatic, Aegean, Baltic, Levantine, the North Sea) or International Hydrographic Organization region (Bay of Biscay, the Black Sea, Western Mediterranean Sea) 38,39 .
The temporal coverage of our data collection is of ~10 years within the period 1950-2100. Forecasts for 2050 and 2100 are available under different greenhouse gas emission scenarios, i.e., RCP 2.6 (low emission), 4.5 (medium emission), and 8.5 (high emission), although the RCP 2.6 scenario was not available for 2050. Moreover, some forecasts for the IPCC SRES A2 scenario (which hypothesises a future of independent, self-reliant nations with constantly increasing population and regionally diversified economic development, slow technological change, and worldwide use of nuclear energy) were also available and included in the collection.
Data harmonisation for text files was conducted through a dedicated Java process 40 that managed the different formats, aligned the data to a resolution-specific grid, and finally produced one ASC file. As for primary sources with NetCDF and ESRI-GRID formats, we performed manual checking, alignment, and band extraction through QGIS. Conversion to ASC format was done through GDAL. No-data locations were all assigned a default −9999 value, specified in the ASC file header through the NODATA attribute, which makes it automatically interpreted and used by GIS software for consumption and visualisation. Data with non-homogeneous resolution over longitude and latitude were homogenised through nearest-neighbour and bilinear interpolation separately, via QGIS. Earthquake and high-resolution temperature and precipitation data were left to their original aggregated temporal range to represent an aggregated reference of a recent past.  Table 4. Data of world population and marine-region parameters at 0.5° resolution at the global scale available in our repository, with indication of the related primary and secondary sources.
www.nature.com/scientificdata www.nature.com/scientificdata/ Newly produced data and time series analysis. We conducted a spatiotemporal analysis on the focus regions to find evidence of similarities between parameter trends over the years. Then we checked for agreement with outputs of other studies as a further data validation. We focussed this analysis on a data collection subset containing newly produced data at 0.1° resolution, annually aggregated from 2016 to 2020 (Table 1). We selected these data because they were not previously explicitly validated in other publications, and were thus differentiated from the other data whose content was instead validated in other publications 12,16,17,[41][42][43][44] . The selected data were the following (Fig. 2): 1. Sea-surface temperature 2. Sea-bottom temperature 3. Sea-ice concentration 4. Sea-surface salinity 5. Sea-bottom salinity 6. Sea net primary production 7. Sea-bottom dissolved oxygen These parameters are generally used by the AquaMaps ecological niche models 41 that assume they include sufficient information to assess global species presence 16 . It is important to note that 2016 data were not available for sea-bottom salinity, sea net primary production, and sea-bottom dissolved oxygen.
We used ocean products from the Copernicus Marine Service 45 to produce the new data for the seven environmental parameters above. NetCDF data for mean monthly sea surface and bottom temperature, sea surface and sea bottom salinity, and sea ice concentration were re-processed based on the Global Ocean 1/12° Physics Analysis and Forecast 001-024 monthly dataset 46 , that natively used the WGS 84-EPSG:4326 geodetic system and equirectangular projection and had 0.083° spatial resolution. Mean monthly data for net primary production and dissolved oxygen were obtained from two temporally complementary datasets: the Global Ocean Biogeochemistry Analysis and Forecast 001-028 monthly dataset 47 (in WGS 84-EPSG:4326 geodetic system and projection) and the Global Ocean Biogeochemistry Hindcast 001-029 monthly dataset 47 (in ETRS 89-EPSG:4258 geodetic system and projection), both having a spatial resolution of 0.25°. Global monthly data for sea surface and bottom temperature, sea surface and bottom salinity, sea ice concentration were rasterized and resampled using the R-Terra package 48 , upscaling to 0.1° spatial resolution using bilinear interpolation. Net primary production and dissolved oxygen data were all reprojected to WGS 84-EPSG:4326 prior to rasterization www.nature.com/scientificdata www.nature.com/scientificdata/ and resampled by downscaling to 0.1° spatial resolution using bilinear interpolation. We carried out this process only on one depth layer (either surface or bottom) for all parameters, except for sea bottom salinity and bottom dissolved oxygen. These two parameters required the resampling of up to 72 depth levels (0.5 m to 5902 m) per month, then extracting data at the maximum depth layer per 0.1° grid cell before compiling them into corresponding monthly sea bottom salinity and sea bottom dissolved oxygen raster layers. We then used the resampled monthly rasters to compute the annual means for each of the seven parameters. The annual mean data were saved as GeoTIFF and CSV formats and were manually inspected for exact correspondence through coordinate mapping in ArcGIS 30 . Cases where precedent resampling to 0.1° spatial resolution had yielded marginal rows (along 89.95°N or 76.95°S) or a marginal column (along 179.95°E) with missing data were resolved by copying parameter values directly from the neighboring row or column. This approach was considered reasonable in view of the spatial resolution of the data. The final outputs were exported as CSV files and underwent the data harmonisation and standardisation process depicted in Fig. 1.
To validate the data, annual average values per region were first extracted and visualised for each parameter to compare trends across all regions (section "Technical Validation"). Moreover, each region was characterised through its associated parameter time series. Average time series 0-lag cross-correlation was used for numerical comparison. Specifically, it was calculated per parameter across all focus regions, and per region across all parameters. These analyses highlighted general and regional parameter time series similarities. Confirmation of these similarities with that seen in other scientific studies was used to assess the reliability of the data in representing valid ecological macro-patterns.
Habitat representativeness score. Parameter time series cross-correlation might indicate that two regions were subject to similar average parameter variations. This condition might correspond to similar habitats over time in geographically connected regions if the parameters have similar ranges and distributions. A species' ecological niche is, mathematically, the space within a hyper-volume in a vector space of environmental parameters associated with the species' proliferation. Understanding general habitat similarity between two regions is equivalent to assessing the similarity between the parameter hyper-volumes over the two regions, independently of the species. This assumption is reasonable if the involved parameter set is complete enough for ecological niche modelling. Correlated region-specific parameter time series do not necessarily indicate similar habitats, because parameter distributions' similarity and geographic reachability are also required. Habitat similarity, www.nature.com/scientificdata www.nature.com/scientificdata/ which depends on annual parameters' distributions, can also change over the years and is thus complementary information with respect to time series correlations. Habitat dissimilarity after a specific year, unnecessarily corresponding to lower time series cross-correlation, likely indicates that an abrupt event made two regions different.
We conducted habitat similarity analysis over the years between our focus regions to study these variations and search for confirmations in other studies. Specifically, we used the Habitat Representativeness Score (HRS) 49 to measure habitat similarity. HRS is an algorithm based on Principal Component Analysis (PCA) 50 that measures the overall difference of the data distributions between two regions, across the largest data variance directions (principal components). HRS has been used to understand the principal environmental drivers of species presence in distant regions 51 and to assess ecological survey completeness 49 . The algorithm works with two inputs: a reference region A and a test region B. As the output, it calculates a score interpretable as the representativeness of habitat B by habitat A (HRS(A, B)). Each region is characterised through vectors of environmental parameters. PCA is conducted on the reference region (A) vectors to extract major data variance axes (i.e., the principal components). An optional threshold, set on the principal components' eigenvalues, can restrict the comparison to the largest variance axes. In our validation experiment, we selected components covering up to 95% of the total data variance. Then, the normalised data frequency distribution of the vectors on each axis is calculated and subdivided into equal-frequency bins. The region B vectors are then projected onto the principal components of region A. The B parameter frequencies over the A principal components are calculated across the same bins estimated for A. Finally, the pairwise differences between the bin frequencies are calculated for all principal components. The HRS is the sum of these pairwise differences. Since bin frequencies sum to 1 on each axis, the HRS ranges from 0 to the number of principal components (N), with N representing completely different habitats and 0 perfect habitat similarity.
We calculated a pairwise HRS matrix to discover significant habitat similarities between the focus regions. However, HRS is an asymmetric function by construction because PCA conducted on region A and projected on B likely gives different results than PCA conducted on region B and projected on A. One possible estimation of the overall HRS between A and B is the mean between HRS(A,B) and HRS(B,A) 52 . This choice also makes the HRS matrix symmetric and facilitates the similarity analysis. Therefore, we used average HRS as the region-pair score. For each region, we standardised the scores by dividing the value by the total HRS range. We finally assessed as "similar" those region pairs emerging from the standardisation by more than 10%. We repeated this analysis for all annual data between 2016 and 2020 to study habitat similarity stability over the years between the focus regions. Finally, we verified evidence of the detected similarity stability and instability in other studies. www.nature.com/scientificdata www.nature.com/scientificdata/ Detecting major habitat similarity drivers. Using PCA over the focus regions' parameters allowed for consideration of the variables that primarily contributed to the largest principal components and thus to the HRS 50,53 . In particular, the PCA principal axes (eigenvectors) and their eigenvalues allow for defining loading vectors as = ⋅ loading eigenvector eigenvalue. Each loading is a vector containing as many elements as the number of original environmental parameters, and there is one loading for every PCA axis. The loading vector elements represent the original environmental parameters' contributions to the corresponding PCA axis (weights). Keeping only the major PCA axes (i.e., those with the largest eigenvectors) allows for the analysis to be focussed on the largest data variance and exclude noise. Average parameter weight across the loadings measures the average contribution to the principal components by each parameter and thus the contribution to the HRS. Therefore, the parameters with the largest average weights are the major drivers of the estimated HRSs and thus of the detected similarities. For the present loadings analysis, we selected the principal components covering up to 95% of the total data variance and the environmental parameters with a non-zero positive average weight.

Data Records
We made the data available on a public-access Figshare repository 54 . The collection is composed of 6 datasets. The principal datasets are "Environmental Geophysical Marine Socioeconomic parameters at 0.1° and 0.5° resolutions" and "Monthly data at 0.1° resolution". Internally, they are structured with a folder hierarchy that optimises search time for an ecological niche modelling expert (Fig. 3). The first dataset separates 0.5° and 0.1° spatial resolution files in two main folders. The 0.5° resolution folder contains one sub-folder each for RCP 2.6, 4.5, and 8.5, the IPCC SRES A2 forecast scenario, and historical data (named HISTORICAL). Each sub-folder is organised by year. For example, the RCP 4.5, RCP 8.5, and IPCC SRES A2 folders contain the 2050 and 2100 sub-folders. The RCP 2.6 folder contains only the 2100 folder. The HISTORICAL data folder contains year-specific sub-folders from 1950 to 2019 and two additional folders for the 1900-2008 and 2000-2014 temporal aggregations. Each annual sub-folder contains one sub-folder for each focus region (9 total), which in turn contains the ESRI-GRID parameter files with the specific resolution, scenario, time reference, and region corresponding to the file path and the metadata. Each file name contains information to reconstruct the path. For instance, Sea-surface_tem-perature_res_05_annual_years_2019_Clim_scen_historical_regional_Adriatic_Sea.asc indicates a file containing annual-aggregated sea-surface temperature data, at 0.5° resolution, in 2019, within the HISTORICAL data sub-set, and cut out on the Adriatic Sea. The 0.1° annual data root folder has the same structure as the 0.5° root www.nature.com/scientificdata www.nature.com/scientificdata/ folder but contains only historical data. The years involved in our data collection are : 1950, 1997, 1999, 2011, 2013, 2016, 2017, 2018, 2019, 2050, and 2100. Some files, e.g., those of gas concentration of methane and nitrous oxide, have a variant file with the "bilinear" attribute in the file name to indicate that bilinear interpolation was used instead of nearest neighbour to homogenise coordinate resolutions. The adopted folder structure allows an ecological niche modelling expert to find aligned files in one folder and directly use them in modelling software like MaxEnt 31 , e.g., to quickly model a species' distribution at a 0.5° spatial resolution in 2019 in the Adriatic.
The monthly dataset has a folder structure organised by parameter name. Each parameter folder contains one sub-folder for each year, which in turn contains monthly ESRI-GRID files. This structure is conceived to facilitate monthly parameter analyses.
The complete file collection contains 2132 files. Global-scale data are 565; 154 have 0.5°resolution and 411 have 0.1° resolution. Among the 0.1° resolution data, 369 have a monthly aggregation and 196 an annual aggregation. Forecasts are available for 2050 and 2100, and overall include 63 files for RCP 2.6 (only in 2100), 162 for RCP 4.5, 162 for RCP 8.5, and 180 for IPCC SRES A2.
Additional datasets in the collection (in the "Statistics, trends, HRS, PCA-loadings, and charts" and "File list and statistical properties" datasets) contain summary tables and charts with standard statistics (mean, standard deviation, geometric mean, log-normal standard deviation), cross-correlations, HRS estimates, and PCA loadings that we used for the technical validation. The Figshare repository also contains all R scripts, Java software links, and references to the programs used to conduct the technical validation (in the "Scripts and related software" dataset).

technical Validation
Consistency with respect to the original data. Each produced ESRI-GRID file was defined on a regular spatial grid. Therefore, as a first consistency check, we exhaustively verified that all grid data corresponded to the expected original data. In particular, we systematically sampled from each ESRI-GRID file and pairwise checked if the samples corresponded, through coordinate mapping, to the expected values in the original dataset. As for interpolated coordinates, the nearest neighbour value in the original file was taken as the validation reference. This operation allowed us to detect conversion and misalignment errors, which we later adjusted for exact correspondence with the original files. We conducted this operation with a specific Java-based program for text files 40 , and with QGIS and GDAL for NetCDF and ASC files. General content validation was also conducted by manually checking if the means, standard deviations, geometric means, and log-normal standard deviations (for positive-defined variables) of all files fell in the expected ranges. A summary table of statistics for all files is available in our repository 55 . The script for calculating this table is available in the "Scripts and related software" dataset 54 .
The quality of the data from our previous studies was already verified in the original publications (referred in Tables 1-4), and in other additional publications 12,16,17,[41][42][43][44] . Therefore, we technically validated these files by checking their ESRI-GRID version consistency with the original files.
As for the newly generated data, we assessed their quality by searching for evidence of the inferred trends and similarities in other studies (explained in the following sections). time series cross-correlation analysis. We produced two charts to visually summarise (i) the parameter time series over the focus regions and (ii) the focus regions' characterisation in terms of parameter variations (Figs. 4, 5). Moreover, in Table 5, we summarised the parameter trends confirmed by other studies. For each parameter, we also reported the percentage of regions (over the total nine regions) for which we found studies confirming or explaining the trends. The parameter charts highlight an inconstant trend of sea-surface and -bottom temperature across the regions. Moreover, sea temperature had a general increasing trend at the global scale (more than linearly for sea-bottom temperature), which several other studies have confirmed in the last decades 56,57 . Net primary production presented a globally increasing trend, in agreement with other studies 58,59 , and an overall decreasing trend in the Adriatic, Aegean, and the Black Sea also highlighted by other studies 14,60,61 . Sea-bottom dissolved oxygen presented a non-linear global-scale decrease in 2020 in all regions, also confirmed by other studies [62][63][64] . Sea-surface and -bottom salinity had a globally decreasing trend in several regions, probably  Table 5. Summary table of the trends observed in our data that agreed with other studies, and the percentage of regions (over the total 9 regions) for which we found trend confirmation in other studies.
www.nature.com/scientificdata www.nature.com/scientificdata/ because of ice melting and climate change-related freshwater fluxes [65][66][67] . A salinity increasing trend occurred for the Black Sea, also observed by another study 68 . A significant sea ice concentration variation occurred in the Baltic Sea, with an increasing trend up to 2019 followed by a lower value in 2020, reflecting the global trend. A decrease occurred in the Black Sea and the North Sea. These observations agree with other studies [69][70][71][72] .   www.nature.com/scientificdata www.nature.com/scientificdata/ Averaging time series cross-correlations across the areas revealed overall similarities between the parameter trends. We highlighted direct and inverse correlations in the comparison matrix to study the significant similarities. Specifically, we studied the direct and inverse correlations being at least moderate 73 , i.e., higher than 30% or lower than −30% (Table 6). This analysis revealed the following time series similarities in the analysed time frame: • Net primary production, on average, was inversely correlated with sea-bottom and -surface salinity, especially at the global scale and in the Adriatic, Aegean, Bay of Biscay, Black Sea, Levantine, and Western Mediterranean. These observations agree with those of other studies; 14,74-77 • Sea-bottom salinity was generally positively correlated with sea-surface salinity in most seas except for the Black Sea, due to peculiar deep and shallow thermohaline dynamics 78 . It was also positively correlated with sea-bottom temperature in the Adriatic, Aegean, Bay of Biscay, and Levantine, as can also be inferred by other studies [79][80][81][82] ; • Sea-bottom dissolved oxygen was inverse-correlated with sea-surface temperature at the global scale, and in the Baltic, Black Sea, North Sea, and Western Mediterranean, as inferable also by other studies; 63,64,78,83-85 • Sea-ice concentration had no significant correlation with the other parameters.
Repeating the same analysis by focus region highlighted the following moderate 73 correlations (Table 7): • The global scale had similar trends to those of the Baltic Sea (because of a similar ice concentration trend), Bay of Biscay, and Western Mediterranean because they shared averagely increasing net primary production and temperature trends [56][57][58][59] . Conversely, the global scale had different trends with respect to those of the Adriatic due to different parameter signal-phases; 14  www.nature.com/scientificdata www.nature.com/scientificdata/ The regions with higher cross-correlation were geographically connected regions that share sea-currents and partially overlap. Although the correlations generally do not correspond to habitat similarity, they might indicate similar area responses to inter-annual parameter variations and climate change 17 .  Table 8. Habitat similarity highlights, based on the Habitat Representativeness Score algorithm, between the focus regions over the years. Checkmarks indicate significant habitat similarity; empty cells indicate nonsignificant similarity. www.nature.com/scientificdata www.nature.com/scientificdata/ The HRS table indicates that habitat similarity occurred between the Aegean and Adriatic Seas only in 2019 and 2020 and between the North Sea and Baltic Sea from 2017 to 2020. These regions also have generally similar parameter trends and are geographically connected. Habitat dissimilarity between the Aegean and Adriatic seas in 2018 and 2017 corresponds to a known effect of an anticyclonic Bimodal Oscillating System regime that prevented eastern waters from entering the Adriatic in those years 86,[99][100][101] . General habitat similarity between the North Sea and the Baltic Sea has also been highlighted by other studies, unless regime shifts occur [102][103][104] . This overall similarity is also demonstrated by the many fishery-targeted species living in both the areas, e.g., Gadus morhua, Limanda limanda, Platichthys flesus, Pleuronectes platessa, Scophthalmus maximus, Scophthalmus rhombus, and Solea solea.
As for the other focus regions, the similar time series trends in the previous section did not correspond to habitat similarity. Thus, these regions can present similar inter-annual parameter changes but dissimilar parameter distributions.
Habitat similarity drivers. The extracted PCA loadings (Table 9) shed light on the parameters' variability over the years and their contributions to HRSs. This analysis highlighted that the similarity between Adriatic and Aegean seas was mainly driven by the sea-bottom temperature distribution. In the Aegean Sea, this parameter had a higher weight in 2020 and 2019 than in 2018 and 2017, and its distribution resembled the one of the Adriatic Sea in 2020 and 2019. The parameter contribution rankings in 2018 and 2017 in the Aegean Sea changed with respect to 2020 and 2019, in correspondence of the anticyclonic Bimodal Oscillating System regime effect 86,[99][100][101] .
Habitat similarity between the North Sea and the Baltic Sea over the years mainly depended on the net primary production and sea-bottom dissolved oxygen distributions. These two were the only shared parameters between the regions that contributed to the PCA loadings. Although the parameter contribution ranking over the years in the Baltic Sea was variable, the similarity was overall good because of similar net primary production and sea-bottom dissolved oxygen distributions.
The parameter contribution rankings over the years across the other regions were variable. An abrupt change occurred in the Levantine Sea, where sea-bottom dissolved oxygen and temperature weights decreased from 2018 to 2019 and the net primary production and sea-surface temperature weights increased contextually. These variations likely corresponded to a dissolved oxygen reduction (and variance reduction) in the region caused by the peculiar Levantine Sea thermohaline flux 105 . This flux is indeed characterised by dissolved oxygen being inversely correlated with sea-surface temperature and directly correlated with deep-layer temperature increase 106 .

Usage Notes
ESRI-GRID ASCII files can be visualised with GIS software, e.g., QGIS 29 , or ArcGIS 30 , by dragging and dropping files to the software interface.

Code availability
Software to transform text files into ESRI-GRID ASC files is openly available on the GitHub 40 . Software to calculate Habitat Representativeness Score and PCA loadings is also openly available on the GitHub 92 and through a Web interface in the D4Science e-Infrastructure (RPrototypingLab VRE) 93 . R scripts to calculate cross-correlation and parameter statistics are available on our public Figshare repository 54 , in the "Script and Related Software" dataset. (2023) 10:7 | https://doi.org/10.1038/s41597-022-01904-3 www.nature.com/scientificdata www.nature.com/scientificdata/