Introduction

Fine-scale knowledge on the distribution of species and habitats is crucial for effective management and conservation of marine resources1,2,3,4. Conservation prioritisation exercises require good-quality information on the spatial distribution of vulnerable species and their associated habitats, including different life history stages5. Such spatial information is also critical to decision-makers and managers, so that marine resources are sustainably exploited and other human activities (e.g. extractive industries, maritime transport, fisheries and aquaculture) seek to minimise negative impacts6. At present, data available to address these issues typically consist of sparse geo-referenced information on species and habitat occurrences. Absence records for non-mobile species and benthic habitats are, at best, available for only a limited number of sites, since the absence of a species or habitat can be ascertained only when a given site has been systematically surveyed. Mapping marine biodiversity remains operationally complicated and expensive, with the result that fine-scale knowledge of species and habitat distributions is unavailable for most marine areas7,8,9.

In the Mediterranean Sea, there have been several attempts at assessing the distribution patterns of species and habitats across the entire basin, based on literature reviews10,11,12,13,14. Recently, Giakoumi et al.15 assessed potential spatial priorities for the conservation of three Mediterranean habitats (Posidonia oceanica meadows, coralligenous formations and marine caves) by considering their eco-regional representation, as well as the opportunity costs for fisheries and aquaculture. These efforts are partly driven by existing, as well as emerging policies at the national and international levels. Member States (MS) of the European Union (EU) have, for instance, committed to collating knowledge on the distribution of species and habitats and assessing their ecological status, as part of an ecosystem approach to marine management. Based on this information, targets and associated indicators will be used to guide progress towards achieving ‘Good Environmental Status’ (GES) in MS' marine waters by 202016. A similar process is emerging at the scale of the Mediterranean basin for the Contracting Parties to the Barcelona Convention (1976). To date, these efforts have documented important gaps in knowledge, especially on species and habitats that are considered of critical importance for the Mediterranean Sea and its conservation15.

Bioconstructions such as coralligenous outcrops and maërl beds are typical Mediterranean underwater seascapes, comprising coralline algal frameworks that grow in dim light conditions17. They are the result of the building activities of algal and animal constructors, counterbalanced by physical, as well as biological, eroding processes. Because of their extent, biodiversity and production, coralligenous and maërl habitats rank among the most important ecosystems in the Mediterranean Sea17,18,19,20,21 and they are considered of great significance both for fisheries22 and carbon regulation23,24.

Mechanical disturbance and re-suspension of nearby sediments, particularly by bottom trawling, is probably the most destructive human activity currently affecting coralligenous outcrops and maërl beds17,25,26. Other threats include pollution (e.g. wastewater discharge, aquaculture), which results in increased turbidity and sedimentation, but also direct habitat destruction through artisanal and recreational fishing (e.g. fishnets, long-lines), coastal or offshore construction activities (including submarine cables) and unregulated diving activities and anchoring17,19,27. Climate change is also known to affect several key species that are part of coralligenous habitats, by increasing the incidence of thermal anomalies (e.g.28,29,30) and storms31. Some invasive algal species (Womersleyella setacea, Acrothamnion preissii, Caulerpa racemosa v. cylindracea and C. taxifolia) can also pose a severe threat to these communities, either by forming dense carpets (i.e. physical barriers) or by increasing sedimentation26,32,33,34,35. Such a pervasive range of impacts, coupled with the slow growth rates and long recovery periods of these systems, have driven efforts aimed at conserving them.

Although not legally binding, the Barcelona Convention's ‘Action plan adopted in 2008 for the conservation of coralligenous outcrops and other calcareous bio-concretions in the Mediterranean Sea’ asserts that “coralligenous/maërl assemblages should be granted legal protection at the same level as Posidonia oceanica meadows”26. Coralligenous outcrops also appear in the EU's Habitats Directive36 (under 1170 Reefs) and in the Bern Convention37. Two of the most common maërl-forming Mediterranean species, Lithothamnion corallioides and Phymatolithon calcareum, are included in Annex V of the Habitats Directive. Finally, under European law38, destructive fishing is prohibited over Mediterranean coralligenous and maërl bottoms. The substantial lack of relevant geospatial data, however, significantly hinders the effective implementation of these policies11,19. Giakoumi et al.15 recently produced a basin-scale distribution map integrating all benthic assemblages thriving on hard substrata of biogenic origin and under low irradiance levels, along with rhodolith beds in coastal detritic bottoms, selected deep-sea habitats (e.g. seamount peaks, offshore rocky banks) and some deep coral communities. There was no attempt at discriminating between these very different systems and spatial planning analyses were carried out at a coarse spatial resolution (presence/absence in a grid of 10 km cell size). Continuous spatial information hence remains unavailable, hampering the development of effective spatial measures to protect coralligenous outcrops and maërl beds.

A number of modelling techniques can be used to fill gaps in the knowledge of the spatial distribution of species and habitats by predicting the location of areas that are likely to be suitable for a species or a community to live39,40,41. Models are usually based on physical and environmental variables (e.g. water temperature, salinity, depth, nutrient concentrations, seabed types, etc.), which are typically easier to record and map across vast expanses (i.e. regional, global scale) than species and habitat data42,43,44. Despite inherent limitations and associated uncertainties, predictive modelling is a cost-effective alternative to field surveys, as it can help identify and map where sensitive marine ecosystems may occur.

In the present study, a thorough review of existing spatial datasets showing the distribution of coralligenous and maërl habitats across the Mediterranean Sea was undertaken, with particular attention given to the basin's eastern and southern parts, where data have traditionally been limited in regional reviews (see for instance11). Based on the collated spatial datasets (parts of which are, to date, unpublished), predictive modelling was carried out to produce the first continuous maps of these two habitats across the Mediterranean Sea. We anticipate that our results will be critical (i) for the development of basin-wide spatial planning initiatives (including representative networks of marine protected areas) based on realistic information on habitat distribution and (ii) to guide cost-effective future surveys and monitoring efforts towards areas that are presently poorly-sampled and under-represented in current conservation planning exercises.

Results

Occurrence datasets for coralligenous outcrops and maërl beds

Datasets on coralligenous outcrops and maërl beds came from a total of 17 countries (Supplementary Table S2) and in a wide variety of formats: from shapefiles and lossless rasters, to image maps in paper format, or electronic format with information loss through compression. The datasets were found to be heterogeneous, with scales from 1:4,000 to 1:250,000 and un-standardised legends, even within the same country. The collated coralligenous outcrops dataset was composed of 4,293 points, 12 lines and 23,632 polygons (Figure 1a). That of maërl beds had 416 points and 748 polygons (Figure 1b). The surface areas corresponding to the polygons alone amounted to 2,763.4 km2 (coralligenous outcrops) and 1,654.5 km2 (maërl beds). Point and line data do not have associated surface areas. Thus, they were used in the modelling but not for surface area estimates.

Figure 1

Occurrences of (a) coralligenous outcrops and (b) maërl beds across the Mediterranean Sea, as extracted from the review work and used in the distribution models (see text for details).

Data sources are listed in the Supplementary References. Boundaries of point/line/polygon features of the data layers have artificially been enhanced so that very small-scale occurrences are visible on the illustrative maps shown here. As a result, surface areas covered by these habitats appear much larger than they are in reality (e.g. around Malta). These data layers are more accurately viewed within a Geographic Information System. Maps were created using ArcGIS software by Esri (Environmental Systems Research Institute, ArcMap 10.1, www.esri.com).

We estimated that, combining all the collected information in terms of points/lines/polygons and only considering the length of the coast where data had been retrieved, datasets provided information for approximately 30% of the coasts of the Mediterranean basin. For the remaining coastline (70%), no further information could be found and/or accessed. In situ depth of occurrences collected from the publications (see the Supplementary References) revealed that records were located between 10 and 140 m (Figure 2), peaking in the shallower sector of this range. Based on this, modelling was restricted to the 0–200 m depth zone. Published information was found to be insufficient for deriving a value of sampling effort across depth bins.

Figure 2

In situ depths of occurrences for coralligenous outcrops and maërl beds, as extracted from the review work.

Scientific information on these two habitats remained unevenly distributed, essentially because the majority of systematic studies have taken place in the western Mediterranean. Areas for which information was previously unavailable were, however, much better covered by the present study, particularly the eastern Mediterranean Sea. Important new information was gained from Malta, Italy, France (Corsica), Spain, Croatia, Greece, Albania, Algeria, Tunisia and Morocco, making the present datasets the most comprehensive to date. In Malta and Italy, knowledge was particularly extensive. Distribution maps of bioconstructions were available in shapefiles for several portions of the Italian coastline, covering continuous stretches of coasts of hundreds of kilometres (i.e. Ligurian Sea, Tyrrhenian Sea, Apulia, Sicily). Still, there were areas of the Mediterranean Sea where data remained extremely scarce (Albania, Algeria, Cyprus, Israel, Libya, Montenegro, Morocco, Syria, Tunisia and Turkey) or totally absent (Bosnia and Herzegovina, Egypt, Lebanon and Slovenia). Knowledge on maërl beds was somewhat limited compared to what was available for coralligenous outcrops; a significant update was nevertheless achieved. Previously unknown spatial information on maërl distribution was brought to light for Greece, France (Corsica), Cyprus, Turkey, Spain and Italy. Malta and Corsica, in particular, had significant datasets for this habitat as highlighted by fine-scale surveys in targeted areas.

Coralligenous outcrops occurrence model

A total of 11,174 presence points (i.e. the training set derived from the combined polygon, line and point occurrence dataset) were used to model the occurrence of coralligenous outcrops across the Mediterranean Sea. The final model based on this training set retained six variables from the starting subset of 12 and the AUCs were 0.80 for the training set and 0.77 (standard deviation 0.003) for the geographically independent test set of 5,581 points (Figure 3a). Bathymetry, slope of the seafloor and nutrient input were the three main contributors to the model (combined contribution of 84.1%; Table 1), whilst the remaining three predictors (euphotic depth, phosphate concentration and geostrophic velocity of sea surface current) had a combined contribution of approximately 16%. Variable response curves suggested a unimodal response, in support of grouping these species together, at the spatial scale considered.

Table 1 Relative contributions of each predictor variable to the coralligenous outcrops distribution model

Figure 3

ROC (Receiver Operating Characteristic) curves for the training and test sets of (a) coralligenous outcrops and (b) maërl beds.

AUC: Area Under the Curve.

Based on a jackknife test of variable importance (for the test gain; Supplementary Figure S5a), the predictor variable with the highest gain when used in isolation was nutrient input, which therefore appeared to have the most useful information by itself. The predictor variable that decreased the gain the most when it was omitted was euphotic depth, which therefore had the most information that was not present in the other predictor variables. The jackknife test on the test set's AUC (Supplementary Figure S5b) confirmed that bathymetry, slope of the seafloor and nutrient input were the main contributors to the model and highlighted the role of sea surface current in predicting the occurrence of coralligenous outcrops.

Areas predicted to have suitable conditions for the occurrence of coralligenous outcrops are shown in Figure 4a and were generally consistent with known presence areas. Predicted occurrences for the North African coast highlighted suitable areas for which there were no occurrence data. This suggested that the measures taken to (i) address the geographic sampling bias (target group background) and (ii) prevent overfitting (hinge features; regularisation multiplier) had been effective.

Figure 4

Spatial distributions showing occurrence probabilities for (a) coralligenous outcrops and (b) maërl beds across the Mediterranean Sea, as predicted using distribution modelling.

Maps were created using ArcGIS software by Esri (Environmental Systems Research Institute, ArcMap 10.1, www.esri.com).

Maërl beds occurrence model

A total of 4,612 presence points (i.e. the training set derived from the combined polygon and point occurrence dataset) were used to model the occurrence of maërl beds across the Mediterranean. The final model based on this training set retained seven variables from the starting subset of 12 and the AUCs were 0.88 for the training set and 0.82 (standard deviation 0.004) for the geographically independent test set of 2,204 points (Figure 3b). Phosphate concentration, geostrophic velocity of sea surface current, silicate concentration and bathymetry were the four main contributors to the model (combined contribution of 83.6%; Table 2), whilst the remaining three predictors (bottom salinity, euphotic depth and slope of the seafloor) had a combined contribution of 16.4%. As for coralligenous outcrops, unimodal variable response curves supported grouping these species together, at the spatial scale considered.

Table 2 Relative contributions of each predictor variable to the maërl beds distribution model

Based on a jackknife test on the test gain (Supplementary Figure S6a), phosphate concentration appeared to have the most useful information by itself and also the most information that was not present in the other predictor variables. The jackknife test on the test set's AUC (Supplementary Figure S6b) confirmed that phosphate and silicate concentrations and sea surface current were the strongest contributors to the model.

Predicted areas with suitable conditions for the occurrence of maërl beds are shown in Figure 4b and were mostly consistent with known presence areas, with some exceptions such as the Po river estuary in the north of Italy. Given the paucity of occurrence data for this habitat across the Mediterranean and especially the North African coast, the model output was relatively informative in highlighting several suitable areas where no occurrence data were available to train the model. Again, this suggested that measures taken to prevent issues of geographic sampling bias and overfitting worked. The model predicted high suitability (probability of occurrence >0.8) in one area having no known record of maërl beds: the southern Evoikos Gulf (Greece). This area happens to have a relatively high phosphate concentration and groundtruthing would be necessary to confirm the actual presence of maërl beds, beyond the predicted suitability of the area.

Discussion

Spatial data on coralligenous and maërl habitats have become increasingly available during the last twenty years11,15,45, indicating that these bioconstructions occur widely across the Mediterranean basin17. The present study has provided (i) the most comprehensive update for the distributions of coralligenous outcrops and maërl beds across the Mediterranean Sea, going much further than previous studies, in particular for the eastern Mediterranean basin and (ii) the first basin-wide and continuous distribution maps based on predictive modelling. The gain in knowledge was particularly substantial for maërl beds, for which data on spatial occurrence had remained comparatively scarce before this review and associated modelling exercise.

Surface areas reported here for coralligenous outcrops (2,763 km2) and maërl beds (1,654 km2) were based on (“raw”) polygon data resulting from in situ observations (i.e. not from the model outputs), predominantly from small-scale studies, limited to the 0 to 200 m depth band. These figures do not include surface areas associated with vertical cliffs, where coralligenous outcrops are commonly found. Point and line data were not used in the surface area estimations, as they do not have associated surface areas. Hence, the figures given here clearly underestimate the real spatial extent of coralligenous and maërl habitats in the Mediterranean Sea. In this region, spatial data on species and habitat distributions remain very patchy and in many locations the information ranges from low quality to completely unavailable. It remains challenging for the Mediterranean but also elsewhere, to integrate information that has often been collected using differing approaches and that is stored in various repositories. Considering the available polygon data and their associated surface areas, we roughly estimate that as much as 95% of coralligenous habitat may still need to be mapped across the Mediterranean basin, especially in deeper areas. The value is probably even higher for maërl beds. Given their high biodiversity value, the systematic mapping of these two habitats across the Mediterranean should be a priority, especially as they can be used more widely to track anthropogenic disturbances, for instance as part of the EU's Marine Strategy Framework Directive16 and the Barcelona Convention.

The present study adopted a presence-only modelling approach because of the paucity of known absence areas for coralligenous outcrops and even more for maërl beds. The “species” datasets were strongly spatially biased towards northwest marine regions. Despite measures taken to minimise this bias and also overfitting, some areas of known absence were predicted to be suitable for coralligenous outcrops (e.g. Nile delta, north-eastern coast of Italy): these false predicted presences were manually removed from the final maps. In contrast, areas of known presence, especially those based on point data, did not necessarily show very high suitability levels. Model outputs (Figure 4) were hence presented herein in combination with the collated observed occurrence data (Figure 1). This implies that spatial management measures for fisheries that are aimed at protecting coralligenous outcrops and maërl beds, should not be based solely on the model outputs presented here; targeted groundtruthing should be carried out so that informed decisions are taken.

Due to data limitations on species lists across the various component datasets, coralligenous outcrops and maërl beds were each modelled as a whole, instead of modelling multispecific assemblages with distinct habitat preferences. Inspection of (unimodal) response curves did suggest that the approach taken was appropriate at the spatial scale used. This may not necessarily be the case at the local or site scale. It thus remains important to encourage systematic assessments of species composition of the two habitats across the region, beyond simply recording the presence of the habitat as a whole, so that species and assemblages with different habitat preferences can be modelled and mapped separately.

While better coralligenous outcrops and maërl beds data would certainly improve model outputs, so would better alternatives for predictor variables, especially if they have finer spatial resolutions. Predictor resolution, from the global/continental scales to the site/micro scales, influences the importance of different variables in controlling species distributions across varying spatial scales46. For instance, very good site-scale predictors of coralligenous outcrops occurrence are hard substrata and steep underwater cliffs (although concretions over flat rocky surfaces and platform coralligenous assemblages are also extremely common), combined with strong currents, light between 0.05% and 3% of surface irradiance and nutrient-poor waters17. Coralligenous outcrops would be unlikely to occur in high sedimentary zones without hard substrata, in enclosed estuarine systems and in sandy areas with low salinities such as river mouths, although some exceptions exist17. For maërl beds, flat and coarse grained areas would tend to be suitable habitats, as well as straits with strong bottom currents that reduce sedimentation23,47. The present study did not manage to unearth such fine-resolution predictor variables in mapped formats that would cover the entire Mediterranean Sea; besides, the spatial resolutions of the variables that were available were coarser than desired. As fine-resolution (i.e. local, site and micro-scales) spatial data on e.g. bottom types or salinity were not available at the scale of the Mediterranean basin, coarser-resolution (i.e. regional and landscape scales) surrogates were used, thereby constraining model behaviour and the derived local-scale interpretation.

The main drivers of the coralligenous outcrops model were bathymetry, slope of the seafloor and nutrient input. Those of the maërl beds model were phosphate concentration, sea surface current, silicate concentration and bathymetry. For bathymetry, a link of causality with presence of bioconstructions is possible. In contrast, predictor variables that are measured in situ and interpolated (e.g. phosphate and silicate concentrations), or even modelled (e.g. bottom salinity), harbour more uncertainty and may play a role in the model because of their “shape” (i.e. spatial gradients and patterns), not because of causality or correlation.

The map of predicted coralligenous outcrops occurrence agrees in parts with the multi-criteria evaluation approach of Cameron and Askew48 for the western Mediterranean: there, data layers such as bottom substratum, current and bathymetry were combined within a Geographic Information System (GIS) using various thresholds. The two approaches give comparable results for the North Algerian coast, parts of the south-eastern French coastline, the Spanish coast, western Corsican and Sardinian coasts and the Balearic Islands. In other areas, discrepancies are significant: for instance, most of the western Italian coast is missed by the multi-criteria approach and also the eastern Corsican and Sardinian coasts, as well as the Tunisian coast.

Disentangling the environmental variables driving the distribution of coralligenous and maërl habitats across the Mediterranean Sea is clearly a challenge and requires detailed knowledge on the ecology and biology of these complex habitats. Specific experimental and observational studies are required to address this issue. However, the predicted occurrence maps for coralligenous outcrops and maërl beds can be of critical importance to guide more cost-effective survey and monitoring efforts targeting poorly-surveyed areas (e.g. in non-EU countries) and areas where these bioconstructions are likely to occur. In turn, the newly collected data, preferably less spatially biased, could then be used to improve distribution models, since a systematic survey of the whole Mediterranean basin is not a realistic option. Model performance will also improve with finer resolution, or more relevant, predictor variables, resulting in better predicted occurrence maps. To date, however, this predictive modelling exercise remains unique in this regional sea, having provided continuous predicted occurrence maps for two of its most important habitats in terms of biodiversity.

Human impacts on coralligenous and maërl habitats can be substantial and these effects may grow further in the future as a result of the interlinked effects of climate change and rising anthropogenic pressure. In light of the importance of the processes produced by these habitats, increasing our understanding of their distribution is critical in helping to protect their associated biodiversity. Presently, coralligenous and maërl habitats are considered priority habitats at the European and regional levels, with specific conservation and management measures. The occurrence and predictive maps presented here can be fed into the development of basin-wide conservation plans (e.g. for establishing networks of marine protected areas) or other forms of marine spatial planning and also in policy development that, at present, are often largely limited by the scarce spatial information on both the distribution and extent of such marine habitats.

Methods

Compiling occurrence datasets for coralligenous outcrops and maërl beds

Geo-referenced occurrence records for coralligenous outcrops and maërl beds across the Mediterranean basin were compiled as part of two international research projects: ‘MEDISEH’41 (Mediterranean Sensitive Habitats), which was financed by the European Commission under the MAREA Framework, and CoCoNET (Towards COast to COast NETworks of marine protected areas), financed by the EU's 7th Framework Programme. Data sources included peer-reviewed articles and national, regional and international reports (‘grey literature’). This resulted in a total of 771 scientific documents (see Supplementary References), a subset of which had associated spatial information (i.e. maps), information on in situ depth of occurrence, and/or species lists for the communities encountered. Spatial information also came from unpublished in situ observations by experts and divers. Where digital spatial information (e.g. shapefiles) was not available, shapefiles were created manually by digitising image maps, or by manually extracting spatial information from textual descriptions, based on expert knowledge. All the GIS work (including the maps) was carried out using ArcGIS software by Esri (Environmental Systems Research Institute, ArcMap 10.1, www.esri.com).

Predictor variables used as input to the models

An initial screening phase for data layers relevant to predictive modelling of coralligenous and maërl habitats identified 17 datasets that were also spatially continuous at the scale of the Mediterranean basin (Supplementary Table S1). Predictors under consideration ranged from physical (e.g. bathymetry), environmental (e.g. salinity) and anthropogenic (e.g. nutrient input) variables, to calculations (e.g. distance to ports) and in situ (e.g. silicate concentration) or remotely-sensed (e.g. euphotic depth) measurements.

All 17 layers were standardised to raster format, having the same geographic extent (the Mediterranean basin), coordinate system (WGS 1984 datum; cylindrical equal-area projection) and resolution (cell size 400 m). This choice of working resolution, albeit artificial, allowed for a better fit along the coastline (extracted from the GSHHS Database49, version 2.2.1), i.e. with minimal gaps between the ‘end’ of the predictor layers and the land boundaries. For predictor variables with coarser native resolutions, the resolution was artificially made finer without re-interpolating the data, so that the variables would retain their native resolutions. Gaps in spatial coverage were retained and coded as such.
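The refinement step described above can be illustrated with a minimal sketch, assuming it amounts to splitting each coarse cell into equal sub-cells that all inherit the parent value (i.e. no new values are interpolated). The function name `refine_grid`, the `NODATA` marker and the toy grid are hypothetical and are not taken from the study's GIS workflow.

```python
NODATA = None  # hypothetical marker for gaps in spatial coverage


def refine_grid(grid, factor):
    """Split every cell into factor x factor sub-cells that keep the parent value."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    refined = []
    for row in grid:
        # Repeat each value `factor` times along the row ...
        fine_row = [value for value in row for _ in range(factor)]
        # ... and repeat the widened row `factor` times down the grid.
        refined.extend(list(fine_row) for _ in range(factor))
    return refined


# Toy 2 x 2 coarse layer with one gap in coverage.
coarse = [[1.0, 2.0],
          [NODATA, 4.0]]
fine = refine_grid(coarse, 2)

for row in fine:
    print(row)
```

Because values are only copied, the refined layer contains exactly the same set of values as the coarse one, and coverage gaps stay coded as gaps.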

Habitat modelling

Coralligenous outcrop is a collective term that refers to a very complex biogenic structure mainly created by the outgrowth of encrusting calcareous algae on hard substrata in dim light conditions17. Some fleshy and turf algae as well as several groups of sessile invertebrates (e.g. sponges, ascidians, cnidarians, bryozoans, serpulid polychaetes, molluscs) contribute to create the final coralligenous habitat17,29.

Maërl is also a collective term for a biogenic structure composed of one or more species of free-living (unattached) calcareous red algae (mostly Corallinaceae but also Peyssonneliaceae), dwelling on sedimentary bottoms. These algae can display a branching or a laminar appearance. They sometimes grow as nodules known as rhodoliths that cover all the sea floor, or accumulate within the sand and gravel ripple marks11,23,50. Some authors distinguish between dense accumulations of interlocking rhodoliths within the ripples of muddy and sandy substrates (maërl beds) and rhodoliths dispersed among sediments (rhodolith bottoms). However, in the literature, the terms maërl and rhodolith are also used as synonyms. In the present study, both were combined under the umbrella term maërl beds.

Therefore, coralligenous outcrops and maërl beds are two complex habitats characterised by a suite of different species, which vary locally and regionally51,52. The coralligenous habitat can even be considered to be a submarine seascape, or community mosaic, rather than a single community17. Habitat modelling would ideally be performed on distinct sub-communities of each habitat, e.g. as defined by a prior multivariate analysis53, but this could not be done here due to the scarcity of species lists across the various component datasets. As a result, coralligenous and maërl occurrences were each modelled as a whole, without distinguishing between their component sub-communities.

Point, line and polygon (i.e. boundary) occurrence data were used to develop and test the distribution models. Polygons and lines were converted to sets of point data: this involved first converting them to raster format (using the same grid resolution as that of the predictor variables) and then converting the raster to a point shapefile (using the default filtering to one point per pixel). The resulting point shapefile was then merged with the other point dataset for the same habitat. Excluded from the coralligenous model were occurrence data located in the Po estuary (Italy), due to their unique and unusual habitat preferences. These are indeed a specific type of coralligenous outcrops called tegnue, defined as submerged rocky substrates of biogenic concretions, irregularly scattered in the sandy or muddy seabed and containing extraordinary zoobenthic assemblages22.
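The polygon-to-point conversion can be sketched as follows. This is an illustration only, assuming that a grid cell is retained when its centre falls inside the polygon and that it then yields a single point at that centre; the actual GIS tools may assign cells differently. The function names, the grid origin and the toy polygon are hypothetical; only the 400 m cell size comes from the text.

```python
def point_in_polygon(x, y, vertices):
    """Ray-casting test: True if (x, y) lies strictly inside the polygon."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygon_to_points(vertices, cell_size, x_min, y_min, n_cols, n_rows):
    """Return one point (the cell centre) per grid cell whose centre is inside the polygon."""
    points = []
    for row in range(n_rows):
        for col in range(n_cols):
            cx = x_min + (col + 0.5) * cell_size
            cy = y_min + (row + 0.5) * cell_size
            if point_in_polygon(cx, cy, vertices):
                points.append((cx, cy))
    return points


# Toy 900 m x 900 m square polygon on a 3 x 3 grid of 400 m cells (projected metres).
square = [(0.0, 0.0), (900.0, 0.0), (900.0, 900.0), (0.0, 900.0)]
points = polygon_to_points(square, 400.0, 0.0, 0.0, 3, 3)
print(points)
```

In this toy case the cells centred at 1,000 m fall outside the polygon, so four of the nine cells contribute a presence point each.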

Model development

Data exploration was carried out on the coralligenous, maërl and predictor variable datasets, using R software (R Development Core Team; www.r-project.org), so as to detect potential outliers. Modelling techniques are often sensitive to multicollinearity among the predictor variables used. Available predictor variables (17) were hence iteratively tested for multicollinearity based on a combination of variance inflation factor (VIF < 2.5) and Spearman's rank correlation (rs < 0.6). This resulted in a subset of 12 mostly uncorrelated predictor variables54 (Table 3; Supplementary Figures S1 and S2), which were used as initial input to the models.
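The correlation part of this screening can be illustrated with a minimal sketch. It covers only the Spearman criterion (the VIF check is not shown) and assumes a simple greedy rule in which a variable is kept if its absolute rank correlation with every variable already kept is below 0.6; the iterative procedure actually used in R may differ. All function names and the toy values are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation; assumes neither input is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def spearman(a, b):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(a), average_ranks(b))


def drop_correlated(layers, threshold=0.6):
    """Greedy filter: keep a variable only if |rs| < threshold with all kept variables."""
    kept = []
    for name in layers:
        if all(abs(spearman(layers[name], layers[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy values sampled at five hypothetical grid cells.
layers = {
    "depth": [10, 20, 30, 40, 50],
    "light": [50, 40, 30, 20, 10],  # perfectly anti-correlated with depth
    "slope": [3, 1, 4, 1, 5],
}
kept = drop_correlated(layers)
print(kept)
```

A greedy rule of this kind depends on the order in which variables are examined, which is one reason why expert judgement usually accompanies such screening.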

Table 3 Predictor variables used in the models. dd: decimal degrees; psu: practical salinity units; n/a: not applicable. Maps are shown in Supplementary Figure S1

Maximum entropy, a well-known approach in machine-learning, is widely used to model species geographic distributions (i.e. their occurrence) in the terrestrial and marine environments, using, for instance, museum collections that only record occurrence localities. The software Maxent55,56 (version 3.3.3k) was used to build models for coralligenous outcrops and maërl beds, starting with the subset of 12 predictor environmental variables. The algorithm used in Maxent aimed to find the largest spread, or maximum entropy, in the geographic dataset composed of occurrence records of coralligenous outcrops or maërl beds, in relation to the 12 predictor variables. For each of the two models being developed, Maxent started with a uniform distribution of occurrence probability values for coralligenous outcrops or maërl beds over the entire Mediterranean basin and conducted an optimisation routine that iteratively improved model fit, measured as the loss of entropy (i.e. the “gain” of information).
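The notion of entropy underlying this description can be illustrated with a short sketch. It is not Maxent's optimisation routine: it only shows, for a hypothetical four-cell study area, that a uniform distribution has the largest Shannon entropy and that a distribution concentrated on a few cells has a lower one. The probability values are invented for illustration.

```python
import math


def shannon_entropy(probabilities):
    """Shannon entropy (natural logarithm) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probabilities if p > 0)


n_cells = 4
uniform = [1.0 / n_cells] * n_cells  # starting point: every cell equally likely
fitted = [0.7, 0.1, 0.1, 0.1]        # hypothetical distribution after fitting

h_uniform = shannon_entropy(uniform)
h_fitted = shannon_entropy(fitted)

print("entropy of uniform distribution:", h_uniform)
print("entropy of fitted distribution:", h_fitted)
print("entropy lost relative to uniform:", h_uniform - h_fitted)
```

For n equally likely cells the entropy equals log(n), which is the upper bound for any distribution over those cells.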

Available occurrence points for each habitat were split between ‘training’ and ‘test’ sets, the latter accounting for approximately a third of occurrences. The test set for each habitat model was geographically independent (Supplementary Figure S3) so as to avoid spatial autocorrelation between the test and training sets (which would occur if test points were selected randomly by Maxent). Test areas were selected so as to encompass, as much as possible, a variety of environmental conditions. The test set was not used for model development, but kept aside and fed separately to Maxent so as to assess model performance across the region.
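A geographically independent split of this kind can be sketched as below, assuming that test areas are approximated by rectangular boxes in longitude and latitude. The box coordinates and occurrence points are hypothetical; the actual test areas are those shown in Supplementary Figure S3.

```python
def split_by_region(points, test_boxes):
    """Assign each (lon, lat) point to the test set if it falls in any test box."""
    training, test = [], []
    for lon, lat in points:
        in_test = any(
            lon_min <= lon <= lon_max and lat_min <= lat <= lat_max
            for lon_min, lat_min, lon_max, lat_max in test_boxes
        )
        (test if in_test else training).append((lon, lat))
    return training, test


# Hypothetical occurrence points (decimal degrees) and one hypothetical test box.
occurrences = [(3.2, 42.1), (3.4, 42.3), (9.1, 41.5), (14.5, 35.9), (23.7, 37.9), (24.0, 38.1)]
test_boxes = [(23.0, 37.0, 25.0, 39.0)]  # (lon_min, lat_min, lon_max, lat_max)

training_set, test_set = split_by_region(occurrences, test_boxes)
print(len(training_set), "training points;", len(test_set), "test points")
```

Holding out whole areas, rather than randomly chosen points, means that test points are not immediate neighbours of training points.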

Of the several feature types available in Maxent, hinge features were used so as to obtain smoother models and to help prevent overfitting to the training data57. In addition, Maxent's ‘regularisation multiplier’58 was tuned to 2.5 (the default value being 1), so as to reduce overfitting further and control model complexity.
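For illustration, a forward hinge feature, as it is generally described in the Maxent literature, is zero up to a knot and then rises linearly to one at the maximum of the variable. The sketch below assumes that definition; the knot position and the depth values are hypothetical, and the code does not reproduce how Maxent selects or weights its features.

```python
def forward_hinge(x, knot, x_max):
    """Forward hinge: 0 up to the knot, then linear up to 1 at x_max."""
    if x_max <= knot:
        raise ValueError("x_max must be greater than knot")
    if x <= knot:
        return 0.0
    return (x - knot) / (x_max - knot)


# Hypothetical depth predictor (m) with a knot at 100 m and a maximum of 200 m.
for depth in (50.0, 100.0, 150.0, 200.0):
    print(depth, forward_hinge(depth, knot=100.0, x_max=200.0))
```

A model built from a weighted sum of such features gives piecewise-linear response curves, which is consistent with the smoother fits mentioned above.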

So as to reduce the effect of the geographic sampling bias in the occurrence datasets for coralligenous and maërl habitats, ‘target group background’59 was used. Areas of the Mediterranean were attributed a relative value of sampling effort, based on expert knowledge (Supplementary Figure S4). This information was fed to Maxent in raster (ASCII) format.
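The idea behind this bias correction can be sketched as drawing background points in proportion to relative sampling effort, so that the background shares the spatial bias of the occurrence records. The sketch below assumes that simple proportional sampling is an adequate illustration; it is not Maxent's internal procedure, and the area names and effort values are hypothetical.

```python
import random


def sample_background(effort, n_points, seed=0):
    """Draw background cells with probability proportional to relative sampling effort."""
    rng = random.Random(seed)
    cells = [cell for cell, weight in effort.items() if weight > 0]
    weights = [effort[cell] for cell in cells]
    return rng.choices(cells, weights=weights, k=n_points)


# Hypothetical relative sampling effort for three areas.
effort = {"well_surveyed": 9.0, "poorly_surveyed": 1.0, "unsurveyed": 0.0}
background = sample_background(effort, n_points=1000, seed=42)

for area in effort:
    print(area, background.count(area))
```

Areas with zero effort never contribute background points, and well-surveyed areas contribute most of them.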

Based on estimated relative contributions to the model by the 12 predictor variables, the ones contributing the least to the model were removed (i.e. usually if they contributed less than 5% and based on expert judgement) and the final model was re-run without them. The importance of each retained predictor variable was then measured through a jackknife (also called ‘leave-one-out’) test of variable importance, by training with each predictor variable first omitted and then used in isolation. The model output was spatialised in the form of a raster showing the logistic probability (ranging from 0 to 1) of occurrence for the habitat considered.

The Receiver Operating Characteristic (ROC) curve39 was used to investigate the trade-off between prediction sensitivity and specificity. The associated Area Under the Curve (AUC) is 0.5 in the case of random prediction and higher values (to a maximum of 1) correspond to better performing models.
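As a worked illustration, the AUC can be computed as the proportion of presence/background pairs in which the presence point receives the higher predicted value, with ties counted as one half. The sketch below assumes this pairwise (rank-based) formulation, which is equivalent to the area under the ROC curve; the scores are hypothetical and the function is not taken from Maxent.

```python
def auc(presence_scores, background_scores):
    """AUC as the probability that a presence point outscores a background point."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5  # ties count as half a win
    return wins / (len(presence_scores) * len(background_scores))


# Hypothetical predicted probabilities of occurrence.
presence = [0.9, 0.4]
background = [0.5, 0.1]
print(auc(presence, background))
```

In this toy case three of the four presence/background pairs are ranked correctly, giving an AUC of 0.75.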