Introduction

Although the subterranean environment harbours a diverse array of life forms, including microorganisms, invertebrates, and vertebrates1, its biodiversity has received limited attention in terms of biomonitoring and conservation efforts2,3. The exploration of subterranean biodiversity faces a significant challenge called the Racovitzan impediment4. This concept encapsulates a set of inherent difficulties encountered in studying and understanding subterranean ecosystems and their dwellers, mostly related to the limited accessibility and the peculiar abiotic (e.g., total darkness, low oxygenation) and biotic (e.g., nutrient scarcity) conditions of most subterranean environments5. In recent years, the scientific community has strongly reaffirmed the urgent need to overcome these challenges to advance our understanding of subterranean biodiversity and develop effective conservation strategies3,6. Given these access and sampling challenges, the picture gleaned from studying subterranean environments is unavoidably limited. In fact, many taxa thriving beneath the surface have yet to be described, while unexpected findings of certain taxa in new areas often defy biogeographic expectations. Alongside the Racovitzan impediment, these taxonomic (i.e., Linnean) and biogeographic (i.e., Wallacean) shortfalls hinder the effectiveness of conservation programs aimed at preserving subterranean biodiversity2,7. However, recent advances in speleological techniques and sampling methods have propelled our understanding of subterranean biodiversity forward8,9,10,11,12,13,14,15,16,17. The growing body of research targeting subterranean environments has revealed remarkable examples of convergent evolution and surprising biological radiations18,19, thereby reaffirming that subterranean organisms offer crucial insights into widely debated topics such as the time and mode of speciation20.

Accessible caves represent only a fraction of the vast subterranean realm, which is characterised by macro- and microfractures, conduits, streams and lakes, and large chambers that are intermittently or permanently filled with groundwater. Cave systems are particularly extensive and articulated in karst regions, where the hydrologic cycle plays a pivotal role in the formation and functioning of the groundwater environments21. As meteoric water infiltrates fractured rocks (e.g., carbonate rocks), it contributes to the development of two distinct yet usually connected zones: the unsaturated and the saturated karst. While the former is characterised by intermittent groundwater availability, the latter is permanently flooded. Both these groundwater “black boxes” host a remarkable variety of habitats supporting diverse biological communities22,23. Surprisingly, the unsaturated portions of a karst system, comprising temporary pools, trickles and locally water-filled fractures in the bedrock, often exhibit higher species richness than the saturated karst within the same system9. Furthermore, species found in the unsaturated karst are rarely the same as those collected from the saturated karst, primarily due to differences in habitat availability for obligate groundwater dwellers (i.e., stygobites). Alongside stygobites, aquatic species usually thriving in surface waters (i.e., non-stygobites) are commonly found in both the unsaturated and saturated portions of karst groundwater. Although their primary habitats are surface waters (e.g., lakes, streams, ponds), they occasionally or accidentally enter groundwater via the connections between the surface and the underground.

Karst aquifers offer valuable ecosystem services, such as provisioning of freshwater for human consumption, irrigation, and industrial use24. However, these aquifers are highly sensitive to direct or indirect human impacts2,25, including contamination, overexploitation, and climate change26,27. Moreover, karst groundwater hosts rare and narrow-range endemic species that are particularly vulnerable to extinction28,29,30. Biological surveys conducted over the past 30 years have provided insights into the large-scale factors influencing the composition of biological assemblages in karst areas, including evolutionary history, habitat heterogeneity and environmental gradients14,30,31,32,33,34,35,36. Nevertheless, only a few studies33,36 have thus far evaluated the relative role of geographic distance and environmental gradients on biotic dissimilarity (i.e., beta diversity) of groundwater crustacean assemblages, explicitly partitioning between spatial turnover and nestedness-resultant dissimilarity37. Partitioning beta diversity into these two components and exploring its geographic, historical, and environmental correlates has proven to be crucial for understanding biogeographic processes36,38,39 and outlining conservation priorities40,41.

In this study, we analysed copepod (Crustacea Copepoda) assemblages from 12 caves spanning seven macro-areas along the Italian peninsula, from the northeastern karst (Friuli-Venezia Giulia) to the southern gypsum area (Calabria) and limestone systems (Apulia). By sampling caves from different macro-areas, we aimed to obtain a representative sample of the copepod diversity across the Italian karst and to investigate the associated ecological dynamics. We selected the Crustacea Copepoda as the target group because they were the most abundant and species-rich taxon in all the sampled caves. With the hypothesis that both spatial isolation and regional climate may have shaped the composition of biotic communities in cave waters, we conducted this research work with the following objectives: (i) to characterise the copepod assemblages of the examined caves in terms of species richness (i.e., alpha diversity); (ii) to quantify between-caves beta diversity; (iii) to evaluate the relative contributions of geographic distance and surface climatic conditions to the observed patterns of beta diversity.

In a broader context, our results contribute to pushing forward the current knowledge about how surface climatic conditions affect faunal diversity in cave waters, a topic of primary importance in the context of the ongoing climate crisis and associated biodiversity loss. Further, by taking advantage of climate data from an online repository to fit state-of-the-art dissimilarity models on presence-absence data of groundwater fauna, our analytical workflow may represent a “template” for other subterranean biologists in exploiting the increasing amount of publicly available and spatially explicit digital environmental data to disentangle the drivers shaping subterranean biodiversity.

Material and methods

Study sites

We selected twelve caves in diverse karst areas across the Italian peninsula, spanning multiple latitudinal bands (Fig. 1), and we sampled each cave twice between 2019 and 2022.

Figure 1

Map of Italian lithounits (based on the Italian lithological map retrieved from the Italian National Geoportal, http://www.pcn.minambiente.it/mattm/en/), with Italian administrative borders (black contour lines), the twelve sampled caves (black crosses) and their corresponding codes (see Table 1).

The investigated caves (Table 1) span seven macro-areas with distinct hydrogeologic characteristics: (i) Trieste Karst of the Eastern Alps (Northeastern Italy), which comprises the Abisso di Trebiciano (coded as Tr, see Table 1) and Grotta Antonio Federico Lindner (Li) caves; (ii) the Apuan Alps in the Northern Apennines (Northwestern Italy), encompassing the caves named Antro del Corchia (Co), Tana che Urla (TcU), and Grotta del Vento (GdV); (iii) the Frasassi Gorge area in the Central Apennines (Central Italy), including Grotta del Fiume (Fi); (iv) the karst area of inland Abruzzo in the Central Apennines (Central Italy), hosting Grotte di Stiffe (St); (v) the Alta Murgia and Bassa Murgia subregions of Apulia (Southern Italy), respectively hosting Grave Rotolo (Ro) and Grotta Puntore (Pu); (vi) the Alburni Massif in the Southern Apennines (Southern Italy), with Grotte di Castelcivita (Ca) and Grotte di Pertosa-Auletta (Pe); (vii) the evaporitic rock formations of the Southern Apennines (Southern Italy), with the Grave Grubbo (GG) cave.

Table 1 For each sampled cave, the following data are reported: extended and abbreviated (“Cave code”) denomination, coordinates (EPSG 4326: WGS84), elevation (at cave entrance, as derived from a 30 m-resolution DEM), lithological composition, horizontal and vertical extents.

Abisso di Trebiciano and Grotta Antonio Federico Lindner caves are located along the underground course of the Timavo River42, the main contributor to the recharge of the Trieste Karst aquifer. During flood periods, the water rises several meters inside the two caves, so that their deepest portions become totally submerged.

Antro del Corchia cave is located on the southwestern slopes of the Apuan Alps and is part of one of Italy’s largest karst systems43. Due to its huge overall length (dozens of kilometers) and depth (> 1000 m), and the consequent difficulties in transporting the sampling equipment, in this cave we focused the sampling on the habitats located close to the tourist path, which included an underground stream, some small lakes, pools, and dripping pools. Grotta del Vento and Tana che Urla caves lie on the southeastern side of the Apuan Alps. Most of the former cave is characterised by unsaturated habitats, with an underground stream flowing only in its deepest part. In contrast, Tana che Urla is crossed along its entire length by a perennial stream and lacks a well-developed unsaturated zone.

Grotta del Fiume has a sub-horizontal development, spanning 7 km and hosting several lakes and dripping pools, with both the saturated and unsaturated parts of the cavity being well-developed.

Grotte di Stiffe cave is traversed throughout by a perennial river, which has created meanders, waterfalls, and syphons (in the inner part of the cave). The unsaturated zone is primarily limited to one chamber, where there are some dripping pools.

Grave Rotolo consists of wide vertical pits and sub-horizontal passages, reaching the water table at the bottom of the final drop44. Within the vadose zone, several pools are present at different heights.

Grotta Puntore is located half a kilometer from the Adriatic Sea, within an area affected by seawater intrusion. This cave opens at 12 m a.s.l. as a collapse sinkhole and consists of a single chamber, 20 m wide and 12 m deep, almost entirely filled with breakdown deposits and hosting a brackish water lake in its northern sector.

Grotte di Castelcivita, which opens on the western slopes of the Alburni Massif, is divided into three levels: the top level represents the inactive branch; the middle one is the hydrogeologically active branch, hosting pools and lakes; the bottom level consists of permanently flooded conduits45. The Grotte di Pertosa-Auletta cave is located on the opposite side of the Alburni Massif: it develops along three parallel branches, where both saturated and unsaturated environments occur, including larger and smaller pools, dripping pools, and lakes. The southernmost branch is traversed by the Negro River, which flows out of the cave46.

Grave Grubbo cave is mainly developed along one branch within which a stream flows; there are also two other small conduits, one being totally dry and the second hosting small pools and drips only during flood periods.

Although the twelve investigated caves may not encompass all the hydrogeologic features of the macro-areas where they are located, they represent a robust compromise between sampling feasibility and representativeness of the Italian karst in terms of both hydrogeology and groundwater fauna.

Faunal sampling

First, we carefully identified the different habitat types within each cave, including pools, streams, lakes, and gours (also termed “rimstone dams”: barriers of calcite and other minerals deposited around cave pools and streams). Then, we collected aquatic invertebrates using a stratified random sampling protocol, maximising sampling efforts within each habitat. For most caves, the sampling activity required advanced speleological techniques involving the use of ropes, harnesses and specific personal equipment for vertical progression in order to access the sampling sites safely. Sampling was conducted, whenever possible, in both the saturated and the unsaturated portions of the cave, and each habitat was sampled at least twice to obtain a representative sample of the corresponding invertebrate community. Faunal samples were collected and processed following a standardised protocol47, employing different sampling methods and devices depending on the habitat type.

Specifically, we sampled dripping pools by aspirating water with a 50 mL syringe or manually sieving sediments using a plankton net (60 µm mesh). In areas showing intense dripping, we placed the net directly beneath the dripping area. We sampled the free-flowing water in subterranean streams and rivulets by placing the mouth of the net against the water flow and then sieving the sediment upstream of the net mouth to dislodge the benthic and interstitial species. We used a hand net (60 µm mesh) in deeper pools and where water flow was more substantial. Sampling within the phreatic zones was conducted using the same methods and equipment described for the vadose zones, except in the phreatic zone of the Grave Rotolo cave, where we employed a 60 µm-mesh Cvetkov net48. The Cvetkov net was also used for sampling the “Lago della Bottiglia” lake within Grotta del Fiume.

The collected specimens were preserved in 80% ethyl alcohol and stored in the Stygobiology Laboratory at the University of L’Aquila. The Crustacea Copepoda were then sorted under a Leica M205C stereomicroscope and identified to the species/subspecies level using taxonomic keys49,50,51,52,53 and additional specialised literature. Each copepod species was further categorised as stygobite (obligate groundwater dweller) or non-stygobite (non-obligate groundwater species which are found occasionally or accidentally in groundwater).

Data analysis

All the statistical analyses were performed in R (version 4.2.0)54.

Estimating ‘hidden’ alpha diversity and between-cave beta diversity

To assess whether the set of species recorded in the sampled caves is likely to comprehensively represent the alpha diversity of Crustacea Copepoda occurring within the Italian karst groundwaters, we used three non-parametric estimators of species richness, namely ‘Chao 2’55, ‘first-order Jackknife’56, and ‘Bootstrap’57, implemented within the ‘specpool’ function of the “vegan” package58 version 2.6-2. Further, to visualise the relationship between the number of observed species and increasing sampling effort (here intended as the number of sampled caves), we generated a Species Accumulation Curve59 for each of the three richness estimators, using the ‘specaccum’ function from the “vegan” package. The accumulation curves were obtained through 1000 random permutations of the initial set of sites (i.e., caves) in pools of sites of increasing size (ranging from n = 1 to n = 12).
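As an illustration of what these incidence-based estimators compute, the following is a minimal Python sketch, not the “vegan” implementation itself. The function name `richness_estimates`, the toy matrix and the handling of the case with no doubletons are assumptions made for this example; the formulas follow the commonly reported forms of Chao 2 (with the small-sample factor (N − 1)/N), the first-order Jackknife and the Bootstrap estimator.

```python
def richness_estimates(incidence):
    """Estimate species richness from a sites x species presence-absence matrix.

    `incidence` is a list of sites, each a list of 0/1 values (one per species).
    """
    n_sites = len(incidence)
    # Number of sites in which each species was recorded
    freqs = [sum(column) for column in zip(*incidence)]
    freqs = [f for f in freqs if f > 0]
    s_obs = len(freqs)                       # observed richness
    a1 = sum(1 for f in freqs if f == 1)     # species found in exactly one site
    a2 = sum(1 for f in freqs if f == 2)     # species found in exactly two sites
    small_sample = (n_sites - 1) / n_sites

    if a2 > 0:
        chao2 = s_obs + small_sample * a1 ** 2 / (2 * a2)
    else:
        # Assumed fallback (bias-corrected form) when no doubletons are present
        chao2 = s_obs + small_sample * a1 * (a1 - 1) / 2

    jack1 = s_obs + a1 * small_sample
    boot = s_obs + sum((1 - f / n_sites) ** n_sites for f in freqs)
    return {"observed": s_obs, "chao2": chao2, "jack1": jack1, "boot": boot}


# Hypothetical toy data: 4 sites (rows) x 5 species (columns)
toy = [
    [1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [1, 0, 0, 0, 1],
]
estimates = richness_estimates(toy)
```

Dividing the observed richness by each estimate gives the kind of observed-to-estimated ratio used to judge sampling completeness.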

We estimated the between-cave dissimilarity of copepod assemblages by computing pairwise taxonomic beta diversity following the framework developed by Baselga37,60. Specifically, total pairwise beta diversity was partitioned into turnover and nestedness-resultant dissimilarity through the ‘beta.pair’ function implemented in the “betapart”61 R package version 1.5.6. Turnover quantifies the extent to which the difference in species composition between two sites is due to the presence of distinct species at each site. In contrast, nestedness-resultant dissimilarity (hereafter, nestedness) arises when two sites share some species but one of the sites is richer in species than the other. The Sørensen dissimilarity index (βsor) was used as a measure of total beta diversity, turnover was measured through the Simpson dissimilarity index (βsim), while nestedness was measured through the corresponding index (βsne) described in Baselga37.
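The partition can be illustrated with a short Python sketch based on the standard definitions of the three indices, where a is the number of shared species and b and c are the numbers of species unique to each site. The function name `beta_pair` and the species labels are hypothetical; this is not the “betapart” code.

```python
def beta_pair(site_a, site_b):
    """Partition pairwise Sorensen dissimilarity into turnover and nestedness.

    `site_a` and `site_b` are sets of species names recorded at two sites.
    """
    if not site_a or not site_b:
        raise ValueError("both sites must contain at least one species")
    a = len(site_a & site_b)   # shared species
    b = len(site_a - site_b)   # species unique to the first site
    c = len(site_b - site_a)   # species unique to the second site
    beta_sor = (b + c) / (2 * a + b + c)          # total dissimilarity
    beta_sim = min(b, c) / (a + min(b, c))        # turnover component
    beta_sne = beta_sor - beta_sim                # nestedness-resultant component
    return {"sor": beta_sor, "sim": beta_sim, "sne": beta_sne}


# Hypothetical assemblages: the second is a strict subset of the first
nested = beta_pair({"sp1", "sp2", "sp3", "sp4"}, {"sp1", "sp2"})
# Hypothetical assemblages of equal richness with two replaced species
replaced = beta_pair({"sp1", "sp2", "sp3"}, {"sp1", "sp4", "sp5"})
```

In the nested pair, all dissimilarity is attributed to nestedness (βsim = 0), whereas in the second pair it is attributed entirely to turnover (βsne = 0).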

Distance-decay models

To investigate the dependence of between-cave taxonomic dissimilarity upon geographic distance, we fitted distance-decay models for total beta diversity, and for its two components separately, through the ‘decay.model’ function of the betapart package. This function relates the observed biotic dissimilarity (response variable) between two sites to the corresponding geographic distance (predictor) via a Generalized Linear Model (GLM) with normal error distribution and logarithmic link function. The Euclidean geographic distances between each pair of caves were computed through the ‘st_distance’ function of the “sf” package62 version 1.0-14, after projecting the original caves’ geographic coordinates (EPSG: 4326) to the ETRS89-LAEA projected reference system (EPSG: 3035). Following Gómez-Rodríguez & Baselga39, we tested two kinds of decay models: (i) a negative exponential model, where the predictor values (i.e., between-cave geographic distances) were left unchanged; (ii) a power-law model, fitted after transforming the geographic distances with the natural logarithm. For each kind of model, the explanatory performance was evaluated based on the model pseudo-R2 (\(\text{pseudo-}R^{2} = 1 - \frac{\text{residual deviance}}{\text{null deviance}}\)) and on the p value resulting from the randomization algorithm implemented in the ‘decay.model’ function39,61, run across 1000 iterations.
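The two model forms can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the ‘decay.model’ code: it fits a Gaussian GLM with log link by iteratively reweighted least squares to similarity values (one minus dissimilarity, which must be positive), and takes the pseudo-R2 as one minus the ratio of residual to null deviance, which for a Gaussian model reduces to sums of squares. The function names, distances and similarity values are hypothetical, and the permutation test is omitted.

```python
import math


def _weighted_line(x, z, w):
    """Weighted least-squares fit of z = a + b * x; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    mz = sum(wi * zi for wi, zi in zip(w, z)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxz = sum(wi * (xi - mx) * (zi - mz) for wi, xi, zi in zip(w, x, z))
    b = sxz / sxx
    return mz - b * mx, b


def fit_log_link(x, y, n_iter=50):
    """Fit E[y] = exp(a + b * x) with Gaussian errors; returns (a, b, pseudo_r2)."""
    # Starting values from an ordinary regression of log(y) on x
    a, b = _weighted_line(x, [math.log(v) for v in y], [1.0] * len(x))
    for _ in range(n_iter):
        eta = [a + b * xi for xi in x]
        mu = [math.exp(e) for e in eta]
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]  # working response
        w = [m * m for m in mu]                                  # working weights
        a, b = _weighted_line(x, z, w)
    fitted = [math.exp(a + b * xi) for xi in x]
    residual_dev = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    mean_y = sum(y) / len(y)
    null_dev = sum((yi - mean_y) ** 2 for yi in y)
    return a, b, 1 - residual_dev / null_dev


# Hypothetical pairwise distances (km) and similarities (1 - dissimilarity)
dist_km = [10, 50, 100, 200, 400]
similarity = [0.80, 0.62, 0.55, 0.35, 0.17]

# Negative exponential model: distances left unchanged
a_exp, b_exp, r2_exp = fit_log_link(dist_km, similarity)
# Power-law model: distances log-transformed before fitting
a_pow, b_pow, r2_pow = fit_log_link([math.log(d) for d in dist_km], similarity)
```

A negative slope in either fit indicates that similarity decays with distance; the two pseudo-R2 values can then be compared to decide which functional form describes the decay better.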

Generalized dissimilarity models

Distance-decay models based on (generalized) linear regression have also been applied to assess whether the biotic dissimilarity between sites or regions is linked to the corresponding environmental dissimilarity37,63. In such applications, environmental distance between the contrasted sites/regions is usually measured as their distance along summary axes resulting from ordination techniques (e.g., Principal Component Analysis, Non-Metric Multidimensional Scaling) applied to environmental data. However, in recent years, a novel, more flexible approach known as Generalized Dissimilarity Modelling64 has gained momentum and is increasingly applied to investigate the drivers behind biotic dissimilarity at various spatial scales65. We used the “gdm” package66 version 1.5.0-9.1 to fit Generalized Dissimilarity Models (GDMs) relating the observed between-cave beta diversity to both geographic and climatic distances.

Within GDMs, the biotic dissimilarity between sites is related to the inter-site spatial and environmental distance through a GLM with negative exponential link function (Eq. 1):

$$d_{ij} = 1 - e^{-\eta} \tag{1}$$

where \(d_{ij}\) is the pairwise biotic dissimilarity and η is a non-negative function of inter-site environmental distance termed “predicted ecological distance”65. Unlike “traditional” distance-decay models, in GDMs the predicted ecological distance is modelled based on transformed values of the predictors, resulting from a linear combination of I-spline basis functions which allow higher flexibility in modelling non-linear relationships between biotic and spatial/environmental dissimilarity65. Without diving into technical details, for which we refer the reader to Mokany et al.65, it is worth mentioning that the degree of flexibility in the parametrization of GDMs is mainly linked to: (i) the number of I-spline functions applied for transforming each predictor; (ii) the number and location of the knots along the predictors’ axes, where the knots define the portion of each predictor axis to which the single I-spline functions are applied. Here, we used the default parameters of the ‘gdm’ function from the package of the same name; specifically, three I-spline functions were used to transform each predictor, with the knots located at the minimum, median and maximum values of the predictor.
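To make the role of the I-splines more concrete, the sketch below shows in Python how a predicted dissimilarity could be assembled from transformed predictors. It rests on several assumptions that the text does not confirm: the usual GDM form d = 1 − exp(−η) with η ≥ 0, a quadratic I-spline basis, and three basis functions per predictor anchored on the knot triples (min, min, median), (min, median, max) and (median, max, max). The function names, knot positions and coefficients are hypothetical, and the treatment of geographic distance is left out.

```python
import math


def i_spline(x, q1, q2, q3):
    """Quadratic I-spline rising monotonically from 0 (x <= q1) to 1 (x >= q3)."""
    if x <= q1:
        return 0.0
    if x >= q3:
        return 1.0
    if x <= q2:
        return (x - q1) ** 2 / ((q2 - q1) * (q3 - q1))
    return 1.0 - (q3 - x) ** 2 / ((q3 - q2) * (q3 - q1))


def transform(x, knots, coefs):
    """Transformed predictor: non-negative combination of three I-splines."""
    q_min, q_med, q_max = knots
    triples = [(q_min, q_min, q_med), (q_min, q_med, q_max), (q_med, q_max, q_max)]
    return sum(c * i_spline(x, *t) for c, t in zip(coefs, triples))


def predicted_dissimilarity(site_i, site_j, knots, coefs, intercept=0.0):
    """Sum partial ecological distances and apply the link d = 1 - exp(-eta)."""
    eta = intercept
    for name in knots:
        f_i = transform(site_i[name], knots[name], coefs[name])
        f_j = transform(site_j[name], knots[name], coefs[name])
        eta += abs(f_i - f_j)   # partial ecological distance for this predictor
    return 1.0 - math.exp(-eta)


# Hypothetical knots (min, median, max) and fitted coefficients for one predictor
knots = {"bio10": (18.0, 21.0, 24.0)}
coefs = {"bio10": (0.0, 0.3, 0.9)}

d_far = predicted_dissimilarity({"bio10": 18.0}, {"bio10": 24.0}, knots, coefs)
d_same = predicted_dissimilarity({"bio10": 20.0}, {"bio10": 20.0}, knots, coefs)
```

Because each basis function is monotonic and the coefficients are non-negative, the transformed predictor can only increase along the gradient, so the height reached by each curve indicates how much that predictor contributes to the predicted ecological distance.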

To quantify the climatic dissimilarity among the areas surrounding the 12 sampled caves, we used the Worldclim 2.1 online database67 to download gridded climate surfaces (30 arc-seconds pixel resolution, ~1 km²) representing the average values of the nineteen so-called “bioclimatic variables” (i.e., Bio1-Bio19) for the 1970–2000 period. These variables capture annual and seasonal trends related to temperature and precipitation. We chose as candidate predictors eight bioclimatic variables presumed to directly affect subterranean environments68: Bio1 (Annual Mean Temperature), Bio3 (Isothermality), Bio10 (Mean Temperature of Warmest Quarter), Bio11 (Mean Temperature of Coldest Quarter), Bio12 (Annual Precipitation), Bio15 (Precipitation Seasonality), Bio18 (Precipitation of Warmest Quarter), and Bio19 (Precipitation of Coldest Quarter).

To assess whether climatic dissimilarity influences the between-cave taxonomic beta diversity in a scale-dependent manner, we computed average values of the candidate climatic predictors within four circular buffers of increasing radius around the entrance of each cave: (i) 0.5 km; (ii) 2.5 km; (iii) 5 km; (iv) 10 km. For each buffer, we performed a Variance Inflation Factor (VIF) analysis on the corresponding climatic averages through the “usdm” R package69 version 1.1-18, discarding the variables which attained a VIF score > 5 to avoid multicollinearity-related issues in model calibration70. Subsequently, we performed 1000 GDM fitting iterations: for each iteration, we randomly selected 85% of the 66 cave pairs (n = 56) to populate the training data table; these data were then used to calibrate a separate GDM for each combination of dissimilarity metric (i.e., βsor, βsim, βsne) × buffer radius. Within each GDM, the dissimilarity metric represented the response variable, while between-cave geographic distance and between-cave differences in the average value of each retained climatic variable within the considered buffer represented the explanatory variables. We then assessed the explanatory performance of each model based on the corresponding explained deviance (on training data), and the model predictive performance by computing the Root Mean Squared Error (RMSE) of its predictions on the withheld cave pairs (n = 10).
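The resampling scheme can be sketched in a few lines of Python. The cave codes are those listed in Table 1, while the function names `split_pairs` and `rmse`, the seed and the example values are assumptions made for illustration; model fitting itself is not shown.

```python
import itertools
import math
import random

# Cave codes as listed in Table 1
caves = ["Tr", "Li", "Co", "TcU", "GdV", "Fi", "St", "Ro", "Pu", "Ca", "Pe", "GG"]
pairs = list(itertools.combinations(caves, 2))   # all unordered cave pairs


def split_pairs(all_pairs, train_fraction=0.85, seed=None):
    """Randomly split site pairs into training and withheld (test) subsets."""
    rng = random.Random(seed)
    shuffled = list(all_pairs)
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


def rmse(observed, predicted):
    """Root Mean Squared Error between observed and predicted dissimilarities."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# One iteration of the split; the seed is arbitrary and only for reproducibility
train_pairs, test_pairs = split_pairs(pairs, seed=1)

# Hypothetical observed and predicted dissimilarities for two withheld pairs
example_rmse = rmse([0.5, 0.7], [0.5, 0.9])
```

Repeating the split with different seeds and storing the explained deviance and RMSE of each fitted model yields the distributions that are later compared among buffer radii.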

For each fitted GDM, we also partitioned, through the ‘gdm.partition.deviance’ function, the portion of explained deviance attributable to three sets of variables (and pairwise combinations thereof): (i) “Dist”, including only pairwise geographic distance; (ii) “Prec”, comprising the precipitation-related variables Bio12, Bio15, Bio18 and Bio19; (iii) “Temp”, including the temperature-related variables Bio3, Bio10 and Bio11 (Bio1 was discarded after the VIF analysis for all four buffer radii).

We then tested for statistically significant differences (in terms of explained deviance, RMSE and deviance partitioning) among the GDMs fitted on climatic averages from the four distinct buffer radii by performing Kruskal–Wallis Rank Sum Tests, followed by post-hoc one-tailed Pairwise Wilcoxon Rank Sum Tests with Benjamini & Hochberg71 p value correction. We used non-parametric tests because preliminary data exploration72 highlighted, for several dissimilarity metric × buffer radius combinations, violations of fundamental assumptions of parametric tests (e.g., normality of residuals, homoscedasticity). For all these tests, the threshold for Type I error (i.e., “alpha-level”) was set to 0.05.
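For readers unfamiliar with the Benjamini & Hochberg correction, the adjustment of a set of p values can be sketched in Python as below. The function name `bh_adjust` and the example p values are hypothetical; the sketch follows the standard step-up procedure, in which each p value is multiplied by the number of tests divided by its rank and monotonicity is then enforced from the largest p value downwards.

```python
def bh_adjust(p_values):
    """Return Benjamini-Hochberg adjusted p values, in the original order."""
    n = len(p_values)
    # Indices of the p values sorted from largest to smallest
    order = sorted(range(n), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = n - position                       # rank in ascending order (1..n)
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p values from four pairwise comparisons
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = bh_adjust(raw)
significant = [p < 0.05 for p in adjusted]
```

With these example values, only the first comparison remains below the 0.05 threshold after adjustment, although three of the four raw p values were below it.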

Finally, we fitted a full-data GDM for each dissimilarity metric × buffer radius combination. From these full-data models, we inspected: (i) the distribution of predicted versus observed dissimilarity values; (ii) the relative importance of each explanatory variable, assessed through the permutation algorithm implemented in the ‘gdm.VarImp’ function; (iii) the I-spline curves corresponding to the top-three variables in terms of relative importance.

A flowchart summarising the workflow designed to conduct the set of analyses described above is shown in Fig. 2.

Figure 2

Flowchart summarising the different analytical steps taken to investigate the contribution of geographical distance and surface climatic conditions to between-cave taxonomic beta diversity.

Results

Out of the 71 copepod species recorded in the 12 sampled caves, 46 were stygobites and 25 non-stygobites (Supplementary Materials Table S1). Among the non-stygobites, Paracyclops imminutus was by far the most widespread as it was recorded in 11 caves, followed by Bryocamptus echinatus, which was found in 8 caves. Notably, 45 out of the 71 species were collected in a single cave each. Grotte di Pertosa-Auletta (Pe) showed the highest alpha diversity (22 species), followed by Grotte di Stiffe (St) with 18 species. Grotta Lindner (Li) and Tana che Urla (TcU) showed the lowest number of recorded species (5 each), followed by Grave Rotolo (Ro) and Grotta del Vento (GdV), hosting 6 species each. Some of the species unique to single caves still lack scientific description, such as a new species of Hesperocyclops from Grave Rotolo and a new species of Pseudectinosoma from Grotta Puntore (Pu).

Despite the relatively high number of recorded species, all three non-parametric richness estimators suggested that more species would likely have been found had additional caves been sampled. Indeed, the ratio between observed and estimated richness was as low as 40.8% according to the Chao2 estimator which, however, also showed a high standard error (Supplementary Material Table S2); higher ratios between observed and estimated richness emerged for the first-order Jackknife (63.3%) and the Bootstrap (88.1%) estimators (Supplementary Material Table S2). The accumulation curves obtained using the three estimators were almost identical and none of them approached an asymptote (Supplementary Material Fig. S1).

Total between-cave beta diversity was relatively high for most pairs of caves (Supplementary Material Table S3), with βsor values lower than 0.3 emerging only between the three caves located in the Apuan Alps [i.e., Antro del Corchia (Co), Grotta del Vento (GdV), and Tana che Urla (TcU)]. Further, overall beta diversity (mean βsor ± SD = 0.64 ± 0.31; median βsor = 0.76) was mostly driven by turnover (mean βsim ± SD = 0.53 ± 0.29; median βsim = 0.60), with far lower nestedness-resultant dissimilarity (mean βsne ± SD = 0.11 ± 0.11; median βsne = 0.07). A preliminary visual comparison between dissimilarity values and geographic distances (Supplementary Material Tables S3, S4) revealed that relatively close caves may be noticeably dissimilar in terms of copepod assemblages, while caves located hundreds of kilometres apart may have quite a low species turnover. For instance, Abisso di Trebiciano (Tr) and Grotta Lindner (Li), which are just 15 km apart, showed a high beta diversity (βsor = 0.86) and, most notably, this was almost totally driven by turnover (βsim = 0.8). In contrast, Grotte di Stiffe (St) and TcU, which are located 325 km apart, showed one of the lowest turnover values (βsim = 0.2), with beta diversity in this case being primarily driven by nestedness of the TcU assemblage within that of St (βsne = 0.48).

Both the negative exponential and the power-law distance-decay models showed statistically significant explanatory performance for both turnover and total beta diversity, but not for nestedness (Supplementary Material Table S5). The distance-decay curves resulting from the power-law model showed a better fit to observed values compared to the ones resulting from the negative exponential model, though both of them either underestimated or overestimated dissimilarity for some pairs of caves (Fig. 3).

Figure 3

Distance-decay curves resulting from the distance-decay models based on either a negative exponential function (left plots) or a power-law function (right plots), fitted for between-cave turnover, nestedness and total beta diversity. Black dots show the observed pairs of beta diversity and geographic distance values.

Whatever the extent of the buffer over which the climatic averages were computed, the explanatory performance of the GDMs fitted with nestedness as response variable was far lower (median explained deviance < 10%) than that of the models fitted for turnover and total beta diversity (Fig. 4), thus mirroring the results of the distance-decay models.

Figure 4

Half-violin plots showing variation in (a) explained deviance, and (b) RMSE on test data, extracted from the Generalized Dissimilarity Models fitted using climatic averages computed over four distinct buffer radii around cave locations (0.5 km, 2.5 km, 5 km and 10 km) as explanatory variables (along with between-cave geographic distance). The horizontal bar within each half-violin indicates the median.

The GDMs fitted on climatic averages computed over circular buffers of increasing radius around the caves significantly differed in terms of both explained deviance (Turnover: χ2 = 627.70, DF = 3, p value < 0.001; Nestedness: χ2 = 249.38, DF = 3, p value < 0.001; Total beta diversity: χ2 = 305.74, DF = 3, p value < 0.001) and RMSE on test data (Turnover: χ2 = 71.65, DF = 3, p value < 0.001; Total beta diversity: χ2 = 57.58, DF = 3, p value < 0.001), except for RMSE of models built for nestedness (χ2 = 4.26, DF = 3, p value = 0.24). For turnover and total beta diversity, the GDMs fitted using the 10 km-radius buffer showed higher explanatory (i.e., higher explained deviance) and predictive (i.e., lower RMSE) performance compared to the models fitted using the smaller buffers (p < 0.001 for all the pairwise comparisons in the post-hoc tests). For nestedness, instead, no consistent scale-dependent differences emerged in terms of either explanatory or predictive performance (Supplementary Materials Table S6).

Focusing on turnover and total beta diversity, buffer size also influenced the percentage of deviance explained by the different sets of variables (p < 0.001 in the Kruskal–Wallis Rank Sum Tests), except within the GDMs targeting total beta diversity and fitted upon “Dist” only (χ2 = 5.141, DF = 3, p value = 0.16) and “Dist + Prec” (χ2 = 0.982, DF = 3, p value = 0.806). Post-hoc tests confirmed significantly higher explained deviance using a 10 km-radius buffer compared to the other three radii for the GDMs including temperature variables, namely those fitted only on “Temp”, “Dist + Temp” and “Prec + Temp” (Supplementary Materials Table S7). Further, the GDMs targeting turnover and fitted only on “Temp” variables had higher explanatory performance than those fitted using only geographic distance or “Prec” variables, and their median explained deviance increased about twofold when using a 10 km-radius buffer (Fig. 5). For total beta diversity, GDMs fitted only on “Dist” explained as much deviance as those fitted on “Temp”, except when temperature averages were computed over a 10 km-radius buffer (Fig. 5). However, none of the models fitted on a single set of variables showed explained deviance higher than 15%. The GDMs fitted on coupled sets of variables generally explained less deviance than models including all the variables, but this difference was slight for “Dist + Temp” models fitted using the 10 km-radius buffer (Fig. 5).

Figure 5

Half-violin plots summarising the results of the deviance partitioning algorithm applied to the Generalized Dissimilarity Models fitted using climatic averages computed over four distinct buffer radii around cave locations (0.5 km, 2.5 km, 5 km and 10 km) as explanatory variables (along with between-cave geographic distance). The horizontal bar within each half-violin indicates the median. “Dist” = geographic distance; “Temp” = temperature-related variables; “Prec” = precipitation-related variables.

GDMs fitted on all the 66 cave pairs (i.e., “full-data models”) confirmed that computing climatic averages over a wider area around cave locations increased model fit on observations, for both turnover and total beta diversity. Indeed, Spearman’s correlation between predicted and observed dissimilarity values increased from r = 0.4 with the 0.5 km-radius buffer to r = 0.5 with the 10 km-radius buffer for turnover, and from r = 0.32 with the 0.5 km-radius buffer to r = 0.45 with the 10 km-radius buffer for total beta diversity (Supplementary Materials Figs. S2, S4). Conversely, the low correlation between predicted and observed values of nestedness further decreased when using the 10 km-radius buffer (Supplementary Materials Fig. S3).

“Temp” variables (i.e., Bio10, Bio11) consistently emerged as the top contributors, followed by geographic distance, within the full-data GDMs fitted with turnover as the response variable (Table 2). By contrast, geographic distance was the most influential predictor for total beta diversity, except within the full-data GDM fitted using the 10 km-radius buffer. In line with the results of the deviance partitioning analysis conducted on the 1000 partial GDMs (i.e., those fitted on 85% random samples of the cave pairs and validated on the remaining 15%), the relative importance of seasonal temperature averages noticeably increased within the full-data GDMs fitted using the 10 km-radius buffer, for both turnover and total beta diversity (Table 2). In the latter models, Bio11 was excluded from model fitting because the VIF analysis flagged multicollinearity, and Bio10 replaced it as the leading variable.

Table 2 Top-three variables, in terms of relative contribution, within the full-data GDMs fitted for turnover and total beta diversity using the four distinct buffer radii for spatial averaging of climate-related variables.

The I-spline curves extracted from the full-data GDMs confirmed the strong contribution of temperature gradients to the predicted dissimilarity in terms of turnover (Fig. 6): indeed, an exponential growth in the modelled partial ecological distance emerged with increasing temperature for both Bio11 and Bio10, while the curves obtained for the other predictors showed logarithmic (for geographic distance) or sigmoidal (for “Prec” variables, i.e., Bio18 and Bio19) trends, with far lower maximum values. For total beta diversity, the I-spline drawn for geographic distance still showed a logarithmic trend, but with a far steeper increase in partial ecological distance up to 300 km and a higher peak (Supplementary Materials Fig. S5). Nonetheless, within the full-data model fitted with the 10 km-radius buffer, changes along the gradient of Bio10 translated into increases in partial ecological distance far more rapidly than changes in geographic distance did (Supplementary Materials Fig. S5).
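To make explicit how the height of an I-spline relates to predicted dissimilarity, the sketch below assumes the standard GDM formulation: for each predictor, the partial ecological distance of a site pair is the absolute difference between the transformed values of the two sites, and the sum of these partial distances plus an intercept is passed through the link 1 − exp(−η). The transformed values and intercept used here are hypothetical and do not come from the fitted models.

```python
import math


def partial_distance(f_site_i, f_site_j):
    """Partial ecological distance for one predictor.

    f_site_i and f_site_j are the I-spline-transformed predictor values
    (the y-axis of an I-spline plot) at the two sites of a pair.
    """
    return abs(f_site_i - f_site_j)


def predicted_dissimilarity(intercept, partial_distances):
    """Apply the GDM link to the summed partial ecological distances."""
    eta = intercept + sum(partial_distances)
    return 1.0 - math.exp(-eta)


# Hypothetical transformed values for one cave pair (not study data):
# a steep temperature spline and a flatter geographic-distance spline.
temp_part = partial_distance(1.10, 0.20)
dist_part = partial_distance(0.30, 0.00)
print(round(predicted_dissimilarity(0.05, [temp_part, dist_part]), 3))
```

Under this formulation, a predictor whose spline reaches a higher maximum can contribute more to the summed ecological distance, which is why curve height is read as a measure of a variable's influence.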

Figure 6

I-spline curves extracted from the full-data GDMs fitted with turnover as response variable, and including as explanatory variables climatic averages computed over a buffer around the caves’ entrance whose radius measured: (a) 0.5 km, (b) 2.5 km, (c) 5 km, and (d) 10 km. The curves show the relationship between modelled partial ecological distance and the values of the top-three variables in terms of relative contribution to the considered GDM. In each row, the order of the plots reflects the relative contribution of the variables, with the top-contributing variable on the left.

Discussion

The twelve caves hosted a notable variety of copepod species. Species fully dependent on groundwater (i.e., stygobites) and species occasionally or accidentally entering cave waters from the surface (i.e., non-stygobites) were both well represented, although obligate groundwater dwellers predominated. Overall, we identified 71 copepod species in the collected samples, and some of the sampled caves hosted more than 15 species. Nonetheless, the implemented richness estimators indicated that the set of recorded species was likely not exhaustive in terms of copepod alpha diversity across the investigated karst macro-areas. This is in line with the modern view of subterranean environments being far richer in life than previously thought9,18,73, and it further reinforces the recent claims of the scientific community about the urgency of filling the existing knowledge gaps about subterranean biodiversity6,74. Moreover, the high incidence of spot endemics (23 stygobitic species) and of species that are rare in terms of frequency of occurrence (5 stygobitic species) is likely the main reason why the obtained species accumulation curves showed a long upward slope towards the asymptote75.

Along with a high alpha diversity, we were able to highlight a striking between-cave taxonomic dissimilarity (i.e., beta diversity) which was driven mainly by pure species replacement, in line with what has been derived for the European groundwater fauna by Zagmajster et al.76, from the regional to the continental scale. The dominance of turnover over nestedness-resultant dissimilarity (i.e., dissimilarity due to ordered species loss) in shaping beta diversity is emerging in numerous other ecosystems across a wide variety of spatial scales. In a recent meta-analysis including 99 studies conducted in various regions across the globe, Soininen et al.77 showed that available evidence points to a major role of turnover particularly at lower latitudes and with increasing extent of the studied area, while nestedness-resultant dissimilarity generally increases towards the poles. This trend is likely due to stronger environmental filtering at higher latitudes, driven by factors such as paleoclimatic events (e.g., Quaternary glaciations) and harsher current conditions (e.g., longer freezing periods and lower primary productivity).

Across the European continent, similar patterns have been found for a wide array of taxa, including beetles37,78, mammals79, parasitic trematodes80, aquatic plants and cladocerans81, freshwater fishes82, and groundwater crustaceans33. In such studies, the relative contribution of spatial (i.e., geographic distance) versus environmental (e.g., climate, land cover, soil composition) factors to the ratio between turnover and nestedness also varied depending on the target taxa. For instance, Heino et al.78 found that overall beta diversity among northern European regions was similar between ground beetles and diving beetles; however, compositional dissimilarity linked to richness differences (corresponding to nestedness-resultant dissimilarity) was higher than turnover for the former group, and vice versa for the latter. For ground beetles, the authors linked the preponderance of nestedness, and the major contribution of geographic distance to the observed dissimilarity, to the Quaternary glaciations, which caused regional extinctions in northern areas, and to biogeographical barriers hampering subsequent re-colonizations. By contrast, the higher turnover emerging for diving beetles was more strongly related to temperature gradients than to spatial isolation, suggesting that environmental filtering rather than dispersal barriers shaped current beta diversity patterns for this group. In southern Europe, however, the dominance of pure species replacement has emerged for various taxa with widely diverging life-history traits, dispersal capability and evolutionary history33,79,81. This pattern has been generally attributed to the higher historical climatic stability of southern Europe compared to boreal regions, which would have favoured speciation events over long periods devoid of dramatic climatic changes causing mass extinctions.

Focusing on groundwater crustaceans, copepods included, Zagmajster et al.33,76 confirmed higher turnover values at lower latitudes, a larger contribution of turnover than of nestedness to total beta diversity, and significant linear correlations (positive for turnover, negative for nestedness) with geographic distance. At a far finer spatial scale, a study conducted on copepod assemblages populating a set of thirty groundwater-fed springs from Central Italy36 highlighted a high nestedness-resultant dissimilarity, which was also positively correlated with between-spring geographic distance; no such correlation with geographic distance emerged for the turnover component. However, in both these studies, the authors did not implement any analytical tool that could explicitly separate the proportion of dissimilarity purely attributable to geographic distance from that driven by environmental gradients.

The distance-decay models we implemented, using either a negative exponential function or a power-law function, showed that total beta diversity and turnover were significantly related to between-cave geographic distance, while nestedness was not. However, distance-decay curves did not perfectly fit the observed dissimilarity values, because the relationship between geographic distance and faunal dissimilarity was not always positive. Indeed, some caves far apart from each other shared more copepod species than neighbouring caves. For instance, the Tana che Urla and the Grotte di Stiffe caves, which are 325 km apart, shared 3 non-stygobitic species (Bryocamptus echinatus, Bryocamptus zschokkei and Paracyclops imminutus) and the wide-ranging stygobitic harpacticoid Elaphoidella phreatica, leading to a low turnover (βsim = 0.2). By contrast, the Abisso di Trebiciano and the Grotta Lindner caves, which belong to the same hydrogeological unit (Classical Karst83) and are less than 20 km apart, shared only the non-stygobitic species Bryocamptus echinatus. In this case, a high between-cave turnover (βsim = 0.8) emerged because Abisso di Trebiciano hosted 3 stygobitic (Troglodiaptomus sketi, Diacyclops charon and Elaphoidella jeanneli) and 4 non-stygobitic (Acanthocyclops robustus, Macrocyclops albidus, Megacyclops viridis and Bryocamptus zschokkei) species which were not found in Grotta Lindner, while the latter hosted 3 stygobites (Acanthocyclops hypogeus, Elaphoidella elaphoides and Elaphoidella phreatica) and the non-stygobite Paracyclops imminutus which were absent from Abisso di Trebiciano. These examples suggest that the actual degree of between-cave species replacement is determined by both the stygobitic and the non-stygobitic species.
Specifically, low species turnover between distant caves could be primarily related to non-stygobitic, often widely distributed, species that have established populations (most likely temporarily) within the caves under consideration. On the other hand, high turnover between close caves was largely driven by the stygobitic copepods, which are often unable to cross the rocky boundaries between even adjacent caves due to their low potential for dispersal, and thus usually respond to the rule “each species per each cave”13,18,74. Along with dispersal barriers related to the lithology of the areas where the caves open14, changes in groundwater faunistic assemblages over regional spatial scales could also derive from differences in the size, spatial arrangement and heterogeneity of the different habitat types and/or in the physical–chemical conditions (e.g., pH, electric conductivity) characterising distinct caves belonging to adjacent—or even to the same—hydrogeological units84.
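The turnover value reported above for the Abisso di Trebiciano and Grotta Lindner pair can be checked directly from the species lists given in the text. The sketch below computes Simpson dissimilarity (βsim) from the shared and unique species counts; the Sørensen total and the nestedness-resultant component are added on the assumption that the Sørensen-family additive partitioning was used, which this section does not state explicitly. Function and variable names are illustrative.

```python
def beta_components(species_a, species_b):
    """Return (turnover, total, nestedness) for two presence lists.

    turnover   = Simpson dissimilarity, min(b, c) / (a + min(b, c))
    total      = Sorensen dissimilarity, (b + c) / (2a + b + c)
    nestedness = total - turnover (assumed additive partitioning)
    """
    set_a, set_b = set(species_a), set(species_b)
    a = len(set_a & set_b)   # species shared by both caves
    b = len(set_a - set_b)   # species unique to the first cave
    c = len(set_b - set_a)   # species unique to the second cave
    if a + b + c == 0:
        raise ValueError("both species lists are empty")
    smaller = min(b, c)
    turnover = smaller / (a + smaller) if (a + smaller) > 0 else 0.0
    total = (b + c) / (2 * a + b + c)
    return turnover, total, total - turnover


# Species lists as reported in the text for the two caves.
trebiciano = [
    "Bryocamptus echinatus", "Troglodiaptomus sketi", "Diacyclops charon",
    "Elaphoidella jeanneli", "Acanthocyclops robustus",
    "Macrocyclops albidus", "Megacyclops viridis", "Bryocamptus zschokkei",
]
lindner = [
    "Bryocamptus echinatus", "Acanthocyclops hypogeus",
    "Elaphoidella elaphoides", "Elaphoidella phreatica",
    "Paracyclops imminutus",
]

turnover, total, nestedness = beta_components(trebiciano, lindner)
print(round(turnover, 2))  # 0.8, matching the value reported in the text
```

With one shared species, seven species unique to Abisso di Trebiciano and four unique to Grotta Lindner, turnover is 4 / (1 + 4) = 0.8; under the assumed Sørensen partitioning almost all of the total dissimilarity (11/13) is therefore due to replacement rather than nestedness.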

By fitting Generalized Dissimilarity Models (GDMs) fed with information about surface climatic conditions around the sampled caves, we found that a major portion of the observed between-cave compositional dissimilarity is explained by a mix of spatial (i.e., geographic distance) and climatic factors. Indeed, quarterly temperature averages (i.e., Bio10 and Bio11) consistently emerged as the most influential explanatory variables in the GDMs fitted for between-cave turnover. With respect to total beta diversity, Bio10 was still by far the most influential variable when sampling climate data over a 10 km-buffer radius, but geographic distance showed a non-negligible influence in the corresponding model. Therefore, environmental filtering, primarily related to thermal ranges, appeared to mainly determine the set of species unique to the different caves, and thus between-cave turnover. This may derive from the combination of two processes: (i) differences in temperature averages between regions may translate into differences in the timing and amount of groundwater recharge following ice and snow melting, along with rainfall, which in turn affects the degree to which non-stygobitic copepods can move into groundwater through the hydrologic continuum between surface water and groundwater85; (ii) since temperature within caves is generally correlated with surface mean annual temperature86, temperature patterns in the neighbourhoods of the different caves may directly translate into thermal differences in cave waters, thus shaping the corresponding assemblages based on the thermal niche breadth of the species forming the respective regional pools87,88.

Further, we found that the extent over which climate averages were computed significantly influenced, for both turnover and total beta diversity, the goodness-of-fit of the GDMs, their predictive performance and the relative importance of temperature-related explanatory variables. Indeed, when increasing the radius of the buffer around the sampled caves up to 10 km, the proportion of explained deviance and the correlation between observed and predicted dissimilarity values increased, the RMSE of model predictions on the test data decreased, and Bio10 became by far the most influential variable. Additionally, the deviance partitioning analysis clearly showed that temperature-related variables provided the greatest amount of information, when used in isolation or coupled with geographic distance, to model between-cave partial ecological distance. Again, the contribution of temperature-related variables within the GDMs fitted on subsets of the available explanatory variables significantly increased when using a 10 km-radius buffer. These patterns may be linked to the fact that the aquatic habitats of caves are replenished by surface rainfall and temperature-driven ice and snow melting over recharge areas which usually span dozens to hundreds of kilometres and are often located far from the caves’ openings14,89. Thus, extending the area from which data are acquired should provide more information about how differences in regional climate affect the aquifers’ recharge in the different caves, and this in turn should increase our ability to isolate climatic correlates of between-cave biotic dissimilarity.
Our findings also echo those of Keil et al.90, who showed that, over regional extents (i.e., single countries or more restricted areas), climatic factors were more important than land cover and geographic distance in explaining pairwise species turnover for various taxonomic groups in Europe, and that the relative contribution of climate increased at coarser grid resolutions. The residual unexplained deviance in our models, namely the portion of between-cave beta diversity which could not be directly linked either to climatic averages or to geographic distance, suggests that including some of the local-scale factors cited above (e.g., habitat size and heterogeneity, physical–chemical parameters) would be advisable to fully understand the drivers behind the composition of crustacean assemblages in cave waters. However, while measuring differences in average physical–chemical parameters between distinct caves is rather straightforward, quantifying between-cave differences in habitat heterogeneity is a far more complex operation, which is also affected by subjectivity in classifying the single habitat patches within the compared caves.
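The buffer-based averaging of climate data discussed above can be illustrated with a short sketch. It assumes, as a simplification, that a climatic raster is available as a list of grid-cell centres with one value each, and that the buffer average is the unweighted mean of the cells whose centres fall within the chosen radius of the cave entrance (great-circle distance). The coordinates, values and function names are hypothetical and do not reproduce the actual processing workflow.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    h = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def buffer_mean(cave_lat, cave_lon, cells, radius_km):
    """Mean value of the grid cells whose centres lie within the buffer.

    cells is an iterable of (lat, lon, value) tuples; returns None when
    no cell centre falls inside the buffer.
    """
    inside = [value for lat, lon, value in cells
              if haversine_km(cave_lat, cave_lon, lat, lon) <= radius_km]
    if not inside:
        return None
    return sum(inside) / len(inside)


# Hypothetical grid cells (lat, lon, mean temperature of a quarter in deg C).
cells = [
    (42.00, 13.00, 10.0),  # cell containing the cave entrance
    (42.00, 13.05, 12.0),  # roughly 4 km east of the entrance
    (42.50, 13.00, 20.0),  # roughly 55 km north of the entrance
]
for radius in (0.5, 2.5, 5, 10):
    print(radius, buffer_mean(42.00, 13.00, cells, radius))
```

A circular buffer of this kind treats all directions equally, so it cannot capture the direction from which recharge water actually reaches a cave.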

In conclusion, this study highlighted that Italian karst caves harbour a rich and diverse array of copepod species, including several stygobitic and narrow-ranging species. Further, we demonstrated that copepod assemblages populating a set of caves along a latitudinal gradient spanning the Italian peninsula were noticeably dissimilar from one another, with turnover rather than nestedness driving most of the observed beta diversity. This is also important from a conservation perspective because the high species turnover implies that environmental degradation within a single cave may lead to the loss of unique species not occurring elsewhere, similar to what emerged from a recent study performed on groundwater-fed karst springs41. Additionally, we showed that geographical distance had a non-negligible yet secondary role in shaping the between-cave dissimilarity of copepod assemblages in our studied caves, with surface temperature patterns having the greatest influence, particularly on species replacement. Finally, we showed that widening the spatial extent from which climatic data were retrieved in the surroundings of the target caves increased the ability of Generalized Dissimilarity Models to explain the observed beta diversity patterns, along with the relative importance of temperature-related variables. A limitation of the present study regarding this latter finding is that, by using circular buffers to compute climatic averages, we did not take into account the inherent anisotropy of the environmental gradients affecting the groundwater recharge dynamics (e.g., altitude and geomorphology influencing the direction from which water flows towards a given cave).
Further, the possible presence of cryptic (i.e., with very similar phenotypes yet genetically distinct) groundwater-dwelling species may affect the estimates of between-cave beta diversity and its components, a risk which could be lowered by coupling molecular species delimitation methods with the morphology- and ecology-based taxonomic assessment91. Future studies that replicate our analyses while taking these points into account, and that also focus on caves from other regions, broader spatial extents and/or different subterranean taxa, would increase our understanding of the drivers behind the structuring and diversification of groundwater communities.

Data availability

Geographic coordinates of the sampled caves are provided in Table 1. Presence-absence data of the recorded copepod species are provided in Supplementary Materials Table S1. Climate data used to fit the Generalized Dissimilarity Models are freely available from the Worldclim online repository (https://www.worldclim.org/).