Introduction

Soil biodiversity is an essential component of every terrestrial habitat, affecting nutrient cycling, soil fertility, and plant-soil feedbacks, among other ecosystem functions1,2,3. Soil functioning is jointly driven by multiple components of soil biota that are closely interconnected, including plants, microorganisms, micro-, meso-, and macrofauna4,5. Land use, human activities, and climate changes induce widespread fundamental changes in the abundance, diversity, and activity of soil biota, altering functional connections and ecosystem-level processes in the terrestrial biosphere6. To understand and adapt to these changes, comprehensive knowledge about the global distribution of multiple soil biota components is urgently needed7,8.

With a growing understanding of the biogeography of microorganisms9, micro-10, and macrofauna11, a critical knowledge gap is the global distribution of soil mesofauna. Springtails (Hexapoda: Collembola) are among the most abundant groups of mesofauna and soil animals from the equator to polar regions12,13. They are mostly microbial feeders, but also graze on litter and are often closely associated with plant roots14,15. Through these trophic relationships, springtails affect the growth and dispersal of prokaryotes, fungi, and plants, thereby supporting nutrient cycling via the transformation, degradation, and stabilisation of organic matter13,16. Furthermore, springtails are a key food resource for soil- and surface-dwelling predators13,14, thus occupying a central position in terrestrial food webs and supporting biodiversity at higher trophic levels.

To assess different functional facets of biological communities, metrics such as population density and biomass (reflecting carbon stocks), taxonomic and phylogenetic diversity (ensuring multifunctionality and stability), and metabolic activity (quantifying energy fluxes and thus functional influence) are commonly used17,18,19,20. Recent assessments have found unexpected global biodiversity hotspots in temperate regions for microorganisms (fungi and prokaryotes)9 and macrofauna (earthworms)11, which do not correspond to the common latitudinal biodiversity gradient found in aboveground organisms21. Functional complementarity principles19 suggest that diverse soil communities in temperate ecosystems are able to support higher organismal densities and have a more efficient resource use (i.e., a higher total activity) than at other latitudes. However, there are no global assessments of soil animal metabolic activities. In contrast to expectations of complementarity principles, previous studies on plants22,23 and microbes9,24 suggest that diversity and activity (represented by respiration) do not co-vary at the global scale, probably because strong environmental constraints (e.g., temperature) limit this relationship. These discrepancies emphasize the need to investigate relationships of multiple metrics of soil animal communities. Springtails are an ideal model organism group for exploring such relationships at the global scale, due to their ubiquity, functional diversity, and high local species richness12,13,14.

Current knowledge suggests that springtails are especially abundant and diverse in temperate coniferous forests, but less diverse in polar regions20,25. Many springtails are adapted to high and stable humidity, and sensitive to drought and temperature changes26,27. Consequently, springtail density and diversity are likely to decrease with future climate change, detrimentally affecting soil food webs and ecosystem functioning28. At the same time, springtail densities are relatively high in urban areas and in agricultural fields29,30, so global springtail biomass may be moderately affected by land-use changes worldwide. Disentangling the roles of vegetation, climate, human disturbance, and other predictors of various springtail community metrics will be critical to understand their contribution to soil functioning under different global change scenarios7,10.

Here, we report global projections of density, diversity, and metabolic activity of soil springtail communities, and test whether high species richness supports increased density and total activity (i.e. community metabolism) across springtail communities globally, or whether this relationship is constrained by environmental and biotic controls. We aimed (1) to assess whether the global distribution of springtail diversity matches that of aboveground biota or other soil animals; (2) to test how different metrics of springtail communities are affected by climate and human activities; and (3) to quantify the global biomass of springtails as a component of the global carbon stock. Using an extensive dataset of soil springtail communities collected within the framework of the #GlobalCollembola initiative13 (2470 sites and 43,601 samples across all continents; Fig. 1a), we show contrasting patterns across soil biodiversity metrics at the global scale and demonstrate that springtails are among the most functionally important and ubiquitous animals in the terrestrial biosphere.

Fig. 1: Sampling locations and latitudinal gradients in springtail community metrics.

a Distribution of the 2470 sampling sites (43,601 soil samples). The histogram shows the number of sites in each 20-degree latitudinal belt, relative to the total land area in the belt. b–g Variation in density (n = 2210 independent sites), biomass, community metabolism, average body mass and average individual metabolism (n = 2053), and local species richness (n = 1735) with latitude. Grey circles across panels show sampling sites; red points are averages for 5-degree latitudinal belts; trends are illustrated with a quadratic function based on 5-degree averages (red line shows the mean, shaded band shows the 95% confidence interval). Source data are provided as a Source data file.

Results and discussion

Latitudinal gradient

To calculate total biomass and metabolism of each springtail community, we used recorded population densities together with estimated individual body masses and metabolic rates. Body masses and metabolic rates were derived from taxon-specific body lengths using mean annual soil temperature and allometric regressions (for calculations and parameter uncertainties see Methods). For the assessment of local species richness, we selected 70% of the sampling sites with taxonomically-resolved communities and calculated rarefaction curves to account for unequal sampling efforts; we also performed analyses using raw species richness data from a subset of samples. As such, our trends refer to local diversity (hundreds of metres), but may not be representative of regional-level diversity31.

Springtail density varied c. 30-fold across latitudes (Fig. 1b), with maximum densities in tundra (median = 131,422 individuals m−2) and minimum densities in tropical forests (5831 individuals m−2) and agricultural ecosystems (3438 individuals m−2; Supplementary Fig. 2; n = 2210). Springtail dry biomass followed the same trend, with c. 20-fold higher biomass in tundra (median = 3.09 g m−2) compared to tropical agricultural and forest ecosystems (c. 0.16 g m−2), due to a lower average community body mass in polar as opposed to temperate and tropical ecosystems (Fig. 1c, d; Supplementary Fig. 2; n = 2053). These density and biomass estimates are in line with earlier reported cross-biome comparisons20, confirming these trends across wider environmental gradients. The difference in average community body mass may be explained by a lower proportion of large surface-dwelling springtail genera in polar regions32.

Being dependent on temperature and body mass, average individual metabolism was approximately 20 times higher in tropical than in polar ecosystems (Fig. 1e), which resulted in similar community metabolism across the latitudinal gradient (Fig. 1f; total n = 2053). Hence, tropical springtail communities expend a similar amount of energy per unit time and area as polar communities, despite having 20-fold lower biomass. This striking pattern resembles aboveground ecosystem respiration, which also changes little across the global air temperature gradient23. High metabolic rates but low densities of springtail communities are consistent with the high soil respiration rates and low litter accumulation in the tropics compared to biomes at higher latitudes8,24. Litter removal is facilitated by soil animals, which have to consume more food per unit biomass to meet their metabolic needs under high tropical temperatures33 and thus enhance decomposition in wet and warm tropical ecosystems34. This suggests that soil animal communities in the tropics are under strong bottom-up control (by the amount and quality of litter), but also under strong top-down control by predators, which likewise have to feed more at high temperatures33,35. By contrast, polar communities have access to ample organic matter stocks8, are under weaker top-down control33,35, but their activity is constrained by the cold environment. The latitudinal gradient in environmental and biotic controls may explain why community metabolism did not increase as expected towards warm tropical ecosystems.

We found only weak latitudinal trends in local species richness (extrapolated values), which was highest in tropical forests (mean = 36.6 species site−1) and lowest in temperate agricultural (19.5 species site−1) and grassland ecosystems (22.8 species site−1; Fig. 1g; Supplementary Fig. 2). Generally, the similar local diversity in different climates deviates from the latitudinal biodiversity gradients reported for aboveground and aquatic taxa21,22, and corroborates the hypothesized mismatch between above- and belowground biodiversity distributions36. This mismatch calls for explicit assessments of soil biodiversity hotspots for monitoring and conservation of soil organisms7.

Global distribution and its predictors

To map the global distribution of springtail community metrics and uncover its predictors, we pre-selected climatic, vegetation, soil, topographic, and anthropogenic variables with known ecological effects on springtails (Supplementary Fig. 9a). To perform a global extrapolation, we used 22 of the pre-selected variables that were globally available and applied a random forest algorithm to identify the strongest spatial associations of community parameters with environmental layers10. To reveal the key driving factors of springtail communities, we ran a path analysis with 12 non-collinear variables (Supplementary Fig. 9b). The European spatial clustering in our data distribution (Fig. 1a) was taken into consideration through a continental-scale validation in both analyses (see Methods). In addition, we ran linear modelling on a subset of data to explore the effect of seasonal climate variation and sampling methodology.

At the global scale, species richness was not related to biomass (Pearson’s R2 = 0.02) or density (Pearson’s R2 = 0.03 for extrapolated and R2 = 0.07 for raw species richness; Fig. 2a). Our extrapolations revealed at least five types of geographical areas with specific combinations of density and species richness patterns (Fig. 2a): (1) polar regions with very high densities and medium-to-high species richness such as the Arctic; (2) temperate regions with medium densities and high species richness such as mountainous and forested areas in Europe, Asia, and North America; (3) temperate regions with medium to high densities but moderate species richness such as arid temperate biomes (e.g., dry grasslands); (4) temperate, subtropical, and tropical arid ecosystems with low densities and species richness such as semi-deserts and other arid regions (largely masked on the map); (5) tropical areas with low densities but high species richness such as tropical forests and grasslands. Hotspots of springtail community metabolism were observed across a range of different latitudes (Fig. 2b), but were not associated with biodiversity hotspots (Pearson’s R2 < 0.01 for extrapolated and R2 = 0.07 for raw species richness), emphasizing that species richness is neither associated with higher density nor metabolism of springtail communities at the global scale.
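As an illustration of the site-level correlations reported above, the squared Pearson correlation between two community metrics can be computed as in the following minimal sketch. It is written in Python for illustration only (the analyses in this study were run in R); the function name pearson_r2 and the example values are hypothetical and are not taken from the dataset.

```python
def pearson_r2(x, y):
    """Squared Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares of deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical site-level values (not from the dataset)
richness = [12, 25, 31, 18, 40]
density = [90000, 15000, 22000, 60000, 8000]  # individuals per m^2
print(pearson_r2(richness, density))
```

A value near zero, as reported for richness against density or biomass, means that one metric explains almost none of the linear variation in the other.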

Fig. 2: Global maps overlapping modelled springtail density and local species richness and community metabolism in soil.

In (a) colours distinguish areas with different combinations of density and species richness, e.g., low density—low richness is given in yellow and high density—high richness in violet. In (b) the colour gradient indicates community metabolism, with potential coldspots shown in yellow and hotspots shown in blue. Pixels below the 90% extrapolation threshold for the corresponding variables are masked (see methods). Correlations between density or metabolism and species richness (inset graphs) are based on site-level data (points; n = 1257).

Path analysis suggested that springtail density increases with latitude, NDVI (vegetation biomass), and soil pH, but decreases with increasing mean annual air temperature, aridity index (under drier conditions), and elevation (Fig. 3; similar responses were obtained by linear modelling; Supplementary Fig. 10). The negative global relationship of density with aridity was expected for physiologically moisture-dependent animals such as springtails26, and was also observed in nematodes10. Similar to patterns for earthworms11, soil properties had less evident linear effects on springtail density than climate at the global scale. However, the relationships of density with soil pH and organic carbon content were hump-shaped, suggesting that intermediate values of these parameters are optimal for springtails (Supplementary Fig. 8), which is also observed for nematodes10. Unfortunately, we could not evaluate the effects of nutrient elements such as nitrogen and phosphorus on springtail communities due to a lack of independent global assessments of these properties. Performing them would be an important step towards understanding the soil biosphere. Existing evidence points to soil properties as key predictors of microfauna (nematodes)10, climate as a key predictor of macrofauna (earthworms)11, and a combination of both as predictors of mesofauna (springtails) at the global scale.

Fig. 3: Environmental predictors of springtail communities at the global scale.

Standardized effect sizes for direct (semi-transparent colour) and total (direct and indirect, solid colour) effects from path analysis are shown for density (R2 = 0.36 ± 0.01, n = 723 per iteration), local species richness (R2 = 0.20 ± 0.02, n = 352), biomass (R2 = 0.40 ± 0.02, n = 568), and community metabolism (R2 = 0.17 ± 0.02, n = 533). Mean values (squares) and data distribution (violins) are shown. Asterisks denote factors with a significant direct effect (two-tailed; p < 0.05) on a given springtail community metric for >25%(*), >50%*, >75%** and >95%*** of iterations. Source data are provided as a Source data file.

Springtail density and biomass were lower in woodlands, grasslands, and agricultural sites in comparison to scrub-dominated landscapes (Fig. 3). In contrast to previous global assessments of soil animal biodiversity10,11, tundra was extensively sampled in our dataset (n = 253; Fig. 1a), and densities >1 million individuals per square metre were recorded at 12 independent sites. The high species richness of tundra communities (Fig. 2a) suggests a long evolutionary history of springtails in cold climates; indeed, they are currently the most taxonomically represented group of terrestrial arthropods in the Arctic32 and the Antarctic37. Tundra remains under snow cover for most of the year, flourishing during summer when high springtail densities were recorded. During winter, springtails survive under the snow using remarkable adaptations to subzero temperatures (dehydration and supercooling38). Our linear modelling showed that the effect of seasonal climatic variation on springtail density and biomass is limited in comparison to the global variation in annual means (Supplementary Fig. 10), and that a model with a quadratic relationship with mean annual temperature explains the observed patterns in extrapolated species richness better than a linear one (AIC 9611 vs 9501). However, seasonal climatic variation has critical effects on springtail activity (Supplementary Fig. 10), suggesting that functioning of the soil ecosystem is highly dynamic in time. Importantly, tundra soils contain a major proportion of the total soil organic matter and microbial biomass stored in the terrestrial biosphere8. As climate warming alters carbon cycling in the tundra39, longer active periods of springtails could accelerate soil carbon release to the atmosphere in polar regions40.

Across tropical ecosystems in the Amazon basin, equatorial Africa, and Southeast Asia, low density and biomass of springtails were recorded and extrapolated (Fig. 2a, Supplementary Figs. 4 and 6). Mesofauna in general have low abundances in tropical ecosystems, where the litter layer is shallow and larger soil-associated invertebrates, such as earthworms, termites, and ants, commonly dominate20. Our study supports this trend also found in recent global assessments of other soil invertebrates10,11,41. However, considering the high mass-specific metabolism of springtails and high predation rates in tropical communities18,25,33, a quantitative comparison of energy flows and stocks across latitudes and groups of soil fauna is needed.

Interestingly, we found no pronounced influence of agriculture and human population on springtail communities at the global scale; agriculture tended to have a marginally positive impact on biomass but a negative impact on species richness, although these trends were statistically significant only in some of the model iterations (Fig. 3). Agricultural sites had similar springtail densities compared to woodlands and grasslands in the temperate zone (ca. 15–25k individuals m−2; Supplementary Fig. 3), which may be explained by large variation in management within each of these habitat types and reduced competition with more sensitive soil invertebrate groups. Some springtail species effectively survive in agricultural fields30, where they are involved in nutrient cycling and serve as natural biocontrol agents by grazing on pathogenic fungi42 and supporting arthropod predators43. Springtails are also commonly found in urban areas29. However, negative effects of agriculture and other human activities are supported by the moderate predicted local species richness in many areas of highly transformed landscapes in Europe and North America (Fig. 2). Also, our linear modelling that explicitly accounted for sampling months and methods suggested negative effects of agriculture on density and extrapolated and raw species richness of springtails (up to –40%; Supplementary Fig. 10). Overall, the negative trend in species richness at human-modified sites suggests that intensive land use may reduce springtail diversity, which is indeed often recorded29,30,44.

The only variable that was positively associated with both density and local species richness of springtails in the path analysis was NDVI (as a proxy for vegetation biomass), reinforcing the close connection between springtail communities and the vegetation15. Overall, high local species richness was predicted in warm, acidic woodlands with high soil organic carbon stocks (Fig. 3). Geospatial extrapolation emphasized tropical regions and some boreal forests in North America and Eurasia as springtail diversity hotspots (Supplementary Fig. 5). In our dataset, sites with the highest extrapolated local species richness (i.e., >100 species) were located in European woodlands (Czech Republic, Slovakia). However, this picture may be biased by the historical clustering of taxonomic expertise in Europe13. Outside Eurasia, species-rich sites (i.e., 60–80 species) were located in Vietnamese monsoon forests and some Brazilian rainforests, but 70–90% of species in tropical communities remain undescribed45,46. Hence, despite low springtail density, tropical forests contribute substantially to global springtail diversity, but the full extent of this contribution is unknown. Our linear modelling also demonstrated that correct estimation of density and especially species richness critically depends on a sufficient sampling area and on sampling of both litter and soil layers (Supplementary Fig. 10).

Our extrapolations suggest that there are c. 2 × 10^18 soil springtails globally, and their total biomass comprises c. 27.5 Mt C (16.2–28.8 Mt C minimum and maximum estimates), which corresponds to c. 190 Mt fresh weight, with respiration of c. 15.2 Mt C month−1 (i.e. c. 0.2% of the global soil respiration24; 14.6–18.6 Mt C month−1 minimum and maximum estimates). An insufficient representation of specific environmental combinations by our global extrapolation (Fig. 2) could have biased these numbers; however, most of the underrepresented areas are covered with arid biomes where densities of springtails are very low. Our biomass estimates are very similar to the global estimated biomass of nematodes (c. 31 Mt C10), but lower than that of earthworms (c. 200 Mt C11), and exceeding by far that of all wild terrestrial vertebrates (c. 9 Mt C)17, demonstrating that springtails are among the most abundant, biomass-rich, and ubiquitous animals on Earth.

Overall, our global dataset on soil springtail communities synthesizes the work of soil animal ecologists across the globe. It presents another milestone towards understanding the functional composition of global soil biodiversity. Being highly abundant in polar regions and some human-modified landscapes, springtails are facing two main global change frontiers: warming in the polar regions, and land-use change and urbanization in temperate and tropical regions. While the global abundance and biomass of springtails may decline with climate warming and/or vegetative biomass reduction in the coming decades, their global activity may remain unchanged. The global diversity of springtails will depend on the balance between anthropogenic transformations and conservation efforts of biomes worldwide.

Methods

Data reporting

The data underpinning this study are a compilation of existing datasets; therefore, no statistical methods were used to predetermine sample size, the experiments were not randomized, and the investigators were not blinded to allocation during experiments and outcome assessment. The measurements were taken from distinct samples; repeated measurements from the same sites were averaged in the main analysis.

Inclusion & ethics

Data were primarily collected from individual archives of contributing co-authors. The data collection initiative was openly announced via the mailing list of the 10th International Seminar on Apterygota and via social media (Twitter, Researchgate). In addition, colleagues from less explored regions (Africa, South America) were contacted via personal networks of the initial authors group and literature search. All direct data providers who collected and standardised the data were invited as co-authors with defined minimum role (data provision and cleaning, manuscript editing and approval). For unpublished data, people who were directly involved in sorting and identification of springtails, including all local researchers, were invited as co-authors. Principal investigators were normally not included as co-authors, unless they contributed to conceptualisation and writing of the manuscript. All co-authors were informed and invited to contribute throughout the research process—from the study design and analysis to writing and editing. The study provided an inclusive platform for researchers around the globe to network, share and test their research ideas.

Data acquisition

Both published and unpublished data were collected, using raw data whenever possible entered into a common template. In addition, data available from Edaphobase47 was included. The following minimum set of variables was collected: collectors, collection method (including sampling area and depth), extraction method, identification precision and resources, collection date, latitude and longitude, vegetation type (generalized as grassland, scrub, woodland, agriculture and other for the analysis), and abundances of springtail taxa found in each soil sample (or sampling site). Underrepresented geographical areas (Africa, South America, Australia and Southeast Asia) were specifically targeted by a literature search in the Web of Science database using the keywords ‘springtail’ or ‘Collembola’, ‘density’ or ‘abundance’ or ‘diversity’, and the region of interest; data were acquired from all found papers if the minimum information listed above was provided. All collected datasets were cleaned using OpenRefine v3.3 (https://openrefine.org) to remove inconsistencies and typos. Geographical coordinates were checked by comparing the dataset descriptions with the geographical coordinates. In total, 363 datasets comprising 2783 sites were collected and collated into a single dataset (Supplementary Fig. 1).

Calculation of community parameters

Community parameters were calculated at the site level. Here, we defined a site as a locality that hosts a defined springtail community, is covered by a certain vegetation type, with a certain management, and is usually represented by a sampling area of up to a hundred metres in diameter, making species co-occurrence and interactions plausible. To calculate density, numerical abundance in all samples was averaged and recalculated per square metre using the sampling area. Springtail communities were assessed predominantly during active vegetation periods (i.e., spring, summer and autumn in temperate and boreal biomes, and summer in polar biomes). Our estimations of community parameters therefore refer to the most favourable conditions (peak yearly densities). This seasonal sampling bias is likely to have little effect on our conclusions, since most springtails survive during cold periods38,48. Finally, we used mean annual soil temperatures49 to estimate the seasonal mean community metabolism (described below) and tested for the seasonal bias in additional analysis (see Linear mixed-effects models).
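The density calculation described above (averaging abundance across samples and rescaling by sampling area) can be sketched as follows. This is a minimal Python illustration, not the authors' R code; the function name density_per_m2, the assumption that all samples at a site share one sampling area, and the example counts are hypothetical.

```python
from statistics import mean


def density_per_m2(counts_per_sample, sampling_area_cm2):
    """Mean abundance per sample rescaled to individuals per square metre.

    Assumes every sample at the site covers the same area (in cm^2).
    """
    if not counts_per_sample:
        raise ValueError("at least one sample is required")
    if sampling_area_cm2 <= 0:
        raise ValueError("sampling area must be positive")
    area_m2 = sampling_area_cm2 / 10_000.0  # 1 m^2 = 10,000 cm^2
    return mean(counts_per_sample) / area_m2


# Hypothetical site: six soil cores of 25 cm^2 each (mean of 18 individuals per core)
counts = [12, 30, 18, 7, 25, 16]
print(density_per_m2(counts, 25.0))  # roughly 7200 individuals per m^2
```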

All data analyses were conducted in R v. 4.0.250 with RStudio interface v. 1.4.1103 (RStudio, PBC). Data was transformed and visualised with tidyverse packages51,52, unless otherwise mentioned. Background for the global maps was acquired via the maps package53,54. To calculate local species richness, we used data identified to species or morphospecies level (validated by the expert team). Since the sampling effort varied among studies, we extrapolated species richness using rarefaction curves based on individual samples with the Chao estimator51,52 in the vegan package53. For some sites, sample-level data were not available in the original publications, but site-level averages were provided, and an extensive sampling effort was made. In such cases, we predicted extrapolated species richness based on the completeness (ratio of observed to extrapolated richness) recorded at sites where sample-level data were available (only sites with 5 or more samples were used for the prediction). We built a binomial model to predict completeness in sites where no sample-level data were available using latitude and the number of samples taken at a site as predictors: glm(Completeness~N_samples*Latitude). We found a positive effect of the number of samples (Chisq = 1.97, p = 0.0492) and latitude (Chisq = 2.07, p = 0.0391) on the completeness (Supplementary Figs. 17–19). We further used this model to predict extrapolated species richness on the sites with pooled data (435 sites in Europe, 15 in Australia, 6 in South America, 4 in Asia, and 3 in Africa).
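To make the richness extrapolation more concrete, the sketch below implements one common form of the incidence-based Chao estimator (often called Chao2), the completeness ratio, and the back-calculation of extrapolated richness from a predicted completeness. It is a Python illustration under stated assumptions: the exact estimator variant and settings used in the vegan package are not specified in the text, and the function names and example species sets are hypothetical.

```python
def chao2(sample_species):
    """Incidence-based Chao estimate of richness from per-sample species sets.

    Assumed form: S_obs + ((n - 1) / n) * Q1^2 / (2 * Q2), with the
    bias-corrected term Q1 * (Q1 - 1) / 2 used when no duplicates exist.
    """
    n = len(sample_species)
    if n == 0:
        raise ValueError("at least one sample is required")
    incidence = {}  # species -> number of samples in which it occurs
    for sample in sample_species:
        for species in sample:
            incidence[species] = incidence.get(species, 0) + 1
    s_obs = len(incidence)
    q1 = sum(1 for k in incidence.values() if k == 1)  # uniques
    q2 = sum(1 for k in incidence.values() if k == 2)  # duplicates
    small_sample = (n - 1) / n
    if q2 > 0:
        unseen = small_sample * q1 * q1 / (2 * q2)
    else:
        unseen = small_sample * q1 * (q1 - 1) / 2
    return s_obs + unseen


def completeness(sample_species):
    """Ratio of observed to extrapolated richness, as defined in the text."""
    observed = len(set().union(*sample_species))
    return observed / chao2(sample_species)


def richness_from_completeness(observed_richness, predicted_completeness):
    """Extrapolated richness for a pooled site, given a predicted completeness."""
    if not 0 < predicted_completeness <= 1:
        raise ValueError("completeness must lie in (0, 1]")
    return observed_richness / predicted_completeness


# Hypothetical site with four samples
site = [{"A", "B", "C"}, {"A", "B", "D"}, {"A", "C", "E"}, {"A", "B"}]
print(chao2(site), completeness(site))
```

For a pooled site with 20 observed species and a predicted completeness of 0.8, richness_from_completeness would give about 25 species.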

To calculate biomass, we first cross-checked all taxonomic names with the collembola.org checklist55 using fuzzy matching algorithms (fuzzyjoin R package56) to align taxonomic names and correct typos. Then we merged taxonomic names with a dataset on body lengths compiled from the BETSI database57, a personal database of Matty P. Berg, and additional expert contributions. We used average body lengths for the genus level (body size data on 432 genera) since data at the species level were not available for many morphospecies (especially in tropical regions), and species within most springtail genera had similar body size ranges. Data with no genus-level identifications were excluded from the analysis. Dry and fresh body masses were calculated from body length using a set of group-specific length-mass regressions (Supplementary Table 1)58,59 and the results of different regressions applied to the same morphogroup were averaged. Dry mass was recalculated to fresh mass using corresponding group-specific coefficients58. We used fresh mass to calculate individual metabolic rates60 and account for the mean annual topsoil (0–5 cm) temperature at a given site61. Group-specific metabolic coefficients for insects (including springtails) were used for the calculation: normalization factor (i0) ln(21.972) [J h−1], allometric exponent (a) 0.759, and activation energy (E) 0.657 [eV]60. Community-weighted (specimen-based) mean individual dry masses and metabolic rates were calculated for each sample and then averaged by site after excluding the highest 10% and the lowest 10% of values to reduce the impact of outliers. To calculate site-level biomass and community metabolism, we summed masses or metabolic rates of individuals, averaged them across samples, and recalculated them per unit area (m2).
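The metabolic calculation above can be illustrated with a short Python sketch. It rests on several assumptions that are not stated explicitly in the text: that the rate follows the usual mass- and temperature-dependent form ln I = ln i0 + a ln M − E/(kT), that the reported value 21.972 is the logarithm of the normalization factor, that fresh mass M is expressed in mg, and that trimming removes the same number of values from each tail. All function names and example inputs are hypothetical.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV K^-1


def individual_metabolic_rate(fresh_mass_mg, soil_temp_c,
                              ln_i0=21.972, a=0.759, e_ev=0.657):
    """Individual metabolic rate in J h^-1 (assumed units; see lead-in)."""
    if fresh_mass_mg <= 0:
        raise ValueError("body mass must be positive")
    temp_k = soil_temp_c + 273.15
    ln_rate = (ln_i0 + a * math.log(fresh_mass_mg)
               - e_ev / (BOLTZMANN_EV_PER_K * temp_k))
    return math.exp(ln_rate)


def trimmed_mean(values, trim_fraction=0.10):
    """Mean after dropping the lowest and highest trim_fraction of values."""
    if not values:
        raise ValueError("at least one value is required")
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)  # values dropped from each tail
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


def community_metabolism_per_m2(samples_masses_mg, soil_temp_c, sampling_area_m2):
    """Sum individual rates within each sample, average samples, scale to 1 m^2."""
    per_sample = [
        sum(individual_metabolic_rate(m, soil_temp_c) for m in sample)
        for sample in samples_masses_mg
    ]
    return (sum(per_sample) / len(per_sample)) / sampling_area_m2


# Hypothetical example: a 0.1 mg springtail at 5 and 25 degrees C
print(individual_metabolic_rate(0.1, 5.0), individual_metabolic_rate(0.1, 25.0))
```

Under these assumptions the same individual respires several times faster at 25 °C than at 5 °C, which is the mechanism behind the similar community metabolism of low-biomass tropical and high-biomass polar communities.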

Parameter uncertainties

Our biomass and community metabolism approximations contain several assumptions. To account for the uncertainty in the length-mass and mass-metabolism regression coefficients, in addition to the average coefficients, we also used maximum (average + standard error) and minimum coefficients (average − standard error; Supplementary Table 1) in all equations to calculate maximum and minimum estimations of biomass and community metabolism reported in the main text. Further, we ignored latitudinal variation in body sizes within taxonomic groups62. Nevertheless, latitudinal differences in springtail density (30-fold), environmental temperature (from −16.0 to +27.6 °C in the air and from −10.2 to +30.4 °C in the soil), and genus-level community compositions (there are only few common genera among polar regions and the tropics)55 are higher than the uncertainties introduced by indirect parameter estimations, which allowed us to detect global trends. Although most springtails are concentrated in the litter and uppermost soil layers20, their vertical distribution depends on the particular ecosystem63. Since sampling methods are usually ecosystem-specific (i.e. sampling is done deeper in soils with developed organic layers), we treated the methods used by the original data collectors as representative of a given ecosystem. Under this assumption, we might have underestimated the number of springtails in soils with deep organic horizons, so our global estimates are conservative and we would expect true global density and biomass to be slightly higher. To minimize these effects, we excluded sites where the estimations were likely to be unreliable (see data selection below).

Data selection

Only data collection methods allowing for area-based recalculation (e.g. Tullgren or Berlese funnels) were used for analysis. Data from artificial habitats, coastal ecosystems, caves, canopies, snow surfaces, and strong experimental manipulations beyond the bounds of naturally occurring conditions were excluded (Supplementary Fig. 1). To ensure data quality, we performed a two-step quality check: technical selection and expert evaluation. Collected data varied according to collection protocols, such as sampling depth and the microhabitats (layers) considered. To technically exclude unreliable density estimations, we explored data with a number of diagnostic graphs (Supplementary Table 2; Supplementary Figs. 12–20) and filtered it, excluding the following: (1) All woodlands where only soil or only litter was considered; (2) All scrub ecosystems where only ground cover (litter or mosses) was considered; (3) Agricultural sites in temperate zones where only soil with sampling depth <10 cm was considered. Additionally, 10% of the lowest values were individually checked and excluded if density was unrealistically low for the given ecosystem (outliers with density more than three times lower than the 1st percentile within each ecosystem type). In total, 237 sites were excluded from density, and 394 sites from biomass and community metabolism analyses based on these criteria (Supplementary Figs. 15 and 16). For the local species richness estimates, we removed all extrapolations based on sites with fewer than three samples and no (morpho)species identifications (647 sites; Supplementary Figs. 1 and 20).
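The low-density outlier rule described above can be sketched as a simple flagging step applied within one ecosystem type. This Python illustration only marks candidates; in the study such values were checked individually before exclusion. The percentile interpolation method (here the "inclusive" method of Python's statistics.quantiles), the function name, and the example densities are assumptions for illustration.

```python
from statistics import quantiles


def flag_low_density_outliers(densities, factor=3.0):
    """Return indices of densities more than `factor` times below the 1st percentile.

    `densities` should contain the site-level values of a single ecosystem type.
    """
    if len(densities) < 2:
        return []  # a percentile cannot be estimated from fewer than two sites
    # quantiles(..., n=100) returns 99 cut points; the first is the 1st percentile
    p1 = quantiles(densities, n=100, method="inclusive")[0]
    threshold = p1 / factor
    return [i for i, d in enumerate(densities) if d < threshold]


# Hypothetical woodland densities (individuals per m^2); the last value is suspicious
woodland = [20000, 25000, 18000, 30000, 22000, 150]
print(flag_low_density_outliers(woodland))
```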

Data expert evaluation

We performed a manual expert evaluation of every contributed dataset. Evaluation was done by an expert board of springtail specialists, each with extensive research experience in a certain geographic area: Anatoly Babenko (high-latitude regions in both the Northern and Southern Hemispheres); Bruno Bellini (Central and South America); Jean-François Ponge (Central and Western Europe); Louis Deharveng (Africa and Asia); Lubomir Kovac (Southern Europe); Mikhail Potapov and Natalia Kuznetsova (Eastern and Northern Europe). Each dataset was scored by the experts separately for density and species richness estimation as either trustworthy, acceptable, or unreliable. Density estimation quality was assessed using information about the sampling and extraction method and the density estimation itself. Species richness estimation quality was assessed using information about the identification key, the experience of the person who identified the material, the species (taxa) list, and the species richness estimation itself. Based on the expert opinions, unreliable estimates of density (together with biomass and community metabolism) and species richness were excluded (Supplementary Fig. 1). The resulting final dataset included 2470 sites and 43,601 samples64 with a median of six samples collected at each site. The dataset comprised 2210 sites with density estimation (69–2,181,600 individuals m−2), 2053 sites with mean fresh body mass (1.8–3110 µg), mean metabolic rate (0.028–2.4 mJ h−1), dry biomass (0.5–93,000 mg m−2), fresh biomass (1.6–277,000 mg m−2) and community metabolism estimations (0.03–1000 J h−1), and 1735 sites with local species richness estimation (1–136.7 species; Supplementary Figs. 1 and 2). The dataset covered all major biomes (Supplementary Fig. 3), years 1970–2019, and all months: 8% of the samples were taken between December and February, 14% between March and May, 55% between June and August, and 23% between September and November (see Data availability).

Data transformation

All parameters except for extrapolated local species richness were highly skewed (e.g., density had a global median of 21,016 individuals m−2 and a mean of 60,454 individuals m−2), so we applied a log10 transformation prior to analysis. This greatly improved the fit of all statistical analyses.

Latitudinal and ecosystem trends

To explore changes in springtail communities with latitude, we sliced the global latitudinal gradient into 5-degree bins and calculated average parameters across sites in each bin after trimming, so that each latitudinal bin carried the same statistical weight when plotting the gradient. The latitudinal gradient was plotted with ggplot265, and quadratic smoothers were used to illustrate trends. Mean parameters of springtail communities were compared across ecosystem types using a linear model and multiple comparisons with the Tukey HSD test using HSD.test in the agricolae package66. Habitats were classified according to the vegetation types. Climates were classified as polar (beyond the polar circles, i.e., latitudes above 66.5 or below −66.5 degrees), temperate (from the polar circles to the tropics of Cancer/Capricorn, i.e., down to 23.5 and −23.5 degrees) and tropical (between 23.5 and −23.5 degrees). Habitats and climates were combined to produce ecosystem types. For the analysis, only well-represented ecosystem types were retained: polar scrub (n = 253), polar grassland (n = 39), polar woodland (n = 28), temperate woodland (n = 907), temperate scrub (n = 104), temperate grassland (n = 445), temperate agriculture (n = 374), tropical agriculture (n = 68) and tropical forest (n = 141; Supplementary Fig. 3).
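
The climate classification and latitudinal binning can be sketched as follows. This is a simplified illustration: the treatment of sites lying exactly on 23.5 or 66.5 degrees is an assumption, the trimming step is omitted because its details are not given here, and the function names are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean


def climate_zone(latitude):
    """Classify a site by absolute latitude (boundary handling is assumed)."""
    if abs(latitude) > 66.5:
        return "polar"
    if abs(latitude) >= 23.5:
        return "temperate"
    return "tropical"


def bin_means(sites, width=5):
    """Average log10-transformed values within latitudinal bins.

    sites: iterable of (latitude, value) pairs; each bin is keyed by its
    lower edge, so every bin contributes one point to the plotted gradient.
    """
    bins = defaultdict(list)
    for latitude, value in sites:
        lower_edge = math.floor(latitude / width) * width
        bins[lower_edge].append(math.log10(value))
    return {edge: mean(values) for edge, values in sorted(bins.items())}
```

For example, sites at 51.2 and 53.9 degrees both fall into the bin starting at 50 degrees, whereas a site at −2.5 degrees falls into the bin starting at −5 degrees.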

Selection of environmental predictors

To assess the predictors of global distributions of springtail community metrics, we pre-selected variables with a known ecological effect on springtail communities (based on expert opinions) and constructed a hypothetical relationship diagram (Supplementary Fig. 9a). Environmental data were very heterogeneous across the springtail studies, so we used globally available climatic and other environmental layers. Overall, we included global layers bearing the following information: climate (mean annual air temperature, air temperature seasonality, air temperature annual range, mean annual precipitation, precipitation seasonality, precipitation of the driest quarter67, inversed aridity index68), topography (elevation, roughness69), vegetation and land cover (aboveground biomass70, tree cover71, Net Primary Production, Normalized Difference Vegetation Index [NDVI]72), topsoil physicochemical properties (0–15 cm depth C to N ratio, pH, clay, sand, coarse fragments, organic carbon, bulk density73) and human population density74. Some environmental layers could not be included due to the lack of appropriate data. For example, soil phosphorus and nitrogen concentrations had to be omitted. While the global distribution of soil nitrogen concentration is available73, it is a modelled product that correlates strongly with soil carbon concentration and thus cannot be used as an independent predictor.

Geospatial global projections

To create global spatial predictions of springtail density, species richness, biomass, and community metabolism, we followed the approach previously used for nematodes10,75, which is based on spatial associations of community parameters with global environmental information. The geospatial modelling was done in Python version 3.6.5 (Python Software Foundation). A Random Forest algorithm was applied to identify the spatial associations and extrapolate local observations to the global scale at 30 arcsec resolution (approximately 1 km² pixels)18,75. After retrieving the environmental variable values for each location, we trained 18 model versions, each with different hyperparameter settings, i.e., variables per split (range: 2–7) and minimum leaf population (range: 3–5). To minimize the potential bias of a single model, we used an ensemble of the 10 best-performing models, selected based on the coefficient of determination (R²), to create global predictions of each of the community parameters.
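
The hyperparameter grid can be enumerated directly: six values for variables per split (2–7) combined with three values for minimum leaf population (3–5) give the 18 model versions. The sketch below covers only the grid and the selection of the ten best models; it does not train any Random Forest, and combining ensemble members by a simple mean is an assumption, as are all function names.

```python
from itertools import product

VARIABLES_PER_SPLIT = range(2, 8)   # 2..7 inclusive
MIN_LEAF_POPULATION = range(3, 6)   # 3..5 inclusive


def hyperparameter_grid():
    """Enumerate all combinations of the two hyperparameters."""
    return [
        {"variables_per_split": v, "min_leaf_population": m}
        for v, m in product(VARIABLES_PER_SPLIT, MIN_LEAF_POPULATION)
    ]


def select_ensemble(scored_models, k=10):
    """Keep the k models with the highest R2.

    scored_models: list of (r2, model_id) pairs.
    """
    return sorted(scored_models, key=lambda item: item[0], reverse=True)[:k]


def ensemble_prediction(member_predictions):
    """Combine per-model predictions for one pixel by a simple mean (assumed)."""
    return sum(member_predictions) / len(member_predictions)
```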

Model performance was assessed by 10-fold cross-validation, with folds assigned randomly. The R² values for the response variables were in the range 0.30–0.57 (density: 0.567, dry biomass: 0.463, community metabolism: 0.359, extrapolated species richness: 0.302). For some of the modelled variables we observed positive spatial autocorrelation: at ranges below 150 km for density, below 100 km for community metabolism and below 150 km for extrapolated species richness (Supplementary Note). Yet, the Moran’s I values were very close to zero (the highest value was 0.07), indicating that the effect of spatial autocorrelation was very weak. These results were obtained by performing Moran’s I tests using the spatialRF package in R76. To investigate the effect of spatial autocorrelation on model performance, we performed buffered leave-one-out cross-validation tests (described in detail as an alternative performance statistic for models with potential spatial autocorrelation77,78). As expected, the predictive power declined with increasing buffer sizes. At the scales at which we observed positive autocorrelation, i.e., where Moran’s I values were significant, the coefficient of determination remained positive.
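
For readers unfamiliar with the statistic, the sketch below shows random fold assignment and a coefficient of determination computed as one minus the ratio of residual to total sum of squares. Whether the original analysis used exactly this definition is an assumption; under it, the statistic becomes negative when a model predicts worse than the observed mean, which is why a value that stays positive under buffering is informative.

```python
import random


def assign_folds(n_sites, k=10, seed=0):
    """Randomly assign each site to one of k folds of near-equal size."""
    folds = [i % k for i in range(n_sites)]
    random.Random(seed).shuffle(folds)
    return folds


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_observed = sum(observed) / len(observed)
    ss_residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_total = sum((o - mean_observed) ** 2 for o in observed)
    return 1 - ss_residual / ss_total
```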

To reduce potential artifacts produced by extrapolation, geographical regions with climatic conditions poorly represented by our sites and without NPP data were excluded from the extrapolation (e.g., Sahara, Arabian desert, Himalayas). We evaluated our extrapolation quality based on spatial approximations of interpolation versus extrapolation75. In this approach, we first determined the range of environmental conditions represented by the observations. Next, we classified all pixels to fall within or outside the training space, in univariate and multivariate space. For the latter, we first transformed the data into principal component space, and selected the first 11 PC axes, collectively explaining 90% of the variation. Finally, we classified pixels to fall within or outside the convex hulls drawn around each possible bivariate combination of these 11 PC axes; pixels that fell outside the convex hulls in >90% of cases were masked on the main maps; for the map with density-species richness visualisation, two corresponding masks were applied (Fig. 2).
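
The multivariate masking step can be sketched in plain Python. With 11 PC axes there are 55 bivariate combinations; a pixel is masked when it lies outside the training convex hull in more than 90% of them. The code below is a simplified illustration, not the original implementation: it assumes PC scores are already computed, treats points on a hull edge as inside, and assumes each projected hull has at least three non-collinear points.

```python
from itertools import combinations


def cross(o, a, b):
    """Z-component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def inside_hull(hull, p):
    """True if p lies inside or on the boundary of a counter-clockwise hull."""
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], p) >= 0 for i in range(n))


def fraction_outside(training, pixel):
    """Share of bivariate axis combinations in which the pixel is outside."""
    pairs = list(combinations(range(len(pixel)), 2))
    outside = 0
    for i, j in pairs:
        hull = convex_hull([(row[i], row[j]) for row in training])
        if not inside_hull(hull, (pixel[i], pixel[j])):
            outside += 1
    return outside / len(pairs)


def is_masked(training, pixel, threshold=0.9):
    """Mask pixels outside the training space in more than `threshold` of pairs."""
    return fraction_outside(training, pixel) > threshold
```

For a global raster, recomputing hulls for every pixel would be wasteful; in practice the hulls would be computed once per axis pair and reused.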

To estimate spatial variability of our predictions while accounting for the spatial sampling bias in our data (Fig. 1a), we performed a spatially stratified bootstrapping procedure. We used the relative area of each IPBES79 region (i.e., Europe and Central Asia, Asia and the Pacific, Africa, and the Americas) to resample the original dataset, creating 100 bootstrap resamples. Each of these resamples was used to create a global map, which was then reduced to create mean, standard deviation, 95% confidence interval, and coefficient of variation maps (Supplementary Figs. 4–7).
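
One way to read the stratified bootstrap is sketched below: each resample draws sites with replacement from every region, with the number drawn proportional to that region's share of the total area. The exact allocation rule of the original analysis is not given here, so this rule, the function names, and any area shares passed in are assumptions for illustration.

```python
import random
from statistics import mean, stdev


def stratified_resample(sites_by_region, area_share, n_total, rng):
    """Draw one bootstrap resample with region sizes proportional to area."""
    resample = []
    for region, sites in sites_by_region.items():
        n_region = round(n_total * area_share[region])
        resample.extend(rng.choices(sites, k=n_region))  # with replacement
    return resample


def summarise_pixel(predictions):
    """Reduce the per-resample predictions for one pixel to summary statistics."""
    m = mean(predictions)
    sd = stdev(predictions)
    return {"mean": m, "sd": sd, "cv": sd / m}
```

Repeating stratified_resample 100 times, fitting the model to each resample, and applying summarise_pixel to the 100 predictions per pixel would yield the mean, standard deviation, and coefficient of variation maps.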

Global biomass, abundance, and community metabolism of springtails were estimated by summing predicted values for each 30 arcsec pixel10. Global community metabolism was recalculated from joules to mass of carbon by assuming 1 kg fresh mass = 7 × 10⁶ J (ref. 80), an average water proportion in springtails of 70%58, and an average carbon concentration of 45% (calculated from 225 measurements across temperate forest ecosystems)81. We repeated the procedure of global extrapolation and prediction for biomass and community metabolism using minimum and maximum estimates of these parameters from regression coefficient uncertainties (see Parameter uncertainties).
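
The unit conversion amounts to three multiplications, shown here as a worked sketch. It assumes that the 45% carbon concentration refers to dry mass, which is the usual convention but is not stated explicitly above; the constant and function names are hypothetical.

```python
JOULES_PER_KG_FRESH = 7e6   # 1 kg fresh mass = 7 x 10^6 J
WATER_FRACTION = 0.70       # average water proportion in springtails
CARBON_FRACTION = 0.45      # carbon concentration, assumed to be per dry mass


def joules_to_kg_carbon(energy_joules):
    """Convert metabolised energy to the equivalent mass of carbon."""
    fresh_mass_kg = energy_joules / JOULES_PER_KG_FRESH
    dry_mass_kg = fresh_mass_kg * (1 - WATER_FRACTION)
    return dry_mass_kg * CARBON_FRACTION
```

Under these assumptions, 7 × 10⁶ J corresponds to 1 kg of fresh mass, 0.3 kg of dry mass, and 0.135 kg of carbon.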

Path analysis

To reveal the predictors of springtail communities at the global scale, we performed a path analysis. After filtering the selected environmental variables (see above) according to their global availability and collinearity, 13 variables were used (Supplementary Fig. 9b): mean annual air temperature, mean annual precipitation (CHELSA database67), aridity (CGIAR database68), soil pH, sand and clay contents combined (sand and clay contents were collinear in our dataset), soil organic carbon content (SoilGrids database73), NDVI (MODIS database72), human population density (GPWv4 database74), latitude, elevation69, and vegetation cover reported by the data providers following the habitat classification of the European Environment Agency (woodland, scrub, agriculture, and grasslands; the latter were coded as woodland, scrub, and agriculture all being absent). Before running the analysis, we performed Rosner’s generalized extreme Studentized deviate test in the EnvStats package82 to exclude extreme outliers, and we z-standardized all variables (Supplementary R Code).

Separate structural equation models were run to predict density, dry biomass, community metabolism, and local species richness in the lavaan package83. To account for the spatial clustering of our data in Europe, instead of running a model for the entire dataset, we divided the data by the IPBES79 geographical regions and selected a random subset of sites for Eurasia, such that Eurasia contributed only twice as many sites as the second-most represented region. We ran the path analysis 99 times for each community parameter with different Eurasian subsets (density had n = 723 per iteration, local species richness had n = 352, dry biomass had n = 568, and community metabolism had n = 533). We decided to keep the share of the Eurasian dataset larger than that of other regions to increase the number of sites per iteration and the validity of the models. The Eurasian dataset also had the best data quality among all regions, and a substantial reduction in datasets from Eurasia would result in a low weight for high-quality data. We additionally ran a set of models in which the Eurasian dataset was represented by the same number of sites as the second-most represented region, which yielded similar effect directions for all factors, but slightly higher variation and fewer consistently significant effects. In the paper, only the first version of the analysis is presented. To illustrate the results, we averaged effect sizes for the paths across all iterations and presented the distribution of these effect sizes using mirrored kernel density estimation (violin) plots. We marked and discussed effects that were significant at p < 0.05 in more than a given share of iterations (arbitrary thresholds were set to 25%, 50%, 75% and 95% of iterations; Fig. 3).
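
The repeated subsampling of the dominant region can be sketched as follows. The code only illustrates the resampling logic and the share of significant iterations; it does not fit any structural equation model, and the function names and the (site_id, region) data layout are hypothetical.

```python
import random
from collections import Counter


def subsample_dominant(sites, dominant="Eurasia", ratio=2, rng=random):
    """Keep all sites outside `dominant` plus a random draw of `dominant` sites,
    capped at `ratio` times the size of the largest other region.

    sites: list of (site_id, region) pairs.
    """
    counts = Counter(region for _, region in sites)
    second_largest = max(n for region, n in counts.items() if region != dominant)
    dominant_sites = [s for s in sites if s[1] == dominant]
    other_sites = [s for s in sites if s[1] != dominant]
    k = min(len(dominant_sites), ratio * second_largest)
    return other_sites + rng.sample(dominant_sites, k)


def share_significant(p_values, alpha=0.05):
    """Fraction of iterations in which a path was significant at `alpha`."""
    return sum(p < alpha for p in p_values) / len(p_values)
```

Calling subsample_dominant 99 times with different random states, fitting the model to each subset, and passing the resulting p-values for a path to share_significant gives the quantity compared against the 25%, 50%, 75% and 95% thresholds.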

Linear mixed-effects modelling

To test whether our results are biased by seasonal effects, sampling methodology, and/or species richness extrapolation, we selected a subset of sampling events with known sampling year and month (2997 sampling events representing 1703 sites) and ran linear mixed-effects models for springtail density, species richness (both raw and extrapolated), biomass, and community metabolism. The models were run using the lme4 package84. Data were transformed as described above and analysed using Gaussian distributions, except for raw species richness, which was analysed using generalised models with a Poisson distribution. Sampling site was included as a random effect to account for the dependence of sampling events coming from the same site. We included mean monthly air temperature (offset from the annual mean) and the sum of monthly precipitation in the sampling month as additional climatic predictors. We also included total collection area and the presence of litter (or any other soil cover such as mosses) and soil in the sample to account for methodological biases. All models were run using the full dataset (n = 2884 sampling events for density, n = 2540 for raw species richness; n = 1708 for extrapolated species richness; n = 2462 for dry biomass; n = 2289 for community metabolism). To test whether the effect of temperature on species richness is non-linear, we additionally ran the same model including a quadratic term, poly(MAT, 2).

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.