Soil biota, including bacteria, archaea, protists, fungi, and animals, underpin globally important ecosystem functions. Fundamental functions of soil communities include nutrient and hydrological cycling, decomposition, pollution mitigation, and supporting terrestrial primary production, which are inextricably linked to global food security, climate regulation, and other ecosystem services1,2. Nevertheless, until recently, characterising soil biodiversity (popularly referred to as a ‘black box’) has been constrained by our inability to identify typically intractable levels of diversity using either traditional or molecular approaches. High-throughput sequencing has however resulted in a step change, facilitating the characterisation of bacteria3,4,5,6,7, archaea6,7,8, fungi9,10, protists11,12,13, and animals14 within the belowground biosphere. Increasingly, efforts have been made to investigate the total biodiversity of the soil biosphere across large ecological15,16,17 and taxonomic scales15,16,18,19.

Understanding the response of the total soil biosphere to changes in land use and environmental drivers has become an important research focus in regional soil monitoring programmes15,16,19 and in small-scale field20,21 and mesocosm experiments18,20. Yet despite the move towards unified study of soil biota, fundamental challenges of technique and scale remain. Often, such studies require the comparison of soil biota metrics captured through both traditional and modern molecular techniques15,19,20,21. To our knowledge, relatively few studies have attempted to assess all components of belowground communities using a multi-marker metabarcoding approach22.

There is mounting evidence that the microbial and animal fractions of soil communities may respond differentially to land use change. Microbial richness increases15, whereas richness of soil fauna declines in response to more intense land use15,23,24. However, these findings come from relatively homogenous landscapes, such as grasslands15. It is unclear whether the differential responses of soil microbes and fauna extend across heterogeneous land uses. For example, across heterogeneous landscapes of Wales, UK, α-diversity of mesofauna is both lowest in agricultural and bog systems, which are the most- and least-intensively managed systems in the country, respectively23. Changes in soil properties may further dictate declines of common soil fauna in low-intensity land uses. Therefore, it is critical to assess whether the positive effect of increasing land use intensity on microbial richness is consistent across regions made up of markedly diverse ecosystems and land uses. Similarly, the importance of individual soil properties in shaping belowground communities has also proven difficult to disentangle. Many studies have demonstrated the consistent dominance of pH in shaping belowground community composition at national23,25,26,27,28 and global scales4,5,9,29. However, climatic factors9,30 and other soil properties, including organic matter, nitrogen (N) availability, and the carbon (C)-to-N ratio9, are also recognised as important drivers of belowground community composition yet consistent trends remain elusive30. Therefore it is unclear whether the total soil biosphere responds to changes in land use and soil properties in the same manner across heterogeneous landscapes.

Here, we sought to assess whether divergent responses to land use and soil properties in the microbial and animal fractions of soil communities persist across heterogeneous systems at the national-scale using a standardised metabarcoding approach. We present a national-scale analysis of soil biodiversity across Wales, UK, from the micro-to-macro scale including all major groups of soil microbes in addition to animals, from 436 sites over 2 years across a diverse array of oceanic-temperate ecosystems, including grasslands, forests, bogs, and managed systems. Biotic metrics come from high-throughput sequencing of prokaryotic, fungal, microbial eukaryotic, and soil animal communities using 16S, ITS, and 18S rRNA marker genes; these are complemented by an extensive suite of co-located abiotic soil properties and vegetation cover data. Specifically, we investigate how richness and β-diversity of all major fractions of subterranean life respond to land use type and prevailing soil properties (e.g. organic matter, pH, and N) to explore which lineages play a demonstrable role in determining belowground community structures across large and complex ecological gradients. Our results demonstrate that across a gradient of heterogeneous land uses, richness of soil animals is governed more by land use regime rather than intrinsic soil properties. In contrast, microbial richness is driven by soil properties and demonstrates a largely linear trend of decreasing richness along a productivity gradient of land use based on decreasing soil nutrient availability.


Sequencing results

Illumina sequencing and environmental data were collected from across Wales as part of the Glastir Monitoring and Evaluation Programme (GMEP)31. Sample sites were categorised into Aggregate Vegetation Classes (AVCs) based on plant species assessments using established criteria (see Supplementary Note 1). An explanation of the composition of AVCs is described in Supplementary Table 1. Briefly, the 7 AVCs used in the current study were established by clustering samples based on an assessment of vegetation data using a detrended correspondence analysis32. The ordination of the detrended correspondence analysis has shown that the land use categories follow a gradient of soil nutrient content32 from which soil productivity and management intensity can also be inferred (see Supplementary Note 1 and Supplementary Table 1). The AVCs in descending order of productivity are crops/weeds, fertile grassland, infertile grassland, lowland wood, upland wood, moorland grass-mosaic, and heath/bog.

In total, 29,690 bacterial and 156 archaeal operational taxonomic units (OTUs) were identified from 16S reads. Overall, the most abundant class was Alphaproteobacteria (Fig. 1a). Proportional abundances (OTU n/total × 100) of Acidobacteria increased in less-productive land use types from its lowest in crops/weeds to its highest in heath/bog AVCs. In contrast, abundances of Actinobacteria followed the exact opposite trend, as did Spartobacteria and Bacilli (Fig. 2a). For archaea, Nitrososphaeria was the most abundant class overall (Fig. 1d); however, the proportion of Thermoplasmata became dominant in less productive AVCs (Fig. 2d).

Fig. 1
figure 1

Sankey diagrams of proportional abundances of OTUs from all samples for major soil biota groups. Arms denote proportions of OTUs at the class-level for a bacteria; b fungi; of major lineages of c protists; class-level for d archaea; and at the phylum-level for e animals. For information on how this figure was created, please see Supplementary Methods

Fig. 2
figure 2

Proportionate abundances of OTUs for major soil biota groups within each Aggregate Vegetation Class. Land uses are ordered from most (crops/weeds) to least (heath/bog) using the same divisions as Fig. 1 for a bacteria; b fungi; c protists; d archaea; and e animals

There were 7582 OTUs recovered from ITS1 sequences. Agaricomycetes were the most abundant class of fungi overall. There was also a large proportion of Sordariomycetes (Fig. 1b). Proportionate abundances of Sordariomycetes and Agaricomycetes followed contrasting trends, with the dominance of the former replaced by the later in lower productivity AVCs (Fig. 2b).

In total, 8683 protist OTUs were recovered from the 18S reads. Chloroplastida (green algae) was by far the most abundant protist group, followed by Rhizaria, Stramenopiles, and then Alveolates (Fig. 1c). Green algae, largely comprised of unidentified sequences (Supplementary Fig. 1a), were least abundant in crops/weed and heath/bog sites (Fig. 2c). Proportions of Rhizaria were relatively constant across AVCs (Fig. 2c) and entirely comprised of Cercozoa (Supplementary Fig. 1b). Among Stramenopiles, proportions of Ochrophyta were also largely consistent, while those of Oomycetes and Bicosoecida followed contrasting trends across the productivity gradient of AVCs, declining and increasing, respectively (Supplementary Fig. 1c). Ciliates were the most common Alveolates in most AVCs; however, the proportion of Apicomplexa was greater in the lowland wood and grassland AVCs (Supplementary Fig. 1d). The proportion of Amoebozoa was surprisingly low (Fig. 1c), potentially due to primer bias in our study when compared to other studies12,15. Across AVCs Tublulinea was consistently dominant among the Amoebozoa, though divergent trends in Gracilipodida and Discosea can be seen along the productivity/intensity gradient (Supplementary Fig. 1e).

In the animal dataset, 1138 OTUs were recovered. Nematode OTUs were the most abundant animal group across all samples (Fig. 1e). Annelids and arthropods followed opposing trends in proportionate abundance, increasing and decreasing respectively, across the productivity gradient. Proportions of Platyhelminthes and Tardigrades also increased in less-productive AVCs (Fig. 2e).

Effect of land use on belowground richness

We found significant differences in biodiversity trends across land use types. There was a marked shift along the productivity gradient of crops/weeds-to-heath/bog in all organismal groups, except animals (Fig. 3). Significant differences in the mean richness of bacterial OTUs were prominent (F6,264 = 78.47, p < 0.0001) following ANOVA. Bacterial richness decreased in AVCs across the productivity gradient with highest values in the most productive crops/weeds and grasslands and lowest in the low productivity land uses (i.e. moorland grass-mosaic, heath/bog) (Fig. 3a). The same trend was also observed in fungi (F6,248 = 48.98, p < 0.001; Fig. 3b), and protists (F6,249 = 59.86, p < 0.001; Fig. 3c). For individual pair-wise comparisons see Supplementary Note 4. Richness of archaeal OTUs had an opposing trend to that of other microbial groups. Archaeal OTU richness was significantly lower (F6,185 = 24.37, p < 0.001) in higher-productivity AVCs and highest in the least-productive land-use types (Fig. 3d). In the crops/weeds, AVC richness of archaeal OTUs was significantly lower than upland wood (p = 0.01), moorland grass-mosaic (p = 0.005), and heath/bog sites (p < 0.001) based on Tukey’s post hoc tests, with the remaining land uses displaying intermediate OTU richness values.

Fig. 3
figure 3

Boxplots of OTU richness for each organismal group. Richness of a bacteria; b fungi; c protists; d archaea; e animals are plotted against Aggregate Vegetation Class ordered from most (crops/weeds) to least (heath/bog) productive. Boxes are bounded on the first and third quartiles; horizontal lines denote medians. Black dots are outliers beyond the whiskers, which denote 1.5× the interquartile range. Source data are provided as a Source Data file

Animal OTU richness did not follow the trends observed in microbial communities. Differences observed with ANOVA were significant (F6,244 = 6.25, p < 0.001) but plateaued after the grassland AVCs, as opposed to the sloped trend of microbial groups across the productivity gradient (Fig. 3e). Richness in the infertile grasslands was significantly greater than in crops/weeds (p = 0.008), heath/bog (p = 0.003), and upland wood (p = 0.02) based on Tukey’s post hoc tests. Richness was lowest in the most intensively management crops/weeds sites and was shown to be significantly lower than richness of lowland woods (p = 0.04) with Tukey’s test. Collectively, the results demonstrate a strong divergence between the richness of animal and microbial communities across all AVCs.

Relationships of richness between organismal groups

Bacterial richness from the total dataset was significantly correlated with all other organismal groups (Supplementary Table 2). Such relationships were positive between bacterial richness and richness of fungi, protists, and animals. Similarly, there was a positive relationship between protistan richness and both fungal and animal richness. However, archaeal richness demonstrated significant, but negative correlations with all organisms except animals. Indeed animal richness (measured by metabarcoding) was only significantly correlated with animals (measured by taxonomic assessment; Table 1) and protists (Supplementary Table 2).

Table 1 Results of partial least squares regressions for soil biota against soil properties for richness

Relationships between richness and environmental variables

Partial least squares (PLS) regressions demonstrated that the divergence observed between animal and microbial communities may be due to the effects of soil properties. No soil properties were significantly correlated with richness of soil animal OTUs (Table 1). Conversely, there were strong relationships between microbial richness and a range of soil properties. However, although microbes were influenced by the same environmental variables, there were distinct patterns within each group. For example, while pH was the best predictor of bacterial richness, it was ranked as second for fungi and protists and third for archaea. Bulk density and C:N ratio were also major drivers of richness across all microbial groups. Elevation (here closely linked with precipitation and organic matter content) was the most important environmental variable in relation to archaea and protist richness. Organic matter and bulk density were strong predictors of fungal OTU richness. All environmental properties that had positive relationships with OTU richness of bacteria, fungi, and protists had negative relationships with archaea.

Community structure (β-diversity) across land uses

Non-metric multidimensional scaling (NMDS) using Bray–Curtis distances showed consistent differences in β-diversity between AVCs across all organismal groups. Plots show tight clustering of the crops/weeds, fertile grassland, and infertile grassland AVCs, whereas the other AVCs form a more dispersed organismal assemblage (Fig. 4 for bacteria and Supplementary Figs. 25). Results of PERMANOVAs were significant across all groups and analyses of dispersion were also significant (Fig. 4 for bacteria and Supplementary Figs. 25) for all groups except for the dispersion of animals (F6,401 = 0.67, p = 0.68) owing to the wide range of sample numbers within each AVC (Supplementary Fig. 5). We also found that this clustering was present using constrained canonical analyses of principle components (CAP) ordinations for each organismal group (Supplementary Figs. 610).

Fig. 4
figure 4

Plot of the non-metric dimensional scaling ordination (stress = 0.06) of bacterial community composition across GMEP sites. Samples are coloured by Aggregate Vegetation Class. Results of PERMANOVA (F6,427 = 30.76, p = 0.001) and dispersion of variances of groups (F6,427 = 10.97, p = 0.001) were significant

pH was the best predictor of β-diversity from linear fitting for all soil organisms (Table 2 and Supplementary Tables 36). The carbon-to-nitrogen (C:N) ratio was the second most important variable in all major groups except animals. Mean C:N values were higher in the crops/weeds and grassland AVCs and lower in the remaining land use types (Supplementary Table 6). Mean pH values and C:N ratios (Supplementary Table 6) reflect the distribution of points in NMDS plots, with tight groupings observed in the crops/weeds and grasslands AVCs and increasingly more spread out groupings in all other AVCs as pH values decreased and became more varied (Fig. 4 for bacteria and Supplementary Figs. 25). Across all groups, all or nearly all variables were significant following linear fitting; however, most were only weakly correlated with β-diversity values. Other important variables varied in their ranked importance, including elevation, mean annual precipitation, organic matter content, total C, bulk density, volumetric water content, and clay content of soil (Table 2 and Supplementary Tables 36). The results of linear model fitting for CAP ordinations, though not identical (Supplementary Tables 711), were highly related to those of the NMDS ordinations (Supplementary Fig. 11).

Table 2 Summary of relationships amongst environmental factors and bacteria communities


High-throughput sequencing of the biosphere amongst heterogeneous soils revealed both expected and novel relationships between soil organisms and environmental drivers. The richness of microbes and animals had notable contrasting trends across land use types. The richness of microbial communities was strongly influenced by both land use and environmental variables, especially pH, C:N ratio, elevation, organic matter, and annual precipitation. Conversely, we found no significant associations between measured environmental variables and animal richness, which was negatively impacted by higher intensity land use, suggesting that richness patterns of microbial and macrobial life fractions adhere to different ecological determinants. For β-diversity, pH was by far the most important environmental variable in shaping community composition of all organismal groups, yet other drivers were attributable for influencing patterns of α-diversity.

Our findings demonstrate that diverging trends between soil microbes and fauna extend across distinct, heterogeneous land uses. Furthermore, we build on the work of Gossner et al.15 by demonstrating that microbial richness, with the exception of archaea, increases with greater land use intensity across heterogeneous ecosystems at the national-scale. The divergence between microbes and animals at this scale is supported by previous findings from French soils17,25. Across France, bacterial richness17 and biomass25 were strongly linked to belowground environmental properties but largely unaffected by aboveground climatic variables, which commonly influence animal and plant biogeography25,30. Our findings show that richness of fungi and protists also follow this trend—whereas archaea follow an opposing trend to all other groups.

There are several mechanisms that may explain the relationship between higher microbial richness and intensifying anthropogenic disturbance. One explanation is that consistent nutrient inputs from fertilisers and disturbance under tillage stimulate high α-diversity in these areas16. Indeed higher α-diversity has been observed in cropping systems than in forest or grassland sites for both bacteria16,17 and fungi16. Interestingly, high microbial richness in more productive land use types (e.g. arable) may illustrate the intermediate disturbance hypothesis (IDH) within soil ecosystems. Under the IDH, as outlined by Connell33, diversity reaches its highest levels where succession has been interrupted by intermittent disturbance events. In our sites, microbial richness was highest in AVCs concurrent to disturbances (augmented by nutrient inputs) from agricultural interventions such as fertilisation, tilling, clearing, and the cultivation of livestock. However, it is also possible that the high diversity observed in the grassland and especially in agricultural land uses stems from organisms that have entered a dormant state after disturbance-induced changes to their environment13,34. Disturbance pressures can also lead to high bacterial diversity through the reduction in dominant OTUs, which are replaced by a wide range of weaker competitors. It has been demonstrated that α-bacterial diversity is greater in the phyllosphere of ivy in urban habitats associated with more anthropogenic stressors than in less disturbed sites35. Our findings suggest that the phenomenon of greater species richness resulting from the addition of nutrients and non-equilibrium dynamics induced by disturbance may extend to across all microbial groups, with the possible exception of archaea.

Richness of all microbial groups, except archaea, followed the land use productivity/management intensity gradient32 with higher richness in the highly productive and more disturbed grasslands and arable sites and lower richness in the least productive, relatively undisturbed upland heath/bog sites. Changes within bacterial and fungal communities reflected expected within-community changes following the shift in soil nutrient quality across land uses. Actinobacteria36 and Sordariomycetes37 are known to dominate bacterial and fungal communities in high productivity grasslands as witnessed here. In contrast, Acidobacteria increased in proportion in low productivity, highly acidic AVCs as expected based on previous studies from the UK27 and across the globe7. Likewise, the greater proportion of Agaricomycetes OTUs in low productivity AVCs is intuitive as many Agaricomycete fungi are common in bogs and related low-productivity habitats across Wales38.

Protists have been chronically overlooked in European soil monitoring programmes (but see ref. 28), as extracting trends of protist diversity across land uses is difficult. For example, Gossner et al.15 were not able to show changes in richness across all protists with land use intensification. We demonstrate that protistan richness follows the trends of bacteria and fungi across land uses, with the highest richness levels in arable land. As with other microbes, there is evidence of increased protist richness at the mesocosm39 and field40 level, in response to fertiliser addition. Furthermore, in German grassland soils, protist richness has been shown to increase with land use intensity41. Our results show that an association between intensification and protistan richness extends across the national-scale over multiple land uses.

Unlike other microbes, archaeal richness was greatest in low productivity AVCs and lowest in highly productive sites (Fig. 3d). Furthermore, our understanding of the extent of soil archaeal diversity and its functional capabilities is continually increasing6,7,8. Recent research has revealed many lineages of Thaumarchaeota are crucial links in the N cycle and methanogenesis in soils7,8. Archaeal richness was highest in the moorland grass-mosaic and heath/bog AVCs, likely due to the specialised nature of acidophilic lineages. In particular, the Thaumarchaeota42 and Thermoplasmata43 are known to proliferate (Fig. 2d) under reduced competition from bacteria.

Animal richness did not change linearly with land use and was not strongly influenced by environmental variables. Our molecular analysis of soil eDNA supports recent findings by George et al.23 based on morphological assessments of coincident soil mesofauna. Both the present work and George et al.23 demonstrated that animal richness and abundance were lowest in land uses associated with more intensive management. Animal richness peaked in infertile grasslands and was lowest in crops/weeds sites (Fig. 3e). Agricultural disturbance negatively affects soil faunal richness and diversity across large geographic scales14,23,24. However, in the low-productivity land uses, although proportional abundances of arthropod taxa declined similarly to the findings of George et al.23, overall richness was not as strongly affected due to an increase in fractions of Annelids, Platyhelminthes, and Tardigrades. Such an increase in the peat-rich, low-disturbance, higher elevation sites is rather intuitive since Annelids, Platyhelminthes, and Tardigrades are susceptible to desiccation and require moist habitats to be active components of the soil community44,45. As soil animals still exhibited expected lower diversity trends in more intensively managed land uses15,23,24, there are further opportunities for research into understanding the mechanisms underlying the divergent richness trends between microscopic animals and the rest of soil communities.

Soil pH, as evidenced by ordination results, was the most important environmental variable in our study for β-diversity and in most cases richness as has been previously observed across the UK27,28 and at larger national25,26 and continental scales4,5,6. pH has been implicated with driving richness of soil Archaea42,43 and is the most important driver of protist communities in the UK28. However, pH only plays a marginal role in shaping soil protist communities globally11. Likewise, pH is a poor predictor of global fungal biogeography, yet is a good predictor of ectomycorrhizal fungal richness9, which may contribute to the Agaricomycetes OTUs observed in the present study. Nevertheless, it is important to acknowledge the inconsistent nature of correlations between microbial biodiversity and pH, potentially due to variations in soil properties occurring at scales that do not align with large-scale soil surveys30.

We also observed a strong effect of C:N ratio in determining richness of microbes and β-diversity of all organismal groups, as has been observed in bacterial27 and protistan28 β-diversity across Britain and some fungi globally9. Yet C:N ratio is often co-correlated with other soil properties including bulk density, total C, organic matter, elevation, and mean annual precipitation. Disentangling such related variables is difficult; despite using PLS analyses46 we could not disentangle co-correlated soil properties. For example, AVCs such as moorland grass-mosaic and heath/bog generally had higher elevation, mean annual precipitation, C:N ratio, and both total C and N (Supplementary Table 12) owing to their less-disturbed, upland location, and often peat-rich soils. Higher C:N ratios are indicative of lower-quality soils47 and have historically been associated with a shift in microbial biomass from bacterial to fungal dominance48. Our results suggest that, with the exception of archaea, microbial richness is equally susceptible to the effect of soil quality degradation. According to our results, archaea, on the contrary, appear to be well adapted to habitats with lower nutrient quality.

We observed strong relationships between soil properties and microbial, but not animal richness. We suspect this is due to the direct effects of soil properties on microbes. For example, shifts in pH towards either a more alkaline or acidic condition inhibit the ability of most non-specialised bacteria to uptake nutrients from their environment26. In addition the quality of soil nutrients, as discussed previously, was likely a strong determinant of available nutrient resources and therefore total richness of microbes. We also found strong relationships between soil properties and β-diversity and across all organismal groups. These relationships between Bray–Curtis dissimilarities and soil properties demonstrate that more dissimilar belowground communities correlate positively with indicators of better quality soils across the breadth of soil biota (Supplementary Table 6). However, associations between nutrient quality and animal community composition are likely the result of nutrients influencing the composition of the aboveground plant community49 rather than direct interactions with animals. Furthermore, animals are more vagile than microbes and can actively seek out microhabitats with better resources50, limiting the direct impact of soil properties on animal richness.

Using an extensive soil sampling programme and metabarcoding, we present perhaps the most comprehensive assessment of the belowground diversity in Europe. Despite uncertainties on the ability of environmental DNA methods using small soil volumes to accurately characterise communities of larger organisms51, we were still able to detect key differences in larger organisms (i.e. animals) across land uses. Our results highlight the complexity of belowground ecology by demonstrating a divergence of patterns of richness between soil fauna and microorganisms at a national-level. We show that microbial richness is strongly influenced by soil properties in a near-uniform manner, whereas animal richness is not. Rather, animal richness is likely driven by changes in aboveground communities that stem from intensive land use management, while microbial richness was affected by soil properties in addition to land use. A particularly interesting outcome of our analyses is the near-uniform trend of declining microbial richness along a gradient of decreasing land use productivity/management intensity. The data therefore suggest that soil properties strongly affect bacteria, fungi, and protists in a similar manner, whereby richness decreases with soil quality; whereas archaea showed an opposing trend with increasing richness as productivity declined. The richness of animal OTUs, on the contrary, was not affected by soil properties although β-diversity was. Although often considered as ecological ‘black boxes’, soils continue to provide unique and coherent insights into the differences between interconnected microbial and macrobial assemblages. Our findings also highlight the importance of the dynamics between biotic and abiotic processes that drive the organisation of belowground biological diversity.



Soil samples were collected between late spring and early autumn in 2013 and 2014 as part of GMEP (Supplementary Note 2), established to monitor the Welsh Government’s agri-environment scheme, Glastir. The scheme covered an area of 3263 km2 with 4911 landowners31. Briefly, surveyors collected samples from randomly selected 1 km2 squares with up to 3 locations within squares, following protocols established by the UK Countryside Survey52. As described previously, habitat within plots was classified using plant species assessments into one of seven AVCs32: crops/weeds (n = 9), fertile grassland (n = 98), infertile grassland (n = 162), lowland wood (n = 17), upland wood (n = 44), moorland-grass mosaic (n = 54), and heath/bog (n = 52) (Supplementary Note 1; Supplementary Table 1). Soil type was derived from the National Soil Map53 (Supplementary Note 3; Supplementary Table 13). Organic matter content was classified by loss-on-ignition (LOI) following the protocols of the 2007 Countryside Survey51.

A total of 436 cores were collected from 1 km2 squares, with up to 3 samples coming from an individual square based on a randomised sampling design. Cores were transported to the Centre for Ecology and Hydrology, Bangor, UK, and stored at −80 °C until DNA extraction. Soil physical and chemical properties were taken from 4 cm diameter by 15 cm deep cores co-located with the high-throughput sequencing cores. These included total C (%), N (%), P (mg kg−1), organic matter (% LOI), pH (measured in 0.01 M CaCl2), mean soil water repellency (median water drop penetration time in seconds), bulk density (g cm3 −1), volume of rocks (cm3), soil bound water (g water g dry soil−1), volumetric water content (m3 m3 −1), as well as clay and sand content (%) of soil. Abundances of mesofauna collected as part of GMEP were taken from George et al.23 and geographic data including grid eastings, northings, and elevation were also included in our analyses. For complete details on chemical analyses, see Emmett et al.51. Temperature (°C) and mean annual precipitation (mL) were extracted from the Climate Hydrology and Ecology research Support System dataset54. Mean values for each variable were recorded for each AVC (Supplementary Table 12) and soil properties were normalised where appropriate.

Soil texture data were measured by laser granulometry with a LS320 13 analyser (Beckman-Coulter). We subsampled approximately 0.5 g of soil taken from 15 cm cores by manual quartering and removed organic C using H2O2 and then transferred the sample into 250 mL bottles, added 5 mL of 5% Calgon® and shook overnight at 240 rpm. Bottles were emptied manually into the laser diffraction instrument for measuring particle size distribution. Full Mie theory was used to obtain a particle size distribution from the raw measurement data, with the real refractive index set to 1.55 and the absorption coefficient at 0.1 as in Özer et al.55. The cut-off points for clay, silt, and sand were 2.2, 63, and 2000 μm, respectively. Clay and sand percentages were selected for subsequent analyses and normalised using Aitchison’s log-ratio transformation.

DNA extraction

Soils were homogenised by passing through a sterilised 2 mm stainless steel sieve. Sieves were sterilised between samples by rinsing under the tap water using high flow, applying Vircon laboratory disinfectant and UV-treating each side for 5 min DNA was extracted by mechanical lysis and the homogenisation step performed in triplicate from 0.25 g of soil per sample using a PowerLyzer PowerSoil DNA Isolation Kit (MO-BIO). Pre-treatment with 750 μL of 1 M CaCO3 following Sagova-Mareckova et al.56 was performed as it was shown to improve PCR performances, especially for acidic soils. Extracted DNA was stored at −20 °C until amplicon library preparation began. To check for contamination in sieves 3 negative control DNA extractions were completed and an additional 2 negative control kit extractions were performed using the same technique but without the CaCO3 solution.

Primer selection and PCR protocols for library preparation

Amplicon libraries were created using primers for rRNA marker genes, specifically for the V4 region of the 16S rDNA gene targeting bacteria and archaea (515F/806R)57, ITS1 targeting fungi (ITS5/5.8S_fungi)58, and the V4 region of the 18S rDNA gene (TAReuk454FWD1/TAReukREV3)59 targeting a wide range of, but not all, eukaryotic organisms. We used a two-step PCR following protocols devised in conjunction with the Liverpool Centre for Genome Research. Amplification of amplicon libraries was run in triplicate on DNA Engine Tetrad® 2 Peltier Thermal Cycler (BIO-RAD Laboratories) and thermocycling parameters for each PCR started with 98 °C for 30 s and terminated with 72 °C for 10 min for final extension and held at 4 °C for a final 10 min For the 16S locus, first-round PCR amplification followed 10 cycles of 98 °C for 10 s; 50 °C for 30 s; 72 °C for 30 s. For ITS1, there were 15 cycles of 98 °C for 10 s; 58 °C for 30 s; 72 °C for 30 s. For 18S there were 15 cycles at 98 °C for 10 s; 50 °C for 30 s; 72 °C for 30 s. Twelve μL of each first-round PCR product was mixed with 0.1 μL of exonuclease I, 0.2 μL thermosensitive alkaline phosphatase, and 0.7 μL of water and cleaned in the thermocycler with a programme of 37 °C for 15 min and 74 °C for 15 min and held at 4 °C. Addition of Illumina Nextera XT 384-way indexing primers to the cleaned first-round PCR products were amplified following a single protocol which started with initial denaturation at 98 °C for 3 min; 15 cycles of 95 °C for 30 s; 55 °C for 30 s; 72 °C for 30 s; final extension at 72 °C for 5 min and held at 4 °C. Twenty-five μL of second-round PCR products were purified with an equal amount of AMPure XP beads (Beckman Coulter). Library preparation for 2013 samples was conducted at Bangor University. Illumina sequencing for both years and library preparation for 2014 samples were conducted at the Liverpool Centre for Genome Research.


Bioinformatics analyses were performed on the Supercomputing Wales cluster. A total of 130,219,260, 104,276,828, and 98,999,009 raw reads were recovered from the 16S, ITS1, and 18S sequences, respectively. Illumina adapters were trimmed from sequences using Cutadapt60 with 10% level mismatch for removal. Sequences were then de-multiplexed, filtered, quality-checked, and clustered using a combination of USEARCH v. 7.061 and VSEARCH v. 2.3.262. Open-reference clustering (97% sequence similarity) of OTUs was performed using VSEARCH; all other steps were conducted with USEARCH. Sequences with a maximum error greater than 1 and shorter than 200 bp were removed following the merging of forward and reverse reads for 16S and ITS1 sequences. A cut-off of 250 bp was used for 18S sequences, according to higher quality scores. There were 15,202,313 (16S), 7,242,508 (ITS1), and 9,163,754 (18S) cleaned reads left at the end of these steps. Sequences were sorted and those that only appeared once in the dataset were removed. Briefly, filtered sequences were matched first against a number of different reference databases: Greengenes 13.863, UNITE 7.264, and SILVA 12865 for 16S, ITS1, 18S, respectively. Ten percent of sequences that failed to match were clustered de novo and used as a new reference database for failed sequences. Sequences that failed to match with the de novo database were subsequently also clustered de novo. All clusters were collated and chimeras were removed using the uchime_ref command in VSEARCH.

Chimera-free clusters and taxonomy assignment were used to create an OTU table with QIIME v. 1.9.166 using RDP67 methodology with the GreenGenes database for 16S and UNITE database for ITS1 data. Taxonomy was assigned to the 18S OTU table using BLAST68 against the SILVA database and OTUs appearing only once or in only 1 sample were removed from each OTU table.

Newick trees were constructed for the 16S and 18S tables using 80% identity thresholds. The trees were combined with their respective OTU tables as part of analyses using the R package phyloseq69, removing OTUs that did not appear in both the tree and OTU table. OTUs identified as eukaryotes in the 16S OTU table, non-fungi OTUs in the ITS OTU table, as well as OTUs identified as fungi, plants, and non-soil animals were removed from the 18S OTU table. Read counts from each group were normalised using rarefaction. The OTU tables were rarefied 100 times using phyloseq69 (as justified by Weiss et al.70) and the resulting mean richness was calculated for each sample. The read depth used for rarefaction varied for each group (Supplementary Table 14). Samples with lower read counts than this cut-off were removed before rarefaction. A summary of number of replicates per AVC is included in Supplementary Table 1.

Statistical analyses

All statistical analyses were run using R v. 3.3.371 using the rarefied data sets for each organismal group. The vegan package72 was used to assess β-diversity via NMDS and CAP ordinations based on Bray–Curtis dissimilarities. A linear model for each environmental variable was fit separately to the ordination using the envfit function, the results are presented ranked according to goodness-of-fit. Results of goodness-of-fit for each variable from both ordination methods were compared using regression analyses to look for congruence. The values of all variables were plotted against NMDS scores to determine if there were positive or negative relationships with each NMDS axis. Differences in β-diversity amongst AVCs were calculated with PERMANOVA. The assumption of homogeneity of dispersion was verified using the betadisper function.

Linear mixed models were constructed using package nlme73 to test the differences in α-diversity amongst AVCs for each organismal group. Model selection was performed using AVC, soil type, LOI classification, and sample year as fixed factors; sample square identity was the random factor. To determine the best possible model, predictors other than AVC were dropped to find the lowest AIC scores using the AICcmodavg package74. For each model, significant differences were assessed by ANOVA and pairwise differences were identified with Tukey’s post-hoc tests from the multcomp package75.

PLS regressions found in package pls76 were used to identify the most important environmental variables for richness. Such analysis is ideal for data where there are many more explanatory variables than sample numbers or where extreme multicollinearity is present46. As in Lallias et al.46, we used the variable importance in projection (VIP) approach77 to sort the original explanatory variables by order of importance; variables with VIP values > 1 were considered most important. Relationships between important variables and richness values for each group of organisms were investigated by linear regression. Richness was normalised before regression when necessary. Pearson’s correlation coefficient was used to directly compare richness of organismal groups.

Reporting summary

Further information on experimental design is available in the Nature Research Reporting Summary linked to this article.