Introduction

The interactions between plants and soil organisms can have important ramifications for ecosystem functioning and plant community dynamics, but the extent to which these interactions influence the spatial distributions of soil communities remains poorly understood. Knowing how plants control the spatial variation in belowground communities is important for building a predictive understanding of the heterogeneity in soil communities and contributing to pre-existing research that has identified how certain site and abiotic soil properties can influence the spatial variation in soil communities across large geographic scales [1,2,3,4]. Further, this information will aid our ability to probe the undescribed and likely diverse ways in which soil organisms interact with plants since comparatively few plant–microbe interactions are well understood [5].

Certain soil organisms are known to form close associations with particular plant species [6, 7]. Mycorrhizal relationships, for instance, involve a direct exchange of nutrients between plants and symbiotic soil fungi, and these relationships can influence plant–soil diversity linkages [8, 9]. Indirect mechanisms, such as the release of root exudates and microbial attraction to those exudates, can also drive associations between specific microbes and plant species [10]. However, these described interactions are likely only a small fraction of the numerous interactions among plants and soil organisms in a given ecosystem. Thus, it is uncertain whether the composition of soil communities as a whole is associated with plant community attributes under field conditions.

It has long been known that individual plant species can exert a powerful influence on soil microbial communities [11,12,13], and there is evidence that divergence in soil bacterial and fungal communities is broadly linked to plant community composition at landscape [14, 15] and global scales [16]. Additionally, correlational analyses have revealed associations between individual plant species and soil fungal [17], bacterial [18], nematode [19], and arthropod [20] communities. However, it is unclear whether these relationships are driven by shared environmental preferences or by the direct effects of locally dominant plant species on soil communities. While plant invasions can elicit shifts in soil community structure [21, 22], the effects of plant species identity on the overall composition of belowground communities are often weak or difficult to quantify, with several studies having failed to identify strong links between changes in plant assemblages and corresponding changes in soil communities [17, 23,24,25,26]. As such, the existence of a general relationship between plants and soil communities remains uncertain and difficult to predict a priori.

There are multiple plant community attributes that could potentially be used to predict variation in soil communities. Plant species identity could be a strong predictor of variation in soil communities [11, 17, 19], as could evolutionary history (i.e. the phylogeny) of plants, given the potential for more closely related plants to be associated with more similar belowground communities [27]. Such patterns could arise as a product of coevolution between plants and soil microbes or if phylogenetic relatedness corresponds to other plant attributes that affect soil organisms [28]. It has also been proposed that plant functional traits could be used to predict plant–microbe associations a priori given that plant species’ distributions and community diversity are generally predictable based on their traits [29, 30], and soil communities can form associations with plants based on these traits [6]. Although previous studies have shown that plant traits can explain variation in soil microbial processes involved in C and N cycling [31,32,33,34,35], it remains unclear whether variation in soil community composition is directly caused by, or merely associated with, differences in plant traits. Further, past studies show that links between plant traits and the composition of soil communities are not always observed [27] and when they have been found, they are often based on crude assessments of microbial community composition, such as the relative abundance of fungi and bacteria [15, 35]. Likewise, most previous work has focused on the relationships between soil biota and aboveground plant traits, despite increasing evidence that root traits are likely to play a more important role in structuring belowground communities [36,37,38].

Here we provide the first in-depth evaluation of the predictive power of plant community attributes, alongside abiotic factors, for explaining spatial (i.e. horizontal) variation in soil communities at the individual plant and community scale. While previous work has investigated effects of plant species and community attributes on soil communities, we are not aware of any previous study that has comprehensively assessed these effects across such a wide range of functionally important belowground taxonomic groups. Specifically, we address the overarching question: Can plant community attributes (i.e. taxonomic composition, phylogenetic composition, and plant functional traits) be used to predict spatial variability in soil community composition? To address this question, we sampled soils both from monocultures of 21 common temperate grassland plant species spanning eight families and a range of life history strategies and from an adjacent field experiment where grassland community composition had been manipulated through plant species additions to create a gradient of plant species and plant functional diversity. We used DNA sequencing-based approaches to target soil fungal, bacterial, protistan, and metazoan (faunal) communities. We first assessed whether the identity, phylogenetic history, and/or functional traits of individual plant species (both leaf and root traits) could be used to explain variation in soil communities. Next, we determined whether observations made at the individual plant scale correspond to similar trends in mixed plant communities in the field.

Materials and methods

Mesocosm experiment

To evaluate effects of individual plant species, their phylogeny, and their functional traits on soil communities, mesocosms containing plants grown in monoculture were established in a fenced enclosure at Colt Park within the Ingleborough National Nature Reserve in England (54°11′38.7″N 2°20′54.4″W). Mesocosms were constructed from polypropylene pots (38 × 38 × 30 cm) filled with 10 cm of rinsed gravel and 20 cm of sieved and homogenized top soil (pH ~5.8; 8.9% C; 0.92% N). Top soil was a brown earth sourced from the adjacent grassland, a mesotrophic temperate grassland under extensive agricultural management, which involved light grazing by sheep and cattle from autumn to spring, but no grazing during the growing season when an annual hay crop was taken, and an occasional light dressing of farmyard manure or mineral fertilizer (~25 kg N ha⁻¹) in early spring [39]. Twenty-one grassland plant species (Fig. 1) were germinated and grown in a greenhouse from commercial seed (Emorsgate Seeds, Norfolk, PE34 4RT, UK) or from seed collected at the site. Mesocosms were planted and arranged in a randomized block design with four blocks. Plants were actively weeded and harvested annually. Plant biomass and soil were collected in July, approximately two years following planting, during the height of the growing season and before seed filling. Eight to 20 leaves from at least three individuals per mesocosm were clipped and stored in sealed plastic bags at 4 °C prior to processing. A representative 6.8 cm diameter soil core was taken from the complete soil column of each mesocosm, and soil subsamples were frozen and shipped on dry ice to the University of Colorado for molecular soil community analysis. The remainder of the soil was immediately passed through a 4-mm sieve. All root material not passing through the sieve was retained and stored at 4 °C before being washed free of soil prior to processing for root trait measurements.

Fig. 1

The effects of plant species identity on the composition of soil communities from mesocosms containing monocultures. Boxplots represent pairwise Bray–Curtis dissimilarities in community composition between vs. within soils from the same plant species (a). Hierarchical clustering diagrams based on mean dissimilarities across the plant species (b). Bipartite network diagram, where edges (lines) connect plant species (green circles) to fungal taxa (red points) that occurred in the same mesocosm (c). The composition of cosmopolitan soil taxa (those taxa associated with all plant species), intermediate (taxa associated with only 2 to 20 plant species), and specialized (taxa that associate with only a single plant species) (d). The composition of functional groups of fungal taxa identified as being cosmopolitan, intermediate, and specialized across plant species (e)

Field plot design and sampling

Experimental field plots were established 2 km from the mesocosm enclosure at Selside Shaw, within the Ingleborough National Nature Reserve. The plots were established in 2012, in a mesotrophic grassland with similar management, vegetation and soil to the meadow at Colt Park. The soil was characterized as a clayey brown earth soil with 60% clay, <1% silt, 39% sand, 5.7 ± 0.4 pH (mean ± standard deviation), 4.9 ± 1.4% C, and 0.46 ± 0.13% N. Native grassland species were added to the existing plant communities in 6 m × 6 m field plots with the aim of creating a gradient of plant communities of increasing functional diversity and complexity. Over two years the plots were seeded (2014–2015) and planted with seedlings (2013–2015) of species belonging to one of three plant functional groups, namely the grasses (Cynosurus cristatus, Dactylis glomerata, Festuca rubra, Poa trivialis, and Briza media), forbs (Achillea millefolium, Geranium sylvaticum, Geum rivale, Leucanthemum vulgare, Plantago lanceolata, Prunella vulgaris, Hypochaeris radicata, Leontodon hispidus, Filipendula ulmaria, and Centaurea nigra), and legumes (Lathyrus pratensis, Lotus corniculatus, Trifolium pratense, and Trifolium repens) or their respective two- and three-way combinations. These species are typical of species-rich mesotrophic meadow communities (UK National Vegetation Classification MG3b; [40]), the target plant community for biodiversity [41]. Together with unmodified control communities, this created a total of eight plant community treatments with five replicates of each arranged in a randomized design (n = 40 plots). Details on species added, seedling densities, and sowing rates across all treatments are given in Table S1. We note that most, but not all, of the species contained in the mesocosms were represented in the field plots.

We sampled vegetation and soil from four of the eight treatments (control, forb addition, legume addition, and grass–forb–legume addition) in July 2015. To sample vegetation and soil, 30 cm diameter sampling rings were placed at representative locations within plots (n = 4 per plot with 5 plots per treatment; i.e. n = 20 per treatment), and aboveground plant biomass was harvested from within each sampling ring. One 6.8 cm × 10 cm soil core was collected from within the center of each sampling ring and processed identically to the mesocosm soil samples. Root material was processed as above for use in the root-based assessment of plant community composition.

Soil community composition

Fungal, bacterial, protistan, and metazoan communities were assessed in soil samples following molecular marker gene sequencing protocols as described in Prober et al. [16] and Ramirez et al. [42]. Briefly, DNA was extracted from each sample, and ribosomal marker genes were amplified using PCR with barcoded primers unique to each sample. We used the ITS1F/ITS2 and the 515f/926r primer pairs for fungi and bacteria, respectively, and the 1391f/EukBr primer set for protists and metazoa. Amplicon pools were sequenced on an Illumina MiSeq instrument using 2 × 251 bp sequencing kits at the BioFrontiers sequencing facility at the University of Colorado. Appropriate controls were used throughout the laboratory process to ensure there were no contaminants. Raw sequence data are available at figshare.com using the following digital object identifiers: https://doi.org/10.6084/m9.figshare.4879940, https://doi.org/10.6084/m9.figshare.4879889, https://doi.org/10.6084/m9.figshare.4879943.

Raw sequences were processed using the DADA2 pipeline [43], which is designed to resolve exact biological sequences from Illumina sequence data and does not involve sequence clustering. Raw sequences were first demultiplexed by comparing index reads to a key, and paired sequences were trimmed to uniform lengths. Sequences were then dereplicated, and the unique sequence pairs were denoised using the ‘dada’ function with ‘err = NULL’ and ‘selfConsist = TRUE’. Potential primers and adapters were then screened and removed using a custom script (https://github.com/leffj/dada2helper). Next, paired-end sequences were merged and chimeras were removed. Taxonomy assignments were determined using the RDP classifier trained on the UNITE [44], Greengenes [45], or PR2 databases [46] for fungi, bacteria and protists and metazoa, respectively. Zygomycota classifications were changed to Mucoromycota as per Spatafora et al. [47]. 16S rRNA gene sequences identified as chloroplasts, mitochondria, or Archaea were removed. To account for differences in sequencing depths, samples were rarefied to 5300, 1300, 2400, and 1250 sequences per sample for fungi, bacteria, protists, and metazoa, respectively. Putative fungal functional groups were identified using FUNGuild [48].
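The rarefaction step described above can be illustrated with a minimal Python sketch. This is not the code used in the study (the analyses were run in R); the function name `rarefy`, the taxon labels, and the choice to return `None` for samples below the target depth are assumptions made for illustration only.

```python
import random
from collections import Counter

def rarefy(counts, depth, seed=0):
    """Randomly subsample `depth` reads without replacement from one sample.

    counts: dict mapping taxon -> read count.
    Returns a dict of subsampled counts, or None if the sample holds fewer
    than `depth` reads (assumed here to mean the sample is dropped).
    """
    total = sum(counts.values())
    if total < depth:
        return None
    # Expand the counts into one entry per read, then draw without replacement.
    reads = [taxon for taxon, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)
    return dict(Counter(rng.sample(reads, depth)))

# Hypothetical fungal sample rarefied to the depth reported for fungi (5300).
sample = {"taxon_1": 4000, "taxon_2": 2500, "taxon_3": 500}
rarefied = rarefy(sample, depth=5300)
shallow = rarefy({"taxon_1": 100}, depth=5300)  # too few reads
```

After rarefaction every retained sample has the same total read count, so relative abundances are compared on an equal sampling effort.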

Plant community composition

Plant community composition in the field plot samples was assessed in four ways: (1) by sorting the aboveground biomass to species and measuring the biomass (dry weight) of each species, (2) by molecular analysis of the aboveground biomass, (3) by molecular analysis of the roots contained in the soil cores, and (4) by molecular analysis of DNA extracted from the soil samples. For visual inspection, harvested aboveground biomass was identified the same day as collection, and tissue from each species was dried and weighed. For molecular assessments, aboveground and root biomass samples were freeze-dried, ground, and homogenized prior to DNA extraction. We prepared DNA for sequencing following a protocol similar to Kartzinel et al. [49]. We identified the genus-level plant community composition by targeting both the P6 loop of the trnL gene and the rRNA internal transcribed spacer (ITS) region. We extracted DNA using the PowerSoil DNA Isolation Kit (Mo Bio Laboratories, Inc., Carlsbad, CA, USA), and soil samples were diluted 1:10 prior to amplification. The primer set trnL(UAA)c/trnL(UAA) with included Illumina sequencing adapters was used to amplify the trnL-P6 marker following a PCR protocol of: denaturing at 94 °C for 2 min followed by 36 cycles of 94 °C for 1 min, 55 °C for 30 s, and 72 °C for 30 s, with a 5-min final extension at 72 °C. To amplify the ITS region, we used the forward primer, ITS1-F, and included two reverse primers, ITS1Ast-R and ITS1Poa-R [49], to specifically target Asteraceae and Poaceae species. All primers included appropriate Illumina adapters, and PCR reactions were carried out as for trnL amplification. Each PCR was done in duplicate and the amplification product was combined. All products for each sample were combined in equal volumes and cleaned using the UltraClean PCR Clean-Up Kit (Mo Bio Laboratories, Inc.). Illumina Nextera barcodes were added to the amplicons using an 8-cycle PCR, amplicons were cleaned and pooled using the SequalPrep kit (Invitrogen, Carlsbad, CA, USA), and sequenced on an Illumina MiSeq instrument with a 2 × 151 bp kit at the University of Colorado BioFrontiers sequencing facility.

We processed raw plant sequences in a similar manner as for soil community sequences described above. We used the DADA2 pipeline [43] to trim forward and reverse paired reads to 145 and 130 bp, respectively. Following the denoising step, Illumina adapters were removed, paired-end reads were merged, and chimeras were filtered. We assigned taxonomy to each sequence using BLAST searches against the GenBank NR database. Sequences were assigned taxonomy only if ≥80% of the sequence aligned to a reference sequence and they matched the reference sequence with ≥95% identity. If a sequence had multiple best matches to reference sequences, a common genus and/or family name was assigned if one existed. Otherwise, sequences were assigned as ‘unknown’. Taxonomy assignments were manually checked and verified in reference to species known to exist at the site. Separate taxa tables were created based on trnL amplicons and each of the Asteraceae and Poaceae ITS amplicons. Samples with fewer than 550, 1000, and 100 sequences were removed from taxa tables based on trnL, Asteraceae ITS, and Poaceae ITS amplicons, respectively. We calculated the relative abundance of individual plant genera in each sample using the trnL sequence counts. Because the trnL gene yields limited taxonomic resolution for the Asteraceae and Poaceae, we replaced the total relative abundances of taxa (mostly unknown genera) within these two families with normalized relative abundances of genera determined using the ITS sequence data. Raw sequence data are available at figshare.com using the following digital object identifier: https://doi.org/10.6084/m9.figshare.4880060.
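The taxonomy assignment rule can be sketched in Python as follows. This is an illustration under stated assumptions, not the pipeline used in the study: the dictionary field names (`coverage`, `identity`, `bitscore`, `genus`, `family`) are hypothetical, and ranking "best" matches by bit score is our assumption, since the text does not say how best matches were defined.

```python
def assign_taxonomy(hits, min_coverage=80.0, min_identity=95.0):
    """Return a genus, a family, or 'unknown' for one query sequence.

    hits: list of dicts, one per BLAST hit, with percent 'coverage' and
    'identity', a 'bitscore', and the reference 'genus' and 'family'
    (hypothetical field names).
    """
    # Keep only hits that meet both thresholds given in the text.
    passing = [h for h in hits
               if h["coverage"] >= min_coverage and h["identity"] >= min_identity]
    if not passing:
        return "unknown"
    # Assumption: "best" matches are those tied for the highest bit score.
    best_score = max(h["bitscore"] for h in passing)
    best = [h for h in passing if h["bitscore"] == best_score]
    genera = {h["genus"] for h in best}
    if len(genera) == 1:
        return genera.pop()
    families = {h["family"] for h in best}
    if len(families) == 1:
        return families.pop()
    return "unknown"

tied_hits = [
    {"coverage": 98, "identity": 99, "bitscore": 250,
     "genus": "Festuca", "family": "Poaceae"},
    {"coverage": 98, "identity": 99, "bitscore": 250,
     "genus": "Lolium", "family": "Poaceae"},
]
single_hit = [{"coverage": 100, "identity": 100, "bitscore": 270,
               "genus": "Trifolium", "family": "Fabaceae"}]
weak_hit = [{"coverage": 95, "identity": 90, "bitscore": 180,
             "genus": "Geum", "family": "Rosaceae"}]

tied_result = assign_taxonomy(tied_hits)      # genera differ, family shared
single_result = assign_taxonomy(single_hit)   # one unambiguous genus
weak_result = assign_taxonomy(weak_hit)       # identity below threshold
```

The manual verification against species known from the site is not represented in this sketch.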

Plant traits

All leaf and root traits were measured using standard protocols [50]. Briefly, we measured specific leaf area, specific root length, leaf dry matter content, and root dry matter content by weighing and scanning the fresh leaf and root samples. The samples were then oven dried at 60 °C for 48 h and their dry weights measured. The scanned digital images were analyzed in WinRhizo (Regent Instruments Inc., Ville de Québec, QC, Canada) to determine leaf areas, root lengths, and root diameters. Shoot and root N and C contents from the mesocosm-grown plants and the field sample plant communities were measured on an Elementar Vario elemental analyzer (Langenselbold, Germany). In both cases, plant material was freeze-dried and thoroughly homogenized prior to measurement.

Soil characteristics

Soil characteristics were measured as in Orwin et al. [35]. pH was measured using a ratio of 1 g fresh soil: 2.5 ml dH2O. Dissolved inorganic N, individual ions (NO3-N, NH4-N), and net N mineralization were assessed using 1 M KCl extracts, and dissolved organic N was assessed using water extracts as in Bardgett et al. [51]. Total soluble N was determined following oxidation of these extracts using potassium persulphate [51]. Extracted mineral fractions were quantified using standard spectrophotometric protocols on an AA3 segmented flow analyser (SEAL Analytical Inc., Mequon, WI, USA). Total C and N of dried and ground subsamples were measured using an Elementar Vario EL elemental analyzer.

Statistical analyses

All statistical analyses were performed in R [52] using specific packages where noted, and the package ‘mctoolsr’ (http://leffj.github.io/mctoolsr/) was used to facilitate data manipulation and analyses. To represent differences in community composition, we calculated Bray–Curtis dissimilarities using square-root transformed relative abundances. Permutational analysis of variance (PERMANOVA), as implemented in the ‘adonis’ function from the ‘vegan’ package, was used to test for differences in soil community composition across factors. To test for differences in soil community composition across mesocosm plant species, we used PERMANOVA and included block identity as a random factor in the model. Network analysis plots were created using the ‘igraph’ package with multidimensional scaling to distribute points. Soil taxa were considered present if their mean relative abundance was ≥0.1%, and only taxa with a relative abundance >0.5% that associated with ≥1 plant species are shown. We identified particular soil taxa that associated with specific plant species using indicator analyses [53]. ‘Cosmopolitan’ soil taxa were defined as those taxa associated with all plant species (i.e. had a mean relative abundance ≥0.1% across replicates for each species), ‘intermediate’ as taxa associated with only 2 to 20 plant species, and ‘specialized’ as taxa that associated with only a single plant species.
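Two of the definitions above, the Bray–Curtis dissimilarity on square-root transformed relative abundances and the cosmopolitan/intermediate/specialized labels, can be sketched in Python. This is an illustrative reimplementation, not the R code used in the study; the function names and the extra "absent" label for taxa below the threshold everywhere are assumptions.

```python
import math

def bray_curtis_sqrt(a, b):
    """Bray-Curtis dissimilarity on square-root transformed relative abundances.

    a, b: dicts mapping taxon -> relative abundance in two samples.
    """
    taxa = set(a) | set(b)
    ra = {t: math.sqrt(a.get(t, 0.0)) for t in taxa}
    rb = {t: math.sqrt(b.get(t, 0.0)) for t in taxa}
    num = sum(abs(ra[t] - rb[t]) for t in taxa)
    den = sum(ra[t] + rb[t] for t in taxa)
    return num / den if den > 0 else 0.0

def classify_taxon(mean_abundance_by_plant, n_plants=21, threshold=0.001):
    """Label a soil taxon by how many plant species it associates with.

    mean_abundance_by_plant: dict plant species -> mean relative abundance
    (as a proportion, so 0.001 is 0.1%) across replicates of that species.
    """
    n_assoc = sum(1 for v in mean_abundance_by_plant.values() if v >= threshold)
    if n_assoc == n_plants:
        return "cosmopolitan"
    if n_assoc >= 2:
        return "intermediate"
    if n_assoc == 1:
        return "specialized"
    return "absent"  # assumed label for taxa below the threshold everywhere

identical = bray_curtis_sqrt({"x": 0.25, "y": 0.75}, {"x": 0.25, "y": 0.75})
disjoint = bray_curtis_sqrt({"x": 1.0}, {"y": 1.0})

plants = ["plant_%d" % i for i in range(21)]  # hypothetical species labels
everywhere = classify_taxon({p: 0.002 for p in plants})
some = classify_taxon({p: (0.002 if i < 5 else 0.0) for i, p in enumerate(plants)})
one = classify_taxon({p: (0.002 if i == 0 else 0.0) for i, p in enumerate(plants)})
```

The dissimilarity ranges from 0 for identical samples to 1 for samples that share no taxa.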

To test the relationship between the composition of soil communities and plant species relatedness in the mesocosms, we used the phylogeny from Durka and Michalski [54]. Relationships between differences in soil community composition and plant phylogenetic distances were evaluated using Mantel tests with Spearman correlations. We tested for a phylogenetic signal in the relative abundance of individual protist taxa using the phylosig function in the ‘phytools’ package, where the statistic, K, represents the strength of the signal [55]. We calculated multivariate dissimilarities in trait values by normalizing and standardizing individual trait values and calculating Euclidean distances. We tested the relationship between Euclidean trait distances and community composition dissimilarities using Mantel tests.
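The multivariate trait distance can be sketched as follows in Python. This is a hedged illustration rather than the study's R code: only the standardization step (z-scoring each trait across species) is shown, any prior transformation toward normality is assumed to have been applied already, and the species labels and trait values are hypothetical.

```python
import math

def standardize(values):
    """Z-score a list of trait values (mean 0, sample standard deviation 1).

    Assumes the trait is not constant across species.
    """
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

def trait_distances(traits):
    """Pairwise Euclidean distances between species in standardized trait space.

    traits: dict species -> list of trait values (same trait order for all).
    Returns a dict keyed by (species_i, species_j) pairs.
    """
    species = sorted(traits)
    n_traits = len(traits[species[0]])
    # Standardize each trait (column) across species so units do not matter.
    cols = [standardize([traits[s][k] for s in species]) for k in range(n_traits)]
    z = {s: [cols[k][i] for k in range(n_traits)] for i, s in enumerate(species)}
    return {(a, b): math.dist(z[a], z[b])
            for i, a in enumerate(species) for b in species[i + 1:]}

# Hypothetical values: [specific leaf area, leaf dry matter content].
traits = {"sp_A": [10.0, 0.2], "sp_B": [20.0, 0.4], "sp_C": [30.0, 0.6]}
dist = trait_distances(traits)
```

Standardizing first keeps traits measured on large numeric scales from dominating the distance.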

For the field samples, we calculated differences in the phylogenetic structure of plant communities (i.e. phylogenetic dissimilarity) using UniFrac [56] as implemented in the package, ‘picante’. We used the plant phylogenetic tree as reported in Durka and Michalski [54], and plants not identified to the genus level were removed. We assessed the relationship between phylogenetic dissimilarity and the Bray–Curtis dissimilarities in soil community composition using Mantel tests with Spearman correlations.
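A Mantel test with Spearman correlations, as used here and for the mesocosm phylogenetic and trait distances, can be sketched in plain Python. This is a minimal illustration and not the R implementation used in the study; the one-sided P-value, the default of 999 permutations, and the example matrices are assumptions.

```python
import random

def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def pearson(x, y):
    """Pearson correlation; applied to ranks it gives Spearman's rho."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def lower_triangle(m, order=None):
    """Unfold the lower triangle of a square matrix, optionally reordered."""
    n = len(m)
    idx = order if order is not None else list(range(n))
    return [m[idx[i]][idx[j]] for i in range(1, n) for j in range(i)]

def mantel_spearman(d1, d2, n_perm=999, seed=0):
    """Return (rho, p) for a one-sided Mantel test on two distance matrices."""
    x = ranks(lower_triangle(d1))
    rho = pearson(x, ranks(lower_triangle(d2)))
    rng = random.Random(seed)
    order = list(range(len(d1)))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(order)
        # Permute rows and columns of d2 together while d1 stays fixed.
        if pearson(x, ranks(lower_triangle(d2, order))) >= rho:
            hits += 1
    return rho, (hits + 1) / (n_perm + 1)

# Hypothetical example: d2 is a monotone function of d1, so rho should be 1.
pts = [0, 1, 3, 7]
d1 = [[abs(a - b) for b in pts] for a in pts]
d2 = [[(a - b) ** 2 for b in pts] for a in pts]
rho, p = mantel_spearman(d1, d2)
```

Permuting whole rows and columns, rather than individual distances, respects the non-independence of pairwise distances that share a sample.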

To assess whether differences in plant community composition predicted variation in soil community composition beyond the explanatory power of soil characteristics, we built models of soil community composition dissimilarity using multiple regression on distance matrices (MRM) as implemented in the ‘ecodist’ package and compared the explanatory power of the model with and without the addition of plant community dissimilarity as a predictor variable. In these models, each soil variable was transformed using log or inverse transformations where necessary to approximate a normal distribution, and they were standardized prior to calculating Euclidean distances. MRM was implemented with rank (i.e. Spearman) correlations, and the “best” models containing only soil variables were derived by first including all soil variables and using backwards elimination until all predictors explained significant levels of variation in the response dissimilarities.
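The comparison of explanatory power with and without plant community dissimilarity can be sketched as a rank-based regression on unfolded distance vectors. This Python sketch is a simplified stand-in for the ‘ecodist’ MRM routine, not a reproduction of it: it returns only R², omits the permutation tests and the backwards elimination, and all distance values are hypothetical.

```python
def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def r_squared(y, predictors):
    """R-squared of an ordinary least squares fit with an intercept."""
    n = len(y)
    cols = [[1.0] * n] + predictors
    xtx = [[sum(ci[k] * cj[k] for k in range(n)) for cj in cols] for ci in cols]
    xty = [sum(ci[k] * y[k] for k in range(n)) for ci in cols]
    beta = solve(xtx, xty)
    fitted = [sum(beta[j] * cols[j][k] for j in range(len(cols))) for k in range(n)]
    mean_y = sum(y) / n
    ss_res = sum((y[k] - fitted[k]) ** 2 for k in range(n))
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    return 1 - ss_res / ss_tot

def mrm_r2(response, predictors):
    """Rank-transform every distance vector, then regress response on predictors."""
    return r_squared(ranks(response), [ranks(p) for p in predictors])

# Hypothetical lower-triangle distances for four samples (six pairs).
soil_comm = [0.30, 0.45, 0.50, 0.62, 0.71, 0.80]   # soil community dissimilarity
ph_dist = [0.1, 0.4, 0.3, 0.9, 0.6, 1.2]           # distance in one soil variable
plant_dist = [0.20, 0.35, 0.55, 0.50, 0.80, 0.75]  # plant community dissimilarity

r2_soil_only = mrm_r2(soil_comm, [ph_dist])
r2_with_plants = mrm_r2(soil_comm, [ph_dist, plant_dist])
gain = r2_with_plants - r2_soil_only
```

Because the models are nested, R² cannot decrease when the plant predictor is added, so in practice the size of the gain has to be judged alongside a permutation-based significance test.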

Results and discussion

The effect of plant species identity on soil communities

Overall, the mesocosm soils contained expectedly diverse communities (Fig. S1A). Soil fungal communities were primarily composed of Ascomycota (43% of ITS sequence reads, on average), Basidiomycota (31%), and Mucoromycota (21%); bacterial communities were primarily composed of Acidobacteria (31% of 16S rRNA gene reads, on average), Proteobacteria (20%), and Verrucomicrobia (16%); protistan communities were primarily composed of Rhizaria (26%), Amoebozoa (25%), Alveolata (22%), and Stramenopiles (16%); and metazoan communities were primarily composed of Nematoda (33%), Arthropoda (28%), and Annelida (15%; Fig. S1B). The structure of these communities was similar to those found in other temperate grasslands [1, 57, 58].

Plant species identity explained differences in the overall composition of soil fungal (R² = 0.33; P < 0.001), bacterial (R² = 0.27; P = 0.02), protistan (R² = 0.32; P < 0.001), and metazoan (R² = 0.31; P < 0.001) communities (Fig. 1a). Further, these plant species effects were driven by differences among multiple plant species rather than one or a small number of plant species associating with distinct belowground communities (Fig. 1b, Fig. S2). Certain fungal, protistan, and metazoan taxa tended to be strongly associated with individual plant species, while others tended to have more general associations (Fig. 1c, Fig. S3). For example, the fungal taxa identified as Olpidium brassicae and Phoma sp. associated with Achillea millefolium, while several Ascomycota, Basidiomycota, and Mucoromycota taxa were associated with all plant species (Fig. S4). We used an indicator analysis approach to identify those taxonomic groups that were most strongly associated with each of the individual plant species and found that many of the plant species formed specific associations (Fig. S4). Since there are likely to be different traits associated with more specialized versus more cosmopolitan soil taxa [59], we investigated whether soil taxa unique to individual plant species tended to represent different taxonomic groups when compared to taxa that were more ubiquitous across plant species. Cosmopolitan taxa were represented by a higher proportion of Mucoromycota, Acidobacteria, Rhizaria, and Nematoda, while more specialized taxa were represented by a greater proportion of Glomeromycota, Planctomycetes, Alveolata, and Rotifera (Fig. 1d). Additionally, cosmopolitan fungal taxa represented a greater proportion of putative saprotrophs compared to more specialized taxa, which had a greater proportion of pathogens and mutualists (Fig. 1e). This suggests that, in temperate grasslands, pathogens and mutualists tend to be more strongly limited to individual plant species, while saprotrophs are more cosmopolitan and less influenced by plant species identity. This finding is in concordance with a previous study conducted in an Amazon rainforest showing stronger plant–soil linkages for pathogenic and mycorrhizal fungi compared to saprotrophs [60].

Can the effect of plant species identity be explained by plant phylogeny or functional traits?

We next sought to assess whether plant species identity effects could be explained by plant phylogeny or leaf and root functional traits, two attributes that could potentially be used to predict plant associations with belowground communities a priori. The mesocosm plant species represented eight families including Poaceae, Asteraceae, and Fabaceae, providing an opportunity to evaluate the influence of a wide-ranging phylogeny on the composition of soil communities. Plant phylogenetic distances were not significantly related to differences in fungal, bacterial, or metazoan community composition (P > 0.1 in all cases; Fig. 2a). Differences in protistan community composition were related to plant phylogenetic distance, but this relationship was relatively weak (rho = 0.29, P = 0.002; Fig. 2a). Nonetheless, the relative abundance of Stramenopiles was significantly related to plant species phylogeny (K = 0.51, P = 0.004; Fig. S5). We might expect plant phylogenetic differences to be associated with the structure of belowground communities due to coevolution with mutualists or pathogens [28, 61]; however, this did not appear to be the case for most soil taxonomic groups. Further, the general lack of a relationship between plant phylogeny and belowground communities found in our study is consistent with studies of plant–soil feedbacks, which likewise have shown no relation to plant phylogeny [62].

Fig. 2

Relationships between plant species’ relatedness and differences in the composition of soil communities. Panel a shows a plant phylogenetic tree with species names colored by family (key shown in Fig. 1) with the corresponding heatmap showing the dissimilarities in the composition of each soil community. Colors represent the first principal coordinate analysis axis calculated from Bray–Curtis dissimilarities (a). The relationship between differences in the composition of soil communities and plant trait distances (b). Euclidean trait distances were calculated using all the traits shown in panel c. The relationship between differences in the composition of soil communities and individual plant traits (c). Points represent Spearman correlation coefficients (rho) and Mantel test results (P-value)

The measured leaf and root traits were highly variable across the mesocosm species. Grassland plants vary in their ecological strategies. Exploitative species grow fast under high nutrient conditions and have characteristically high specific leaf areas and N contents while conservative species are selected to survive under lower nutrient conditions and have opposite traits [63, 64]. For each plant species in the mesocosms, we measured the plant traits that are known to be indicative of the tradeoffs in these life history strategies (Fig. S6A, Table S2). For example, the Fabaceae species tended to have a greater shoot and root N and C content, while Poaceae species tended to have high leaf dry matter contents (Fig. S6B). Yet, there were no strong or significant relationships (i.e. Bonferroni corrected P < 0.05) between belowground community composition and individual leaf or root traits (Fig. 2c). Furthermore, multivariate dissimilarity in leaf and root traits of plant species was not predictive of differences in communities of any of the soil taxonomic groups (P > 0.1 in all cases; Fig. 2b).

These results suggest that the plant traits we measured are not effective indicators of the specific relationships plants form with belowground communities. Previous studies have detected relationships between plant traits and coarse measures of microbial community composition [15, 35] or specific microbial groups, such as ammonia oxidizers [37]. However, our findings are in line with other studies. For example, Porazinska et al. [25] found that certain soil communities were linked to individual plant species in a prairie grassland, but they were unable to identify traits that could predict soil communities. Likewise, Barberán et al. [65] demonstrated that plant species identity is more predictive of soil communities than plant traits. Nonetheless, it is possible that the plant–soil organism associations we observed could have been driven by unmeasured plant traits given that certain plant characteristics must explain the species identity effects we observed. For example, variations in the quantity and quality of root exudates can influence soil community composition [66]. Likewise, leaf litter chemistry has been shown to be related to coarse measures of soil microbial community composition in a manner broadly consistent with the leaf economic spectrum [35]. Also, while we did not observe relationships between plant traits and the overall composition of soil communities, it is possible that specific soil organisms do respond to plant traits, including those taxa directly involved with N cycling [34, 36, 37]. Other potential reasons exist for our failure to detect strong associations between soil communities and plant traits or phylogeny. First, it is possible that if the experiment had a longer duration, additional effects on soil communities would become evident, and these effects would more strongly correspond to differences in plant traits and/or phylogeny. Second, soil can contain DNA from cells that are no longer viable [67], and this ‘relic’ DNA could obscure ecological relationships among organisms.

Are soil communities in the field predictable based on plant community attributes?

The results from the mesocosm study demonstrated that plant species identity is a more important determinant of soil community composition than plant phylogeny or plant traits. Given this, we would hypothesize that knowledge of the species composition of mixed plant communities in the field should be an effective predictor of soil communities. We tested this hypothesis by analyzing plant and soil samples from a series of experimental plots established at a grassland site close to the mesocosm experiment, where grassland community composition had been manipulated for three years to create a gradient of plant species composition and diversity. Plant community composition was assessed using marker gene sequencing of plant DNA extracted from dried and ground representative samples of plant biomass collected immediately above each soil sample, and this molecular approach was verified for efficacy by comparing it to visual assessments of aboveground biomass (Fig. S7).

Differences in the composition of each soil taxonomic group were related to differences in plant community composition (P < 0.05 in all cases). By comparing the compositions of the plant communities across experimental plots (using the first principal coordinate score based on aboveground assessments), we could identify specific plant genera that drove variation in soil community composition across the samples (Fig. 3a, Table S3). For instance, some samples had comparatively high relative abundances of Lolium spp. while other samples had high relative abundances of Agrostis spp. These differences in plant community composition were related to the relative abundance of certain groups of soil taxa, including the Ascomycota, Mucoromycota, Acidobacteria, Amoebozoa, Stramenopiles, and Arthropoda (Fig. 3a). These specific associations between plant and soil taxa can ultimately be used to predict the composition of soil communities from plant species abundances. For example, our results suggest that plant communities dominated by Agrostis spp. are likely to have greater relative abundances of Ascomycota and lower relative abundances of Acidobacteria in the soils in which they grow.
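The analysis described above (ordering samples by their first principal coordinate score and relating that score to the relative abundance of a soil group) can be illustrated with a minimal sketch. The abundance values below are invented for illustration and are not data from this study; the function names are hypothetical, and the sketch assumes Bray–Curtis dissimilarities and classical principal coordinates analysis, which may differ in detail from the software actually used.

```python
import numpy as np

def bray_curtis(abundances):
    """Pairwise Bray-Curtis dissimilarities between rows (samples)."""
    X = np.asarray(abundances, dtype=float)
    diff = np.abs(X[:, None, :] - X[None, :, :]).sum(axis=2)
    total = (X[:, None, :] + X[None, :, :]).sum(axis=2)
    return diff / total

def first_pcoa_axis(D):
    """Sample scores on the first principal coordinate (classical PCoA)."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * J @ (D ** 2) @ J               # Gower-centered matrix
    eigvals, eigvecs = np.linalg.eigh(B)      # eigenvalues in ascending order
    return eigvecs[:, -1] * np.sqrt(max(eigvals[-1], 0.0))

# Hypothetical relative abundances of two plant genera in six samples
# (illustrative values only, not measurements from the field plots).
agrostis = np.array([0.10, 0.25, 0.40, 0.55, 0.70, 0.90])
plants = np.column_stack([1.0 - agrostis, agrostis])  # columns: Lolium, Agrostis

# Hypothetical relative abundance of one soil group in the same samples.
ascomycota = np.array([0.22, 0.27, 0.30, 0.36, 0.41, 0.46])

plant_axis = first_pcoa_axis(bray_curtis(plants))
r = float(np.corrcoef(plant_axis, ascomycota)[0, 1])
print(f"Pearson r between plant PCoA axis 1 and Ascomycota: {r:.2f}")
```

Note that the sign of a principal coordinate axis is arbitrary, so only the magnitude of the correlation is meaningful until the axis is oriented against a known genus gradient.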

Fig. 3

Soil community composition is related to plant community composition in the field. Variation in plant community composition across the field samples ordered by the first principal coordinate score (i.e. the x-axis represents a gradient of plant community compositions where communities further apart are more dissimilar), and relationships between soil taxonomic group relative abundance and the plant first principal coordinate score (a). Linear trend lines were only plotted for groups that had a Pearson correlation P ≤ 0.05. Relationship strength between dissimilarities in soil communities and dissimilarities in plant communities (*P < 0.05, **P < 0.01, ***P = 0.001; Mantel tests; b). Pairwise Bray–Curtis dissimilarities in plant community composition, as assessed using aboveground tissue, are not related to dissimilarities in plant community composition as assessed using root tissue, but they are related to dissimilarities in plant community composition as assessed using plant DNA in soil (c)

We also evaluated whether the phylogenetic structure or community-aggregated plant traits [15, 32] could explain relationships between plants and soil communities. We did this by testing whether plant communities containing genera with more similar phylogenetic histories or trait values were associated with more similar soil communities. However, plant community phylogenetic structure was not significantly related to the composition of any of the soil taxonomic groups (P > 0.3 in all cases), suggesting that phylogenetic relatedness is not predictive of soil community composition. This finding is in agreement with the monoculture mesocosm study described above and a field study conducted in a tropical rainforest that failed to find a strong effect of tree species phylogenetic relationships on soil communities [27]. Furthermore, differences in community-aggregated trait values, including leaf and root N and C content, also did not significantly relate to the composition of any of the soil taxonomic groups (P > 0.1 in all cases). Thus, the trait values we measured were not predictive of soil community composition in mixed grassland communities, a result consistent with the mesocosm experiment on individual plant species.

In addition to assessing relationships between the composition of soil taxonomic groups and plant communities based on aboveground biomass, we evaluated plant community composition in two other ways: using root DNA and plant DNA in soil. We used these approaches because roots of different species are intermingled and difficult to identify visually, and assessing plant communities via soil DNA provides an alternate approach to determine which plant species have occupied a given location currently or in the past [68]. Roots might also be more strongly associated with soil community structure than aboveground tissue [35]. As with the aboveground plant biomass-based analysis, differences in the compositions of each of the soil taxonomic groups were related to differences in plant community composition assessed using the plant DNA extracted from soil (P < 0.05 in all cases). However, the differences in the composition of soil communities were not significantly related to differences in plant community composition assessed using root DNA (P > 0.1 in all cases; Fig. 3b). It is possible that the composition of plant communities as assessed via roots was unrelated to soil communities because much of the root biomass consisted of dormant plants or dead tissue [69]. Further, it is possible that root distributions are so variable over time that they obscure plant species effects on belowground communities.

Differences in aboveground plant community composition were unrelated to differences in root community composition (P = 0.11), but they were related to differences in the plant community composition as assessed using plant DNA in soil (rho = 0.2; P < 0.001; Fig. 3c). This shows that shoot and root biomass in a given location do not represent the same plant community, as also found in a tropical rainforest [27]. Additionally, these results suggest that plant DNA in soil can be used as a proxy for the community composition of the aboveground biomass [68]. This has implications for future research since it is often logistically easier to obtain a representative sample of surface soils rather than sampling and homogenizing aboveground plant biomass.
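The comparisons of dissimilarity matrices reported here rely on Mantel tests. The following is a minimal sketch of a permutation-based Mantel test using a Spearman rank correlation, assuming Bray–Curtis dissimilarities for both communities. The data are invented for illustration, the function names are hypothetical, and the number of permutations and other settings are assumptions rather than the settings used in this study.

```python
import numpy as np

def bray_curtis(abundances):
    """Pairwise Bray-Curtis dissimilarities between rows (samples)."""
    X = np.asarray(abundances, dtype=float)
    diff = np.abs(X[:, None, :] - X[None, :, :]).sum(axis=2)
    total = (X[:, None, :] + X[None, :, :]).sum(axis=2)
    return diff / total

def average_ranks(values):
    """Ranks starting at 1, with tied values given their average rank."""
    v = np.asarray(values, dtype=float)
    ranks = np.empty(len(v))
    ranks[np.argsort(v, kind="mergesort")] = np.arange(1, len(v) + 1)
    for val in np.unique(v):
        tied = v == val
        ranks[tied] = ranks[tied].mean()
    return ranks

def mantel_spearman(D1, D2, permutations=999, seed=0):
    """One-sided Mantel test: rho between two dissimilarity matrices,
    with a P value from joint row/column permutations of D2."""
    rng = np.random.default_rng(seed)
    n = D1.shape[0]
    upper = np.triu_indices(n, k=1)           # each sample pair counted once
    ranks1 = average_ranks(D1[upper])

    def rho(D):
        return np.corrcoef(ranks1, average_ranks(D[upper]))[0, 1]

    observed = rho(D2)
    hits = 0
    for _ in range(permutations):
        p = rng.permutation(n)
        if rho(D2[np.ix_(p, p)]) >= observed:
            hits += 1
    return float(observed), (hits + 1) / (permutations + 1)

# Hypothetical communities in eight samples (illustrative values only).
grass = np.array([0.05, 0.12, 0.26, 0.33, 0.51, 0.64, 0.80, 0.93])
plants_above = np.column_stack([grass, 1.0 - grass])
soil_group = np.array([0.22, 0.27, 0.33, 0.35, 0.47, 0.52, 0.58, 0.68])
plants_soil_dna = np.column_stack([soil_group, 1.0 - soil_group])

rho_obs, p_value = mantel_spearman(bray_curtis(plants_above),
                                   bray_curtis(plants_soil_dna))
print(f"Mantel rho = {rho_obs:.2f}, P = {p_value:.3f}")
```

Permuting rows and columns together preserves the structure of the dissimilarity matrix, which is why the test is valid even though the pairwise dissimilarities are not independent observations.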

Are the associations between plant and soil communities driven by soil characteristics?

We aimed to assess whether relationships between soil communities and plant communities in the field plots were attributable to the direct effects of the plants, shared environmental drivers, or intermediary effects of the plants on soil properties. Therefore, we evaluated whether plant community composition contributed additional explanatory power to the observed variation in soil community composition given differences in edaphic characteristics. Shifts in the composition of soil communities across the field plots were significantly correlated with multiple individual edaphic properties (Table S4), and combinations of these properties explained 13–29% of the variation in soil community composition (P = 0.001 in all cases; Fig. S8A). For example, soil N content and pH were typically predictive of the composition of the four taxonomic soil groups. Only differences in fungal community composition could be predicted more accurately when information on aboveground plant community composition was added to the models containing only soil characteristics as predictor variables (P = 0.01; Fig. S8). When soil DNA-based plant community composition information was used instead of aboveground plant community composition, fungal, bacterial, and protistan community composition could all be predicted more accurately with the addition of information on plant community composition (R² increased 9–24%; P < 0.02 in all cases; Fig. S8). These results suggest that shifts in aboveground community composition likely influence soil communities in ways not accounted for in commonly measured soil properties, and indicate that the structure of complex soil communities in grasslands is controlled by a combination of plant and soil characteristics [11, 70].
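The logic of asking whether plant information adds explanatory power beyond soil properties can be illustrated by comparing the fit of nested models. The sketch below uses ordinary least squares on simulated data and a single hypothetical ordination axis as the response. It is a simplified stand-in chosen for illustration, not the modeling procedure used in this study, and all variable names and values are assumptions.

```python
import numpy as np

def r_squared(predictors, response):
    """Coefficient of determination for an OLS fit with an intercept."""
    X = np.column_stack([np.ones(len(response)), predictors])
    coef, *_ = np.linalg.lstsq(X, response, rcond=None)
    residuals = response - X @ coef
    centered = response - response.mean()
    return 1.0 - (residuals @ residuals) / (centered @ centered)

# Simulated plots (illustrative only): two soil properties, one plant
# community axis, and a soil community axis that depends on pH and plants.
rng = np.random.default_rng(1)
n_plots = 40
soil_ph = rng.normal(6.0, 0.5, n_plots)
soil_n = rng.normal(0.3, 0.05, n_plots)
plant_axis = rng.normal(0.0, 1.0, n_plots)
fungal_axis = (0.8 * (soil_ph - 6.0) + 0.6 * plant_axis
               + rng.normal(0.0, 0.5, n_plots))

# Nested models: soil properties only, then soil properties plus plants.
r2_soil = r_squared(np.column_stack([soil_ph, soil_n]), fungal_axis)
r2_full = r_squared(np.column_stack([soil_ph, soil_n, plant_axis]), fungal_axis)
gain = r2_full - r2_soil
print(f"R2 soil only = {r2_soil:.2f}, soil + plants = {r2_full:.2f}, "
      f"gain = {gain:.2f}")
```

Because adding a predictor can never lower R² in a nested least-squares model, the gain on its own is not evidence of an effect; a permutation or F test on the added term is needed, which corresponds to the P values reported above.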

Conclusions

We demonstrate that plant community composition is an effective predictor of the structure of complex grassland soil communities, especially when combined with information on soil abiotic properties. Furthermore, we show that plant community composition is particularly effective for predicting distributions of certain groups of soil organisms, such as fungal symbionts and pathogens. Importantly, we found that plant species identity, rather than plant phylogeny or functional traits, was the best predictor of soil community composition at both the individual plant and community scale. This is significant because it raises questions about the effectiveness of phylogenetic and trait-based approaches for explaining spatial variation in soil community composition at a local scale. Such approaches are increasingly being used to predict how changes in plant community composition impact soil properties and functions [38, 71], but our findings indicate that, at a local scale in temperate grasslands, they are ineffective for explaining variation in soil communities. Finally, it is important to note that much of the variation in soil community composition could not be explained by the measured soil characteristics or plant community attributes, highlighting the difficulty of predicting complex soil communities in situ and the need to build a mechanistic understanding of which specific plant attributes are responsible for driving plant species effects on the biodiversity of soil. Combined, our findings provide new evidence that associations between specific plant species and complex soil communities, associations that are not explained by plant phylogeny or commonly measured plant traits, act as key determinants of spatial patterns of biodiversity in grassland soils.