Introduction

Arbuscular mycorrhizal (AM) fungi form an obligate symbiosis with about 80% of plant species worldwide, including the majority of agricultural crops such as maize, wheat, rice and soybean (Smith and Read, 2008). There exist far fewer AM fungi worldwide (~300 morphospecies) than plant species (Smith and Read, 2008), and the majority of AM fungi are host generalists that occur on multiple continents (Davison et al., 2015). Despite this superficial tractability, understanding their community assembly has proven difficult. First, intraspecific variability in traits such as plant growth stimulation and phosphate uptake can be large (Munkvold et al., 2004; Koch et al., 2017), limiting the value of species-based community analysis for understanding functioning. Second, assembly processes are highly scale dependent, in that fundamentally different processes (for example, interspecific interactions, dispersal limitation, environmental filtering) operate at different spatial scales (Dumbrell et al., 2010; Johnson et al., 2010; Peay et al., 2016; Vályi et al., 2016). Third, the rRNA markers commonly used to identify AM fungi have a coarse resolution, which could swamp ecological signal (for example, Bruns and Taylor, 2016). Any of these difficulties, whether technical or ecological in nature, may result in a failure to find existing ecological patterns or to generalize findings from one study to another. This may strongly limit our understanding of how AM fungal communities assemble, function and respond to their environment.

Recent analysis of the human microbiome has shown that some bacterial communities associated with humans (for example, gut and oral inhabitants) exhibit universal ‘dynamics’ (Bashan et al., 2016). That is, the interactions among the microbes themselves, and between the microbes and their environment, do not differ fundamentally between subjects (Faust and Raes, 2016). This was assessed with ‘dissimilarity–overlap curve’ (DOC) analysis, which rests on the expectation that, if taxa interact in a similar way across subjects, communities with a high overlap of taxa should become increasingly similar in the relative abundances of those shared taxa (see Figure 1). DOC analysis therefore holds promise for testing whether a census of one community allows a reasonable prediction of a second community, which would imply regularities in their community assembly. If this holds true for AM fungal communities across large spatial scales, it would suggest that taxa within AM fungal communities interact in a similar manner and that this pattern is discernible at the coarse genetic resolution obtained through parallel sequencing.

Figure 1

Conceptual representation of possible relationships between community overlap and dissimilarity. Each bar represents a community, and each color represents a different OTU. In each panel, two communities are compared that vary from low (a, c) to high (b, d) overlap and from low (c, d) to high (a, b) dissimilarity of the fraction of the community that overlaps. A combination of situations (a) and (d) will lead to a negative DOC, as an increase in community overlap is then associated with a decrease in the dissimilarity of these communities. A potential explanation for such a pattern is that in situation (a) the relative abundances of the overlapping OTUs are shaped as much by the non-overlapping OTUs as by those that overlap, leading to a large dissimilarity of the overlapping taxa. In situation (d), a large fraction of the community consists of overlapping taxa, so that their relative abundances converge if they interact similarly, resulting in low dissimilarity.
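To make the two axes of a DOC concrete, the sketch below computes overlap and dissimilarity for one pair of samples. It follows our reading of the definitions in Bashan et al. (2016): overlap is the mean, over the two samples, of the summed relative abundance of the shared OTUs, and dissimilarity is the root Jensen–Shannon distance between the shared OTUs after they are renormalized to sum to one in each sample. The function name `overlap_and_rjsd` and the choice of the natural logarithm are assumptions for illustration; this is not the MATLAB code used in the analysis.

```python
import math
from typing import Sequence, Tuple


def overlap_and_rjsd(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Overlap and root Jensen-Shannon distance for two samples.

    x and y are abundance vectors (read counts or relative abundances)
    over the same ordered list of OTUs.
    """
    if len(x) != len(y):
        raise ValueError("samples must cover the same OTUs")
    tx, ty = sum(x), sum(y)
    # Relative abundances over each sample's whole community.
    px = [a / tx for a in x]
    py = [b / ty for b in y]
    shared = [i for i in range(len(x)) if x[i] > 0 and y[i] > 0]
    # Overlap: summed relative abundance of shared OTUs, averaged over both samples.
    overlap = float(sum((px[i] + py[i]) / 2 for i in shared))
    if not shared:
        return overlap, math.nan  # no shared OTUs: dissimilarity is undefined
    # Renormalize the shared OTUs so they sum to 1 within each sample.
    sx = sum(px[i] for i in shared)
    sy = sum(py[i] for i in shared)
    qx = [px[i] / sx for i in shared]
    qy = [py[i] / sy for i in shared]
    m = [(a + b) / 2 for a, b in zip(qx, qy)]
    # Jensen-Shannon divergence (natural log); every qx, qy entry is > 0 here.
    jsd = 0.5 * sum(a * math.log(a / c) for a, c in zip(qx, m)) \
        + 0.5 * sum(b * math.log(b / c) for b, c in zip(qy, m))
    return overlap, math.sqrt(max(jsd, 0.0))
```

Applying this function to every pair of samples in an OTU table gives the (overlap, dissimilarity) points that make up a DOC.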

Materials and methods

Sample selection and characteristics

We collected data sets that each represented AM fungal communities from a relatively large number of independent sites. Studies were selected based on the following criteria: (1) a portion of the small subunit rRNA gene of AM fungi was targeted and sequenced using high-throughput methods; (2) given the focus on larger scales, studies had to represent a minimum of 20 sites that preferably spanned a large distance, that is, that were not part of one contiguous landscape; (3) sites had to represent the same ecosystem or vegetation type (that is, grassland or forest) or grow the same crop species. Furthermore, when multiple studies represented the same geographical unit (for example, a country), preference was given to the data set with the largest sequencing depth and/or number of sites. Communities were only compared within a study, to control for differences in sampling, DNA isolation procedure, primers and potentially other confounding lab-specific factors. Studies were identified from prior knowledge of recent publications and by searching Web of Science and Google Scholar.

Using this approach, six data sets were collected. The first two come from a survey in northern China of 50 paired seminatural grasslands and maize fields, in each of which 15 soil cores were taken and pooled (Xiang et al., 2014). The third data set is a census of 38 seminatural grassland sites across Europe (Van Geel et al. in preparation; see Ceulemans et al., 2014 for a description of sites). Each site was further subdivided into three individual plots, in each of which three soil samples were analyzed; for the main analysis, these data were aggregated by grassland site (see below for further analysis). The fourth data set is from a tropical rainforest in Panama (Sheldrake et al., 2017a, 2017b), where soil was sampled from a total of 39 plots that had been subjected to various treatments (litter manipulation and fertilization). Although this data set covers a smaller scale than the others (Table 1), we chose to include it because tropical rainforests represent a primary AM fungal ecosystem type and no other tropical data sets are available. The fifth is a data set of wheat fields across Canada (Bainard et al., 2015), in which each site was represented by a pooled sample of 30 cores. From this data set, we included only the 40 sites that contained at least 250 AM fungal sequencing reads. The sixth data set consists of 24 apple orchards (Van Geel et al., 2015) in which DNA was analyzed from the roots of three to five trees per orchard; these samples were aggregated per orchard here. Each of the data sets spans environmental gradients, such as soil type and humidity (see Xiang et al., 2014; Ceulemans et al., 2014; Van Geel et al., 2015; Bainard et al., 2015; Sheldrake et al., 2017b). Data sets with unequal read numbers per sample were rarefied (using the rrarefy function in vegan (Oksanen et al., 2013)) to the minimum number of reads per sample within that data set (see Table 1 for further descriptives).
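Rarefying to the minimum library size amounts to drawing that many reads at random, without replacement, from every sample. The sketch below illustrates the idea behind vegan's `rrarefy`, which was the function actually used; it is a Python analogue under that assumption, not a port of vegan's code, and the names `rarefy`, `rarefy_table` and the fixed seed are hypothetical.

```python
import random
from collections import Counter
from typing import List, Optional


def rarefy(counts: List[int], depth: int, rng: Optional[random.Random] = None) -> List[int]:
    """Randomly subsample `depth` reads, without replacement, from one sample."""
    rng = rng or random.Random()
    if depth > sum(counts):
        raise ValueError("depth exceeds the number of reads in the sample")
    # One entry per read, labelled by the index of the OTU it belongs to.
    reads = [otu for otu, n in enumerate(counts) for _ in range(n)]
    drawn = Counter(rng.sample(reads, depth))
    return [drawn.get(otu, 0) for otu in range(len(counts))]


def rarefy_table(table: List[List[int]], seed: int = 1) -> List[List[int]]:
    """Rarefy every sample (row) to the smallest library size in the table."""
    rng = random.Random(seed)
    depth = min(sum(row) for row in table)
    return [rarefy(row, depth, rng) for row in table]
```

Because each draw is random, results differ between seeds; a single rarefied table, as produced here, is one realization of that process.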

Table 1 Descriptives of the different data sets used in this study
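Table 1 includes coverage statistics, and the Methods state that Good's coverage was computed. In its classic form Good's coverage is 1 − F1/N, where F1 is the number of OTUs observed exactly once (singletons) and N is the number of reads in the sample. The sketch below implements only this classic formula; it is an illustration, not the entropart implementation used for Table 1, and the function name is hypothetical.

```python
from typing import Sequence


def goods_coverage(counts: Sequence[int]) -> float:
    """Classic Good's coverage, 1 - F1/N, for one sample's OTU read counts."""
    n_reads = sum(counts)
    if n_reads == 0:
        raise ValueError("sample has no reads")
    singletons = sum(1 for c in counts if c == 1)  # OTUs seen exactly once
    return 1 - singletons / n_reads
```

Values close to 1 indicate that few reads belong to OTUs seen only once, that is, that further sequencing would add few new OTUs.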

Analysis of DOCs

DOC analysis was performed on each operational taxonomic unit (OTU) table using the MATLAB code provided by Bashan et al. (2016). First, community overlap and dissimilarity (measured as the root Jensen–Shannon distance) were calculated for each pair of samples. The relationship between the two was then assessed using two complementary statistics. (1) A P-value, calculated as the fraction of 200 bootstrap replicates in which the slope was non-negative, using a linear mixed-effects model fitted to the 50% of data points with the largest overlap (because no relationship is expected at lower overlap). This calculation accounts for the non-independence of pairwise comparisons, as n samples yield (n² − n)/2 pairs. (2) Using the LOWESS method with a sliding window of five data points, a change point was identified beyond which community overlap is negatively related to dissimilarity; the fraction of data points beyond this change point is reported as the ‘fraction with negative slope (Fns)’. To confirm that overlap and dissimilarity are not mathematically coupled, the same statistics were calculated for each OTU table after OTU abundances had been reshuffled across the samples in which these OTUs occurred; such randomized tables should not display a negative DOC.

We additionally tested whether a negative slope depends on scale, that is, whether comparisons between samples within a site and between different (distant) sites both exhibit negative slopes between overlap and dissimilarity. To do so, we used the data set for which both levels were available, the European grasslands, and split the distance matrices of individual samples (each rarefied to 200 reads) into within-site and between-site comparisons. We then calculated the slope using ordinary least squares linear regression (including only pairs with an overlap >0.5) for each individual site in R (R Core Team, 2014). Good’s coverage of the OTU tables was calculated using the ‘entropart’ package (Marcon and Hérault, 2015).
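Statistic (1) can be illustrated with a simplified bootstrap: samples (not pairs) are resampled with replacement, the 50% of resulting pairs with the largest overlap are kept, and the proportion of replicates with a non-negative slope is returned. As a simplifying assumption, this sketch fits an ordinary least squares slope instead of the linear mixed-effects model described in the Methods, and it takes precomputed pairwise overlap and dissimilarity matrices as input; the function names are hypothetical.

```python
import random
from typing import List, Sequence


def ols_slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def doc_bootstrap_p(overlap: List[List[float]], dissim: List[List[float]],
                    n_boot: int = 200, top_fraction: float = 0.5,
                    seed: int = 1) -> float:
    """Fraction of bootstrap replicates with a non-negative DOC slope.

    overlap and dissim are symmetric n x n matrices of pairwise values.
    """
    rng = random.Random(seed)
    n = len(overlap)
    valid = non_negative = 0
    for _ in range(n_boot):
        # Resample samples with replacement, then form all pairs of draws.
        draw = [rng.randrange(n) for _ in range(n)]
        pairs = [(overlap[a][b], dissim[a][b])
                 for k, a in enumerate(draw) for b in draw[k + 1:] if a != b]
        # Keep the pairs with the largest overlap (the upper half by default).
        pairs.sort(key=lambda p: p[0], reverse=True)
        keep = pairs[:int(len(pairs) * top_fraction)]
        xs = [p[0] for p in keep]
        ys = [p[1] for p in keep]
        if len(set(xs)) < 2:
            continue  # slope undefined for this replicate; skip it
        valid += 1
        if ols_slope(xs, ys) >= 0:
            non_negative += 1
    if valid == 0:
        raise ValueError("no bootstrap replicate had enough distinct overlaps")
    return non_negative / valid
```

A value near 0 means that almost every replicate showed a negative slope; with 200 replicates, no non-negative replicate corresponds to P < 0.005 at this resolution.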

Results

We found that communities from Chinese grasslands (Fns = 0.67; P < 0.001), European grasslands (Fns = 0.28; P < 0.001) and Panamanian tropical rainforest (Fns = 0.94; P < 0.001) showed a clear negative DOC (Figure 2), where Fns is the fraction of data points within the interval of negative slope. In contrast, communities from agricultural ecosystems, that is, Chinese maize fields (Fns = 0.43; P = 0.3), Canadian wheat fields (Fns = 0.003; P = 1.0) and Belgian apple orchards (Fns = 0.11; P = 0.5), showed no negative slope (Figure 2). The same analysis on randomized communities did not show a negative DOC for any of the data sets (Supplementary Figure S1), confirming that the negative slopes are not an artifact of mathematical constraints in the calculations and statistics employed. A further analysis of the European grasslands, comparing samples within a site with samples from different sites, yielded predominantly negative slopes, indicating that the negative DOCs in grasslands were similar whether the communities were close together or far apart (Figure 3a).
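The randomized communities in Supplementary Figure S1 come from permuting each OTU's abundances among the samples in which that OTU occurs: every sample keeps its set of OTUs, while any consistent abundance structure among them is broken. The following is a minimal sketch of that reshuffling as we read the Methods; the function name and seed are hypothetical.

```python
import random
from typing import List


def shuffle_within_occurrences(table: List[List[int]], seed: int = 1) -> List[List[int]]:
    """Permute each OTU's abundances among the samples where it occurs.

    table: rows are samples, columns are OTUs. The input is not modified.
    """
    rng = random.Random(seed)
    out = [row[:] for row in table]
    n_otus = len(table[0]) if table else 0
    for j in range(n_otus):
        present = [i for i, row in enumerate(table) if row[j] > 0]
        values = [table[i][j] for i in present]
        rng.shuffle(values)  # presence/absence pattern stays fixed
        for i, v in zip(present, values):
            out[i][j] = v
    return out
```

Because zeros never move, the set of shared OTUs for every pair of samples is unchanged, so a negative DOC in the randomized tables would point to a mathematical artifact rather than to taxon interactions.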

Figure 2

Relationships between community overlap and dissimilarity for the six data sets. Blue lines represent smoothed averages and 95% confidence intervals; vertical green lines indicate the level of community overlap beyond which a negative slope is observed. Each data point represents a community pair. Dissimilarity is represented by the root Jensen–Shannon distance.

Figure 3

Comparisons of overlap and dissimilarity for the European grasslands and the Chinese plots. (a) Data split into within-site and between-site comparisons for the European grasslands. Each plot is represented by a point. The y axis shows slopes of an ordinary least squares linear regression of Dissimilarity~Overlap, in which only pairs with an overlap >0.5 are taken into account. (b) Community overlap and dissimilarity (calculated as in Figure 2) for all pairs of one grassland and one crop site, thus assessing only between-land-use comparisons. The line is fitted using LOWESS regression.
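Panel (a) can be outlined by grouping sample pairs by site label and fitting an ordinary least squares slope to each group, keeping only pairs with an overlap >0.5. This sketch assumes that, for a given site, between-site pairs are those with exactly one sample in that site; that grouping, the function names and the input layout (precomputed pairwise matrices plus one site label per sample) are our assumptions, not taken from the original R scripts.

```python
import math
from typing import Dict, List, Sequence, Tuple


def ols_slope(pairs: Sequence[Tuple[float, float]]) -> float:
    """OLS slope of dissimilarity on overlap; NaN if the slope is undefined."""
    if len({x for x, _ in pairs}) < 2:
        return math.nan
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    return sxy / sxx


def site_slopes(overlap: List[List[float]], dissim: List[List[float]],
                sites: Sequence[str], min_overlap: float = 0.5
                ) -> Dict[str, Dict[str, float]]:
    """Within-site and between-site DOC slopes for every site label."""
    n = len(sites)
    result = {}
    for site in sorted(set(sites)):
        within, between = [], []
        for i in range(n):
            for j in range(i + 1, n):
                if overlap[i][j] <= min_overlap:
                    continue  # only high-overlap pairs enter the regression
                point = (overlap[i][j], dissim[i][j])
                in_i, in_j = sites[i] == site, sites[j] == site
                if in_i and in_j:
                    within.append(point)
                elif in_i or in_j:
                    between.append(point)
        result[site] = {"within": ols_slope(within), "between": ols_slope(between)}
    return result
```

Each site then contributes one within-site and one between-site slope, which is the kind of per-site summary plotted in panel (a).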

Discussion

Here we report that AM fungal communities exhibit a downward curve between community overlap and dissimilarity in each of the three natural ecosystems. These curves indicate a preserved relationship between taxa that becomes more pronounced as communities share a larger fraction of their abundance (Figure 2). This preservation strongly suggests that knowing the taxa that make up an AM fungal community is informative for predicting the relative abundance of the same and other taxa in communities at different sites. Although AM fungal communities are known to be shaped to a considerable extent by soil conditions such as pH and soil type (Jansa et al., 2014), nutrient availability (Camenzind et al., 2014) and vegetation type (Rodriguez-Echeverria et al., 2017), it is common to find a strong element of unpredictability in their community composition (see, for example, Powell and Bennett, 2015). Our results suggest that some of this unpredictability may arise from interactions between taxa that are themselves predictable.

In comparing the DOC slopes of within-site and between-site AM fungal community pairs, we find that negative slopes predominate in both sets, suggesting that AM fungal communities are structured by similar processes at both scales. These results are somewhat surprising: we would expect species interactions at local scales to leave a more pronounced and consistent signal, because there is no dispersal limitation and the environment is more similar, and thus more likely to support OTUs that represent the same ecotypes (Vályi et al., 2016). That the negative DOC was also found for between-site comparisons (Figures 2 and 3a) strongly suggests that the drivers of AM fungal community composition are more constant in outcome than has been proposed (Dumbrell et al., 2010).

This relationship was, however, not universal: none of the three agricultural data sets showed a significant negative DOC slope (Figure 2). When we restrict the Chinese community pairs to grassland–cropland comparisons (that is, between-land-use pairs), there is also no negative slope (Figure 3b). In contrast to grassland-only comparisons, these communities therefore do not converge when they share more AM fungal taxa, which suggests that interactions among AM fungal taxa, and between AM fungi and their environment, differ between these land-use types. Even if a negative DOC is common in AM fungal communities, such a relationship will still be absent when communities are not in equilibrium or when multiple stable states occur (Bashan et al., 2016). It is possible that agricultural practices such as tillage (Schnoor et al., 2011), generous application of fertilizers (Williams et al., 2017) and pesticides, or driving heavy machinery over the fields impose pressures that cause AM fungal communities to be transient or to shift to (multiple) alternative stable states. Under these conditions, the tendency of communities to sort into a particular configuration of relative abundances would not be detected, either because they have not yet sorted (transience) or because different configurations exist (multiple states).

For this analysis, we aimed to use data sets that were similar in spatial extent and, especially, in the number of individual plots sampled. Nevertheless, some considerations may contribute to variation in DOCs. Two agricultural data sets (Canadian wheat fields and Belgian orchards) have relatively few pairs with high community overlap (see Figure 2). Although this does not seem to be related to differences in sampling intensity relative to the other data sets (see Table 1 for coverage statistics and sequence counts), it may contribute to the absence of a significant downward trend in the DOC. Relatedly, the Belgian orchard data set is the only one based on root rather than soil samples. Differences between soil and root AM fungal communities are known to occur (for example, Hempel et al., 2007), and such differences could also extend to interactions between AM fungi that affect the DOC. To truly assess the dynamics of fungal taxa and their regularity between sites, future research should measure the temporal development of AM fungal taxon abundances in a systematic and controlled manner.

Collectively, our findings suggest that DOC analysis provides an approach to estimate deviation from natural community behavior. Such an approach could be used to test whether AM fungi or other bio-inoculants introduced into natural or agricultural ecosystems disturb resident fungal communities (Rodriguez and Sanders, 2015; Bender et al., 2016; Hart et al., 2017), with broader implications for agriculture and ecosystem restoration.