Date: 2022-01-10

Data collection and processing

Forest inventory data

The International Co-operative Programme on Assessment and Monitoring of Air Pollution Effects on Forests (ICP Forests) has been intensively monitoring several hundred permanent forested plots across Europe since 1995 or later [40]. Each intensive Level II monitoring plot is at least 0.25 ha, and most trees with at least 5 cm diameter at breast height (DBH) are identifiable by a unique number used to make periodic measurements [41]. DBH was measured using a caliper or measuring tape approximately every five years, a commonly used interval for estimating forest growth and yield. Tree species identity is reported for every measured tree, and each plot was classified by the dominant tree species (>50% abundance). We used data from plots dominated by Pinus sylvestris (Scots pine), Fagus sylvatica (European beech), Picea abies (Norway spruce), and Quercus robur and Q. petraea (pedunculate and sessile oak; hereafter: mixed oak).

We used periodic DBH observations to calculate a diameter increment growth rate for each tree. We removed dead trees, trees with DBH < 5 cm, and trees that shrank over the growth period, which were occasionally included in the census. To investigate long-term drivers of tree growth and avoid potential short-term abnormalities, we used the first and last periodic growth measurements to calculate the diameter growth increment. The entire period covered was 1999–2017, the mean initial census year was 2005, the mean final census year was 2008, and the mean growth interval was 5.5 years. Although tree growth and the fungal community were not always sampled in the same years at some sites, previous work has shown that year-to-year variation in fungal communities is low at regional to continental scales [42, 43]. Aboveground biomass was calculated using allometric equations from the GlobeAllomeTree database [44] for the dominant tree species within our observed size and geographic range:

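Pinus sylvestris: \(12.91 - 2.8035 \times \mathrm{DBH} + 0.3535 \times \mathrm{DBH}^{2}\)

Fagus sylvatica: \(0.19465 \times \mathrm{DBH}^{2.418775}\)

Picea abies: \(0.4626 \times \mathrm{DBH}^{2.133}\)

Quercus robur & Q. petraea: \(0.23095 \times \mathrm{DBH}^{2.27265}\)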

The carbon fraction of tree biomass was assumed to be 50%, following the IPCC Good Practice Guidance for Land Use, Land-Use Change and Forestry [45].
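
The growth calculation described above can be sketched as follows. This is a minimal illustration in Python rather than the R code used for the study; the function names are hypothetical, and the units (DBH in cm, biomass in kg) are an assumption that should be checked against the GlobeAllomeTree entries before reuse.

```python
# Illustrative sketch only: names and units (DBH in cm, biomass in kg) are assumptions.
CARBON_FRACTION = 0.5  # carbon share of biomass assumed in the text


def aboveground_biomass(species, dbh_cm):
    """Aboveground biomass from DBH using the coefficients listed in the text."""
    if species == "Pinus sylvestris":
        return 12.91 - 2.8035 * dbh_cm + 0.3535 * dbh_cm ** 2
    if species == "Fagus sylvatica":
        return 0.19465 * dbh_cm ** 2.418775
    if species == "Picea abies":
        return 0.4626 * dbh_cm ** 2.133
    if species == "mixed oak":  # Quercus robur and Q. petraea
        return 0.23095 * dbh_cm ** 2.27265
    raise ValueError(f"no allometric equation for {species!r}")


def carbon_growth_rate(species, dbh_first_cm, dbh_last_cm, interval_years):
    """Annual biomass-carbon gain between the first and last census."""
    # Mirror the filters in the text: drop trees below 5 cm DBH and trees that shrank.
    if dbh_first_cm < 5 or dbh_last_cm < dbh_first_cm:
        raise ValueError("tree excluded by the DBH filters")
    gain = aboveground_biomass(species, dbh_last_cm) - aboveground_biomass(species, dbh_first_cm)
    return CARBON_FRACTION * gain / interval_years
```

For example, `carbon_growth_rate("Pinus sylvestris", 10, 20, 5)` evaluates the pine equation at both diameters, halves the biomass difference, and divides by the five-year interval.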

We also downloaded metadata for each plot (where available) from the ICP Forests database, based on long-term in situ measurements made with harmonized methods, for N deposition (similar to ref. [46]), soil pH, and soil inorganic N concentrations (ammonium and nitrate). Because N deposition measurements were incomplete across the study plots, we used N deposition data for 2010–2015 from the European Monitoring and Evaluation Programme (EMEP) at a 1-km spatial resolution, which were significantly correlated with the ICP Forests N deposition measurements (R² = 0.5, p < 0.0001; similar to ref. [38]). Soil solution measurements of inorganic N were made on soil water fractions collected weekly, bi-weekly, or monthly, except at sites where water was too scarce to collect and/or where snow and ice prevented winter sampling. Soil pH was measured in CaCl₂ using potentiometry. We calculated inorganic N and soil pH values from measurements made between 2010 and 2017 at 0–25 cm depth, which matches the time frame and soil depth of the EMF community sampling. A detailed description of the analytical methods can be found in the ICP Forests manual [47]. We also downloaded and used information on the average stand age class of each plot. Lastly, we downloaded 1-km spatial resolution mean annual temperature (MAT) and mean annual precipitation (MAP) data from WorldClim2 [48].

Fungal ITS data analysis

Full-length ITS DNA sequences obtained from ectomycorrhizae by ref. [39] were used for the EMF community analysis. Complete information on the sampling is given in the original publication and in ref. [49]. In short, ectomycorrhizae were characterized for 137 ICP Forests Level II sites during 2006–2008 and 2013–2015. At each site, four soil cores (25 cm deep, 2 cm diameter) were collected from underneath 24 tree-to-tree transects and sampled for 288 ectomycorrhizae per plot. A total of 87 plots had paired EMF community and forest inventory data, and 71 plots had complete data for all variables included in the full statistical models.

After downloading the DNA sequence data from DRYAD, we produced a fastq file from the Phred scores in the qual file using the Unipro UGENE software [50]. Sequences were trimmed where Phred scores fell below 20, and sequences shorter than 100 bp were removed, using the Sequence quality trimmer function in UGENE. Of the 35,989 sequences, we retained 32,219 after quality control. We then used the usearch_global function in usearch (v11) [51] to match sequences at a 97% sequence similarity threshold against the full UNITE + INSD database (2018-11-18) [52].
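
The quality-control thresholds above can be illustrated with a short sketch. It assumes that trimming removes low-quality bases from both ends of a read; the exact algorithm of the UGENE trimmer may differ, and the function names are hypothetical.

```python
# Illustrative sketch only: end trimming is an assumption about how the
# UGENE "Sequence quality trimmer" behaves, not a description of its algorithm.
def trim_ends(seq, quals, min_q=20):
    """Drop bases with Phred score below min_q from both ends of a read."""
    start, end = 0, len(seq)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return seq[start:end], quals[start:end]


def passes_qc(seq, quals, min_q=20, min_len=100):
    """Keep a read only if at least min_len bases survive trimming."""
    trimmed, _ = trim_ends(seq, quals, min_q)
    return len(trimmed) >= min_len
```

A Phred score Q corresponds to an error probability of 10^(−Q/10), so the threshold of 20 discards base calls with more than a 1% chance of error.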

Using the fungal genomic portal MycoCosm [53], we assigned ITS sequences genomic traits related to the functioning of ectomycorrhizae. Similar methods have been developed for phylogenetic inference using prokaryote 16S surveys [54, 55] but were modified for ITS analysis in this study since the ITS region is less suitable for phylogenetic analyses across all fungi. We used the MycoCosm All-Fungi Species Tree (downloaded from [56]) to determine whether there was phylogenetic signal in the genomic traits of interest. These traits included numbers of enzyme nomenclature (EC) related protein sequences (hereafter: gene numbers) based on functions in the Kyoto Encyclopedia of Genes and Genomes (KEGG), and protein family (PFAM) groups related to N cycling (i.e., N permeases and ammonium sensing genes), proteolysis (i.e., peptidases, proteases), decomposition of N-bearing compounds (i.e., oxidases and multicopper oxidases), and fungal cell wall biosynthesis (i.e., chitin and glucan biosynthesis). We also summed genes for peptidases, proteases, and decomposition of N-bearing compounds to study total organic N cycling genes, given the overall importance of ectomycorrhizal organic N acquisition in forest ecosystems [16]. Many KEGG functions are directly relevant to the exchanges occurring between host plants and EMF, including energy and nutrient, nitrogen and amino acid, and carbohydrate metabolism and various anabolic and catabolic pathways. We also considered the total number of gene models per genome as a proxy for metabolic activity [57], and specifically for how microorganisms grow and metabolize carbon.

The phylogenetic tree was pruned to include only species from genera in our dataset, which restricted the analysis to species from 101 fungal genera. We tested for phylogenetic signal in gene numbers using the phylosig function in the phytools package [58] with 10,000 simulations, setting method = “K” and method = “lambda” to calculate Blomberg’s K and Pagel’s lambda, respectively (see Table S1). After screening for phylogenetic signal, we first assigned functional genes to direct species matches, which accounted for 50% of the matched OTUs, and then to species and genera without direct reference genome matches when there was significant phylogenetic signal. Where there was phylogenetic signal but no direct species match, we assigned to an OTU the average genus-level gene number across all species of that genus in MycoCosm. OTUs were not included in this analysis if they were not assigned a genus-level taxonomy. See Fig. S1 for a decision tree outlining this approach. To account for differences in genome size, which could lead to spurious correlations, we expressed the number of genes for each function as a proportion of the total number of genes in the genome, an approach similar to ref. [56]. Proportional gene numbers were then weighted by relative abundance in the OTU table, using the weighted.mean function in base R, to calculate community weighted mean (CWM) trait values. Of the 101 genera identified in the dataset, 54 KEGG and 47 PFAM assignments were made; of the 1022 OTUs in the dataset, 512 KEGG and 455 PFAM assignments were made; and of the total sequences, 46% were assigned KEGG and 25% PFAM annotations. Half (258) of the assigned OTUs were exact reference species matches.
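
The genome-size standardization and community weighting described above can be sketched as follows. This is a minimal Python illustration, not the R code used in the study. The function names are hypothetical, and we assume that OTUs without a trait assignment are dropped and the weights renormalized over the remaining OTUs, which is how a weighted mean behaves when only assigned OTUs are passed to it.

```python
# Illustrative sketch only: names and the handling of unassigned OTUs are assumptions.
def proportional_gene_number(function_genes, total_genes):
    """Genes for one function as a proportion of all gene models in the genome."""
    return function_genes / total_genes


def community_weighted_mean(traits, abundances):
    """Abundance-weighted mean trait value across OTUs with an assigned trait.

    traits: OTU id -> proportional gene number
    abundances: OTU id -> relative abundance (or count) in one plot
    """
    shared = [otu for otu in abundances if otu in traits]
    total_weight = sum(abundances[otu] for otu in shared)
    if total_weight == 0:
        raise ValueError("no OTU in this plot has an assigned trait value")
    return sum(traits[otu] * abundances[otu] for otu in shared) / total_weight
```

In this sketch, a plot with two assigned OTUs at abundances 3 and 1 and proportional gene numbers 0.01 and 0.03 has a CWM of (3 × 0.01 + 1 × 0.03) / 4 = 0.015, regardless of how many unassigned OTUs it contains.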

Inferring potential microbiome functions from DNA metabarcoding studies is very common [55, 59–62] but has been debated in the literature [63]. Criticism has focused on 16S analyses that infer functional profiles from environmental samples when there is poor overlap between observed taxa and those with reference genomes [63]. In our study, we largely avoided this issue by focusing on root-associated EMF: 50% of the species and genera identified in our dataset have a direct species- or genus-level match to reference genomes in MycoCosm (see above). This method is also uniquely informative. We could not use metagenomics since high bacterial ribosomal copy numbers largely prevent fungal analyses [64], and alternative DNA-based methods are laborious and cost-prohibitive [65]. Thus, cautiously assigning functional potentials to EMF on a study-by-study basis using reference genomes may be a viable technique as long as there is high overlap between observed and reference taxa in MycoCosm.

Data analysis

Fungal community analyses

All statistical analyses were conducted in R (v3.6.1) [66], and significance was set to p ≤ 0.05. We calculated beta diversity, fungal richness (number of OTUs), community diversity (Shannon index), and CWM functional gene values. First, we randomly rarefied the dataset to the lowest number of observations (115 DNA sequences per plot) using the rrarefy function in the vegan package [67]. This is a robust sequencing depth for EMF Sanger sequencing [68–70] and has previously been shown to correspond well with high-throughput EMF DNA sequencing techniques [71, 72]. We then calculated relative OTU abundances, produced a Bray–Curtis dissimilarity matrix using the vegdist function (vegan), and represented EMF composition using principal coordinates analysis (PCoA) via the pcoa function in the ape package [73]. Fungal richness and diversity (Shannon index) were calculated using the specnumber and diversity functions in vegan, respectively. The relationship between fungal community composition, fungal functional potentials inferred using CWM gene numbers, and tree growth was assessed using distance-based redundancy analysis (db-RDA).
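
The community metrics above can be sketched in a few lines. The Python functions below are hypothetical stand-ins for the vegan routines named in the text (rrarefy, vegdist, specnumber, diversity) and are meant only to make the calculations explicit; they assume subsampling without replacement and a natural-log Shannon index.

```python
# Illustrative sketch only: simplified stand-ins for the vegan functions named in the text.
import math
import random


def rarefy(counts, depth, rng):
    """Randomly subsample `depth` sequences without replacement from one plot."""
    pool = [otu for otu, n in enumerate(counts) for _ in range(n)]
    if depth > len(pool):
        raise ValueError("plot has fewer sequences than the rarefaction depth")
    out = [0] * len(counts)
    for otu in rng.sample(pool, depth):
        out[otu] += 1
    return out


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / sum(x + y for x, y in zip(a, b))


def shannon(counts):
    """Shannon diversity index using natural logarithms."""
    total = sum(counts)
    return -sum((n / total) * math.log(n / total) for n in counts if n > 0)


def richness(counts):
    """Number of OTUs observed in a plot."""
    return sum(1 for n in counts if n > 0)
```

A call such as `rarefy(plot_counts, 115, random.Random(1))` returns a count vector that sums to 115, from which relative abundances, richness, and diversity can then be computed on an equal footing across plots.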

Using the OTU-based Bray–Curtis dissimilarity matrix (computed for all 137 sites), we performed hierarchical clustering using the base hclust function in R. The optimal number of clusters was evaluated using the elbow method [74]. We then used analysis of means, via the ANOM function in the ANOM package [75], to identify clusters of sites with higher or lower tree growth rates than the overall mean. This resulted in a fairly balanced number of sites between slow- and fast-growth communities in both the needleleaf (\(n_{\mathrm{slow}} = 14\) vs. \(n_{\mathrm{fast}} = 12\)) and broadleaf (\(n_{\mathrm{slow}} = 14\) vs. \(n_{\mathrm{fast}} = 19\)) sites. EMF cluster was then used as a discriminatory factor for indicator species analysis performed using the multipatt function in the indicspecies package [76].

Tree growth models using generalized additive models

Generalized additive models (GAMs) were used to predict tree growth rates (kg C yr−1) at the plot level using the gam function in the mgcv package [77]. We used statistically independent fixed effects (r² < 0.5) including tree density, forest stand age class, a binary categorical factor for needleleaf and broadleaf tree types, MAT, MAP, N deposition, soil pH, and inorganic N concentrations (soil ammonium + nitrate concentrations). To reduce over-fitting, we fit smoothing functions using penalized regression splines to predictors with non-linear relationships to tree growth, including stand age and stand density (Fig. S2). Spline fits were assessed using the plot.gam function, and smoothness selection optimization and basis dimensions were checked using the gam.check function. We used restricted maximum likelihood for smoothing parameter estimation. Separate models were created for each fungal parameter, without smoothing functions. Models were inspected for normal distribution of the residuals, residual versus fitted plots, and multicollinearity among predictors based on variance inflation factors. Growth was natural-log transformed to satisfy the assumption of homoscedasticity.
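
The independence criterion for fixed effects (r² < 0.5) can be illustrated with a short screening step. This is a hedged Python sketch with hypothetical function names; it assumes the criterion was applied to pairwise Pearson correlations between continuous predictors, which the text does not state explicitly.

```python
# Illustrative sketch only: pairwise Pearson r-squared is an assumption about
# how the r2 < 0.5 criterion was evaluated.
from itertools import combinations
from statistics import mean


def pearson_r2(x, y):
    """Squared Pearson correlation between two equally long numeric vectors."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def collinear_pairs(predictors, threshold=0.5):
    """Return predictor pairs whose r-squared meets or exceeds the threshold."""
    return [
        (a, b)
        for a, b in combinations(predictors, 2)
        if pearson_r2(predictors[a], predictors[b]) >= threshold
    ]
```

Any pair returned by `collinear_pairs` would need one member dropped or combined before model fitting; an empty list indicates that all predictors pass the screen.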

We also estimated plot-level tree growth rates (Mg C ha−1 yr−1), as opposed to individual tree growth rates (kg C yr−1), to explore differences at the stand level between forests classified as part of the slow- versus fast-tree-growth associated EMF community types. Since periodic DBH measurements are not made on every tree in the plots (i.e., only a subset of trees in Level II ICP Forests plots are measured for growth), we could not simply sum the biomass C gain of all trees. We therefore randomly sampled the periodically measured trees, with replacement, until reaching the in situ stem density of each plot, and repeated this procedure 1000 times. From this distribution, we summed the biomass C gain across all trees. Following the same procedure as above, we then modeled plot-level tree growth using fungal community composition as a predictor variable. Significant differences between sites classified as part of the slow- versus fast-tree-growth associated EMF community types were evaluated using heteroscedastic (Welch's) t-tests.
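
The resampling step above can be sketched as follows. This Python illustration uses hypothetical names and makes two assumptions not stated in the text: that stem density is expressed in stems per hectare, and that the 1000 resampled totals are summarized by their mean.

```python
# Illustrative sketch only: stems-per-hectare units and the use of the mean
# across draws are assumptions, not details given in the text.
import random
import statistics


def plot_level_growth(tree_growth_kg_c_yr, stems_per_ha, n_draws=1000, seed=0):
    """Estimate stand growth (Mg C per ha per yr) by resampling measured trees."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n_draws):
        # Draw measured trees with replacement until the stem density is reached.
        draw = rng.choices(tree_growth_kg_c_yr, k=stems_per_ha)
        totals.append(sum(draw) / 1000.0)  # convert kg to Mg
    return statistics.mean(totals)
```

With this approach, a plot in which every measured tree gains 2 kg C per year and which holds 500 stems per hectare yields 1 Mg C per hectare per year; with variable tree growth, the spread of the resampled totals reflects the uncertainty from measuring only a subset of trees.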

For visualization, we calculated model partial residuals with respect to the fungal predictor [78], and then added the effect of all other predictors at their mean values, so that data could be interpreted on their original scale. Mathematically, this can be expressed as:

$$y_i = f\left(x_i^{1}\right) + f\left(\bar{x}^{(2,\ldots,n)}\right) + \varepsilon$$

where \(y_i\) is the vector of partial residuals on the original scale of the data, \(x_i^{1}\) is the vector of observed values of the predictor relative to which the partial residuals are calculated, \(\bar{x}^{(2,\ldots,n)}\) are the remaining model covariates (2 through \(n\)) at their mean values, and \(\varepsilon\) is the vector of fitted model residuals. \(f(x)\) represents the fitted functional form, output by the generalized additive model, of how each independent variable affects the dependent variable.