Field site and sampling of substrates

This study was conducted on the Wade Tract Preserve (30°45′ N; 84°00′ W). This preserve contains 85 ha of pine savanna/woodland on moderately dissected terrain 25–50 m above sea level, ~80 km north of the Gulf of Mexico in Thomas County, Georgia, USA (Supplementary Fig. S1, Supplement 1). Over the past couple of centuries, frequent fires32, combined with an absence of logging at this site, have maintained an open physiognomy with patches of old-growth overstory pines and a diverse, herbaceous-dominated ground layer (see photographs23,34,35). Fire return intervals of 1–2 years are possible because substantial rainfall (averaging ~1350 mm annually) and a 10–11 month growing season result in rapid post-fire regrowth of herbaceous ground layer plants; together with abundant pine needles, this regrowth generates enough fuel to burn the ground layer vegetation within a year28,36. These low-intensity fires do not burn the whole landscape but leave many areas unburnt, so patches that escaped previous fires may burn, while previously burnt patches may be left unburnt. This produces the patchy nature of these frequent, low-intensity fires. Further details of the Wade Tract Preserve, the larger study site, and fire regimes are presented in Supplement 1 and in Semenova-Nelsen et al.10.

This study was established following two prescribed fires in the spring of 2014. Shortly after the fires, burnt and unburnt patches of the landscape could be readily identified in the field by distinguishing ground layer vegetation and litter that had been burnt (i.e., charred) from vegetation and litter that remained intact and unaltered. Unburnt patches of ground layer vegetation > 5 m² were mapped within a 50 ha upland pine savanna plot that was established for long-term study in 197827,28. In 2014, these unburnt patches made up 5–10% of the 50 ha mapped plot. Thirty naturally unburnt patches were then randomly selected from those mapped: 15 near and 15 away from overstory pines, constituting the near and away pine proximity treatments, respectively. These patches were left unburnt by the prescribed fire without human intervention, and were selected subject to the constraint that they had burnt naturally at least once within the two prior years. Next, 30 burnt patches were randomly selected so that each was paired with a nearby (5–10 m away) unburnt patch with similar overstory pine proximity. These burnt patches had burnt naturally in each of the past three years. One 1 × 1 m plot was then installed in the center of each patch so that no plot edge lay on the patch border. A total of 60 plots were used in this study (Supplementary Fig. S1, Supplement 1). More complete descriptions of plot selection are provided in Supplement 1.
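The stratified selection and pairing steps can be sketched as a small algorithm. This is an illustrative sketch only: the `Patch` record, its coordinate fields and the function name are hypothetical, and the sketch omits the size (> 5 m²) and burn-history constraints, which would simply be extra filters on the candidate pools.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Patch:
    """Hypothetical record for one mapped patch (illustrative fields only)."""
    patch_id: str
    x: float          # metres on a hypothetical local grid
    y: float
    burnt: bool
    near_pine: bool   # True = near overstory pines, False = away


def select_and_pair(patches, n_per_class=15, min_dist=5.0, max_dist=10.0, seed=None):
    """Randomly select unburnt patches stratified by pine proximity, then pair
    each one with a randomly chosen, unused burnt patch of the same proximity
    class lying min_dist..max_dist metres away."""
    rng = random.Random(seed)
    unburnt = [p for p in patches if not p.burnt]
    burnt = [p for p in patches if p.burnt]
    selected = []
    for near in (True, False):
        pool = [p for p in unburnt if p.near_pine == near]
        selected.extend(rng.sample(pool, n_per_class))
    used, pairs = set(), []
    for u in selected:
        candidates = [
            b for b in burnt
            if b.near_pine == u.near_pine
            and b.patch_id not in used
            and min_dist <= math.hypot(b.x - u.x, b.y - u.y) <= max_dist
        ]
        if not candidates:
            raise ValueError(f"no eligible burnt partner for {u.patch_id}")
        partner = rng.choice(candidates)
        used.add(partner.patch_id)
        pairs.append((u, partner))
    return pairs


# Synthetic demo data (invented): 20 unburnt patches per proximity class, each
# with a burnt neighbour 7 m to the east; other patches lie 100 m apart.
demo = []
for i in range(40):
    near = i < 20
    demo.append(Patch(f"U{i}", 100.0 * i, 0.0, burnt=False, near_pine=near))
    demo.append(Patch(f"B{i}", 100.0 * i + 7.0, 0.0, burnt=True, near_pine=near))
pairs = select_and_pair(demo, seed=1)
```

Tracking `used` burnt patches keeps the design one-to-one, so the 30 unburnt and 30 burnt patches yield 60 distinct plots.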

Plots were sampled in mid-July following the fires. All vascular plant species present in each plot were recorded, with plant nomenclature following Weakley37. Voucher specimens of each plant species included in the study were collected outside the plots and deposited in the herbarium at Tall Timbers Research Station. Collection of plant material complied with institutional, national, and international guidelines and legislation. Specimens are in the Florida State University herbarium (https://herbarium.bio.fsu.edu/) under barcodes TTRS_000010437–TTRS_000010579. PC-ORD (version 6)38 was used to test plant community composition patterns in relation to the burn and pine proximity treatments. Concurrently, three 9 × 9 cm areas were randomly sampled in each of the 60 plots, positioned so that above-ground vegetation did not have to be destroyed. In each area, two samples were collected: one of surface litter and one of the surface soil directly below the litter to a depth of 1.5 cm, the depth within which fire-induced temperature increases are greatest5,8. Litter collection avoided recently fallen, post-fire material. The three samples of each type (litter and soil) from a plot were pooled in separate sterile plastic bags. All sampling equipment was sterilized with 10% bleach and 90% isopropyl alcohol between plots to avoid cross-contamination. Samples were kept in a cooler with freezer packs, frozen at −20 °C within 4 h, and shipped overnight to the University of Kansas, where they were stored at −80 °C until laboratory analysis.

Laboratory procedures

Each sample was thawed and thoroughly homogenized within the sealed collection bag. After homogenization, a subsample of approximately 100 g was taken for chemical analyses and a 2 g subsample for molecular analysis of soil or litter. Soil physical and chemical properties were assayed for each plot; the methods and results of these analyses are presented in Supplement 2.

DNA was extracted from 0.25 g of material taken from each 2 g subsample using MoBio PowerSoil Kits (MoBio, Carlsbad, USA), and extracted DNA was quantified using a Qubit 2.0 (Life Technologies, Carlsbad, USA). The V4 hypervariable region of the 16S rDNA was amplified from 5 ng of template DNA using the standard Earth Microbiome Project primers 515F and 806R39 and Q5 polymerase (New England Biolabs, Ipswich, USA). The PCR protocol consisted of an initial denaturation at 98 °C for 30 s; 25 cycles of denaturation at 98 °C for 10 s, annealing at 55 °C for 30 s, and extension at 72 °C for 30 s; and a final extension at 72 °C for 5 min. PCR products were cleaned using Agencourt AMPure XP magnetic beads (Beckman Coulter, Indianapolis, USA). A second PCR was then used to attach unique Nextera indices (Illumina, San Diego, USA) to each sample. Cycling conditions for this indexing PCR were similar to those of the first PCR except that only 8 cycles were run, and the products were purified as before. Individual libraries were pooled at equimolar concentrations into a single 4 nM library; concentration and amplicon size were verified using a TapeStation 2200 (Agilent, Santa Clara, USA). Amplicon sequences were then generated in a 301 bp paired-end run on an Illumina MiSeq (Illumina, San Diego, USA) at the Kansas State Integrated Genomics Center. Sequencing data comprised a negative, water-only library control (from the first PCR step) and 120 samples generated from the soil and litter of the 60 plots. Paired reads were concatenated, and the Nextera indices were used as barcodes to demultiplex reads from the 120 sequenced samples. One sample failed and was excluded, leaving 119 samples plus one negative control.
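Equimolar pooling is simple arithmetic once each library's mass concentration is converted to molarity. The sketch below assumes the common approximation of 660 g/mol per base pair of double-stranded DNA; the function names, the 50 µL pool volume and the example concentrations are hypothetical, and the target pool molarity is a parameter (4 nM in the demo, a typical MiSeq pool concentration).

```python
def ng_per_ul_to_nm(conc_ng_per_ul, length_bp, g_per_mol_per_bp=660.0):
    """Convert a dsDNA concentration from ng/uL to nM (assumes ~660 g/mol per bp)."""
    return conc_ng_per_ul * 1e6 / (g_per_mol_per_bp * length_bp)


def equimolar_volumes(libraries, pool_nm=4.0, pool_volume_ul=50.0):
    """Volume (uL) of each library so every library contributes equal moles and
    the pool reaches pool_nm after topping up to pool_volume_ul with buffer.

    libraries: dict name -> (concentration in ng/uL, amplicon length in bp)
    """
    per_library_nm = pool_nm / len(libraries)
    volumes = {}
    for name, (conc, length) in libraries.items():
        molarity = ng_per_ul_to_nm(conc, length)
        # C1 * V1 = C2 * V2, solved for V1 (this library's share of the pool).
        volumes[name] = per_library_nm * pool_volume_ul / molarity
    total = sum(volumes.values())
    if total > pool_volume_ul:
        raise ValueError("libraries too dilute for the requested pool")
    volumes["buffer"] = pool_volume_ul - total
    return volumes


# Hypothetical libraries: 33 ng/uL and 16.5 ng/uL, both 500 bp (100 nM and 50 nM).
vols = equimolar_volumes({"lib_a": (33.0, 500), "lib_b": (16.5, 500)})
```

The more concentrated library contributes half the volume of the more dilute one, so both add the same number of molecules to the pool.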

Raw barcode reads were demultiplexed and assigned to exact sequence variants (ESVs). Reads were compared with the 119 possible barcode combinations in a way that accounted for base-pair read errors, increasing demultiplexing accuracy and the identification rate. A maximum likelihood approach40 assigned each read a test statistic, Z Renaud, computed from the geometric means of matched and unmatched barcode reads weighted by Phred scores. Z Renaud accommodates a greater number of ambiguous (N) base calls than QIIME and down-weights matches with high error, and therefore recovers more samples than other approaches. To set a Z Renaud removal threshold, 500 clearly erroneous reads were identified that were missing 2–3 base pairs and had similar barcodes with very high error probabilities. For this dataset, a threshold of Z Renaud > 6 maintained strong selectivity, so reads with Z Renaud ≤ 6 were removed. DADA2 was used to process demultiplexed sequences following the big data workflow for paired-end reads41, with a truncation quality score (truncQ) of 2, maximum expected errors (maxEE) of 3 and 5, and truncation lengths (truncLen) of 200 and 151 nucleotides for forward and reverse reads, respectively.
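DADA2 is an R package, so the sketch below is only a Python re-expression of the documented meaning of its truncQ, truncLen and maxEE filtering arguments, applied with the values used here (2; 200/151; 3/5). It is not DADA2's code: it works on Phred score lists rather than FASTQ records and ignores other defaults such as maxN = 0. The function names are hypothetical.

```python
def filter_read(quals, trunc_len, max_ee, trunc_q=2):
    """Apply truncQ, truncLen and maxEE to one read's Phred scores.

    Returns the truncated list of quality scores, or None if the read is discarded.
    """
    # truncQ: cut the read at the first base whose quality is <= trunc_q.
    for i, q in enumerate(quals):
        if q <= trunc_q:
            quals = quals[:i]
            break
    # truncLen: reads shorter than trunc_len are discarded; longer ones are cut.
    if len(quals) < trunc_len:
        return None
    quals = quals[:trunc_len]
    # maxEE: expected errors = sum of per-base error probabilities 10^(-Q/10).
    expected_errors = sum(10 ** (-q / 10) for q in quals)
    return quals if expected_errors <= max_ee else None


def filter_pair(fwd_quals, rev_quals, trunc_len=(200, 151), max_ee=(3, 5), trunc_q=2):
    """Keep a read pair only if both mates pass their own thresholds."""
    f = filter_read(fwd_quals, trunc_len[0], max_ee[0], trunc_q)
    r = filter_read(rev_quals, trunc_len[1], max_ee[1], trunc_q)
    return (f, r) if f is not None and r is not None else None


kept = filter_pair([30] * 250, [30] * 200)          # high quality: kept
early_drop = filter_read([30] * 100 + [2] + [30] * 149, 200, 3)  # Q2 at base 101
noisy = filter_read([10] * 250, 200, 3)             # EE = 200 * 0.1 = 20 > 3
```

Expected errors penalize low-quality reads more smoothly than a hard per-base cut-off: 200 bases at Q30 contribute only 0.2 expected errors, whereas 200 bases at Q10 contribute 20.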

ESVs were further processed to remove erroneous sequences. Because PCR amplification can produce erroneous reads on the order of 1:10³, rare ESVs were removed using a less conservative threshold (1:10⁵) than is often applied (1:10⁴)42, for two reasons: type-I errors at the ESV level were not relevant to this study, and erroneous reads could occur in common as well as rare ESVs. Using this lower threshold minimized loss of ecological signal by avoiding arbitrary removal of correct reads. Under this threshold, an ESV needed a total of at least ~70 reads across all samples to be retained in the final dataset.
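The relative-abundance cut-off amounts to one multiplication and a filter. In this minimal sketch the function name and the ESV totals are hypothetical; the threshold is taken as a fraction (1:10⁵) of all reads in the dataset, which is one reasonable reading of the rule described above.

```python
def filter_rare_esvs(esv_totals, ratio=1e-5):
    """Drop ESVs whose total read count across all samples is below
    ratio * (total reads in the dataset)."""
    grand_total = sum(esv_totals.values())
    min_reads = ratio * grand_total
    kept = {esv: n for esv, n in esv_totals.items() if n >= min_reads}
    return kept, min_reads


# Invented totals summing to 7,000,000 reads.
kept_esvs, cutoff = filter_rare_esvs(
    {"ESV1": 6_999_000, "ESV2": 900, "ESV3": 69, "ESV4": 31}
)
```

With a 1:10⁵ ratio, a cut-off of about 70 reads corresponds to roughly seven million reads in total; a 1:10⁴ ratio on the same data would raise the cut-off tenfold, to 700 reads.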

ESVs were then assigned taxonomy using two reference databases, SILVA (v132)43 and RDP (v11.5)44, which map well onto each other45, to maximize identification breadth and increase certainty. These databases share species identifications but use different systematics for higher taxa (family to class); SILVA was used for higher-order classification because of its greater size. Taxonomy was assigned hierarchically from kingdom to genus with a bootstrapping method that compared ESVs to database sequences, with the chance of assignment error increasing at each lower rank. All ranks were output regardless of database match error, and the genera identified by the two databases were compared. Where genera agreed, that genus and all higher-order SILVA ranks were used. Where genera disagreed, SILVA's 80% bootstrap threshold was used to assign the lowest rank included in analyses. This approach yielded substantially more classifications than SILVA alone.
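The genus-agreement rule can be sketched as follows. The input format (dictionaries of rank names and bootstrap support values), the function name and the placeholder taxon names are hypothetical; the sketch assumes the 80% threshold is applied rank by rank, stopping at the first rank whose bootstrap support falls below it.

```python
RANKS = ("kingdom", "phylum", "class", "order", "family", "genus")


def consensus_taxonomy(silva, silva_boot, rdp_genus, min_boot=80):
    """Combine SILVA and RDP assignments for one ESV.

    silva:      dict rank -> SILVA name (all ranks reported, whatever the support)
    silva_boot: dict rank -> SILVA bootstrap support (0-100)
    rdp_genus:  genus reported by RDP, or None
    """
    if rdp_genus is not None and silva.get("genus") == rdp_genus:
        # Genera agree: keep that genus and every higher SILVA rank.
        return {rank: silva[rank] for rank in RANKS}
    # Genera disagree: keep SILVA ranks down to the last one with enough support.
    assigned = {}
    for rank in RANKS:
        if silva_boot.get(rank, 0) < min_boot:
            break
        assigned[rank] = silva[rank]
    return assigned


# Placeholder names and invented bootstrap values.
example = {"kingdom": "Bacteria", "phylum": "Phylum_A", "class": "Class_A",
           "order": "Order_A", "family": "Family_A", "genus": "Genus_A"}
boots = {"kingdom": 100, "phylum": 100, "class": 99, "order": 95,
         "family": 85, "genus": 60}
agreed = consensus_taxonomy(example, boots, "Genus_A")
disagreed = consensus_taxonomy(example, boots, "Genus_Z")
```

When the two databases agree, the genus is kept even though its SILVA support (60) is below 80; when they disagree, the same ESV is classified only to family.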

Identified taxonomic groups were then standardized to produce the final datasets. ESVs were aggregated and standardized by taxonomic level (genus and higher) so that relationships among identified taxonomic groups, rather than among individual ESVs, could be examined. DESeq2 was used to standardize mean–variance relationships for each taxon across samples, assuming negative binomial distributions46. For overdispersed taxa, read totals were adjusted toward the mean to fit the overall mean–variance relationship. Variation due to the substrate × fire treatment was removed, and reads were then normalized within each substrate × fire combination to account for systematic read error between samples. To normalize reads to the mean, the counts of all taxa within each sample were divided by the geometric mean of positive counts. A value of 1 for a taxon therefore meant that, in that sample, the taxon's reads were exactly at the mean; a value of X (above or below 1) meant that the taxon's reads were X times the mean of that sample. Within a sample, these standardized data were thus approximately a 1:1 transformation of abundances, expressing relative abundances as multiplicative factors of taxon reads. Across samples, however, the relative abundances of taxa were not 1:1 and could lead to different results and model behavior. Total reads were estimates of genus-level total ESV abundance per sample. Analyses used the R package DESeq2 (v1.24)46.
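A literal, per-sample reading of the normalization step (divide every taxon's count by the geometric mean of the sample's positive counts) looks like this. This is a sketch of that reading only, with a hypothetical function name and invented counts. It is not DESeq2's internal procedure: DESeq2's `estimateSizeFactors` computes a median-of-ratios size factor across samples, and its `poscounts` option builds each taxon's geometric mean from positive counts only.

```python
import math


def scale_by_positive_geomean(counts):
    """Divide every taxon count in one sample by the geometric mean of that
    sample's positive counts (zeros are skipped when forming the mean)."""
    positive = [c for c in counts.values() if c > 0]
    if not positive:
        raise ValueError("sample has no positive counts")
    geo_mean = math.exp(sum(math.log(c) for c in positive) / len(positive))
    return {taxon: c / geo_mean for taxon, c in counts.items()}


# Invented sample: the positive counts 1, 4 and 16 have a geometric mean of 4.
normalized = scale_by_positive_geomean({"A": 1, "B": 4, "C": 16, "D": 0})
```

Taxon B sits exactly at the geometric mean (value 1), C is four times the mean (value 4), and A is a quarter of it (value 0.25), which matches the "X times the mean" interpretation given in the text.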

Data metrics and analyses

Community-level analyses were conducted to relate data on bacterial taxa to the experimental field treatments, using partial constrained principal coordinate analysis (hereafter PCoA)47. The analysis was conditioned on spatial variation, using the pairing of burnt and unburnt plots and selected soil properties, especially soil type, as described in Supplement 2. It was constrained on the treatment effects of fire status (burnt/unburnt), substrate (litter/soil), and pine proximity (near/away from overstory pines), and on the interactions among these three experimental conditions. The dataset was first Hellinger transformed. Separate PCoAs were then generated for each taxonomic level and compared to identify the level that best resolved treatment effects. All PCoAs used Euclidean distances of the bacterial ESV response matrix. The family-level PCoA had the largest proportion of variance explained by the combined constrained (linear combination) and conditional effects, so family was selected as the taxonomic unit for measuring relationships with the experimental treatments. The family-level PCoA reduced the multidimensional site–taxon and explanatory-variable relationships to two dimensions. Family scores on the PCoA were then used to identify influential families, defined as those whose absolute scores exceeded the mean distance from the origin to the interaction centroids on the PCoA. Influential families were strongly associated with the categorical treatment effects and differed markedly in relative abundance across treatments; these differences produced larger PCoA scores and hence their identification as influential. Differences in the relative abundance of each influential family among treatments were tested with a first generalized linear mixed model (GLMM).
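The influential-family criterion can be sketched as a threshold test on two-dimensional ordination scores. The sketch assumes that a family is flagged when the absolute value of its score on either axis exceeds the threshold; the text does not say whether the axes are tested separately or jointly. The function name, family labels and coordinates are hypothetical.

```python
import math


def influential_families(family_scores, centroids):
    """Flag families whose absolute score on either axis exceeds the mean
    distance from the origin to the treatment-interaction centroids.

    family_scores: dict family -> (axis1, axis2) score
    centroids:     list of (axis1, axis2) interaction centroids
    """
    threshold = sum(math.hypot(x, y) for x, y in centroids) / len(centroids)
    flagged = sorted(
        fam for fam, (s1, s2) in family_scores.items()
        if max(abs(s1), abs(s2)) > threshold
    )
    return flagged, threshold


# Invented centroids at distances 5 and 1 from the origin (mean 3).
flagged, threshold = influential_families(
    {"F1": (3.5, 0.0), "F2": (-0.5, -3.2), "F3": (2.0, 2.0)},
    [(3.0, 4.0), (0.0, 1.0)],
)
```

Using absolute values means that families strongly associated with either end of an axis (F1 on the positive side, F2 on the negative side) are both retained.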
In the first GLMM, the response variable was the relative abundance of influential families, and the explanatory term was the three-way interaction fire × substrate × taxon. The model used a Tweedie distribution, a compound Poisson–gamma distribution that can model data containing many exact zeros alongside continuous positive values48. Families were analyzed simultaneously, with each family's relative abundances standardized to that family's mean relative abundance across treatments. This allowed treatment-versus-treatment comparisons of relative abundance for any individual family, but not family-versus-family comparisons for any treatment. Because of this standardization, the families included in this analysis occurred at least once in every treatment, although not necessarily in every plot. A heatmap was then generated to visualize the influential family relative abundances tested in this first GLMM. Overall abundance and frequency of occurrence were then plotted for influential and non-influential families to assess the relative importance of influential families in the broader ecosystem. Further details of the PCoA analyses and the first GLMM for influential families are presented in Supplement 3. Further details of the analysis of taxonomic levels and the choice of the family level are provided in Supplement 4.
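To show why a Tweedie distribution suits zero-heavy abundance data, the sketch below simulates the compound Poisson–gamma case (power 1 < p < 2): a Poisson number of gamma-distributed amounts is summed, so a Poisson count of zero yields an exact zero. The parameterization (mean mu, dispersion phi, variance phi * mu**p) is the standard one; the function names and parameter values are illustrative and are not the fitted values from this study.

```python
import math
import random


def tweedie_params(mu, phi, p):
    """Poisson rate and gamma shape/scale for a Tweedie variable with mean mu,
    dispersion phi and power 1 < p < 2 (variance phi * mu**p)."""
    lam = mu ** (2 - p) / (phi * (2 - p))
    shape = (2 - p) / (p - 1)
    scale = phi * (p - 1) * mu ** (p - 1)
    return lam, shape, scale


def poisson(rng, lam):
    """Knuth's multiplication method; adequate for the small rates used here."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def draw_tweedie(rng, mu, phi, p):
    """Sum of N ~ Poisson(lam) gamma variables; N = 0 gives an exact zero."""
    lam, shape, scale = tweedie_params(mu, phi, p)
    n = poisson(rng, lam)
    return sum(rng.gammavariate(shape, scale) for _ in range(n))


rng = random.Random(42)
lam, shape, scale = tweedie_params(2.0, 1.0, 1.5)
draws = [draw_tweedie(rng, 2.0, 1.0, 1.5) for _ in range(20000)]
zero_fraction = sum(d == 0 for d in draws) / len(draws)
```

The probability of an exact zero is exp(−lam), about 6% for these values, while the non-zero draws are continuous. This combination is something neither a pure Poisson nor a pure gamma model can represent.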

Family-level diversity metrics were calculated for all plots in each treatment, independently for each litter and soil sample in each plot. For each sample, differences in family richness and evenness among fire and substrate treatments were tested with a set of additional GLMMs, one for each family-level diversity metric and treatment. Response variables were the diversity metrics, and explanatory variables were the main effects of, and interactions among, fire treatment, substrate, and pine proximity; all models included plot pair as a random effect, matching the experimental design in which burnt and unburnt patches were paired. Richness GLMMs used a Poisson distribution with a log link. Evenness GLMMs used a beta distribution with a logit link, because evenness values lie on the open interval (0, 1) and are continuous proportions rather than binomial outcomes. Models were fit with mgcv (v1.8) in R (v3.5)49,50.
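The text does not specify which evenness index was used. The sketch below assumes Pielou's J (Shannon diversity divided by the natural log of richness) purely for illustration, alongside richness as the number of families with non-zero reads. The function name and counts are hypothetical.

```python
import math


def richness_and_evenness(counts):
    """Family richness (families with reads) and Pielou's evenness
    J = H' / ln(S), where H' is the Shannon index of the positive abundances."""
    positive = [c for c in counts if c > 0]
    s = len(positive)
    if s < 2:
        return s, float("nan")  # J is undefined for fewer than two families
    total = sum(positive)
    shannon = -sum((c / total) * math.log(c / total) for c in positive)
    return s, shannon / math.log(s)


even = richness_and_evenness([10, 10, 10, 10, 0])   # perfectly even: J = 1
skewed = richness_and_evenness([90, 10])            # dominated: 0 < J < 1
single = richness_and_evenness([5, 0])              # one family: J undefined
```

Under this assumed index, a perfectly even sample gives exactly J = 1, and a single-family sample gives an undefined J. Neither value lies inside the open interval (0, 1) that a beta-distributed response requires, so such samples would need special handling before the beta GLMM.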

PCoA results were also used to test beta diversity among and within treatment groups. Multivariate homogeneity of dispersion was tested via PERMANOVA to identify similarities among groups for the main effects and the interaction of substrate × fire. Pine proximity was excluded because it had no effect in the GLMMs for family-level diversity metrics. As in the PCoA, the family matrix was first Hellinger transformed, and analyses then used Euclidean distances.
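A test of multivariate homogeneity of dispersion compares how far samples lie from their own group centroid. The sketch below implements that idea in plain Python: Hellinger transformation, Euclidean distances to group centroids, a one-way ANOVA F statistic on those distances, and a p-value from permuted group labels. It is a simplified stand-in, not the authors' pipeline. R's vegan `betadisper` defaults to spatial medians rather than centroids, and its `permutest` permutes model residuals. Function names and the example counts are hypothetical.

```python
import math
import random


def hellinger(rows):
    """Hellinger transform: square root of each sample's relative abundances."""
    out = []
    for row in rows:
        total = sum(row)
        out.append([math.sqrt(v / total) for v in row] if total > 0 else [0.0] * len(row))
    return out


def distances_to_centroids(rows, groups):
    """Euclidean distance from each sample to the centroid of its group."""
    members = {}
    for row, g in zip(rows, groups):
        members.setdefault(g, []).append(row)
    centroids = {g: [sum(col) / len(rs) for col in zip(*rs)] for g, rs in members.items()}
    return [math.dist(row, centroids[g]) for row, g in zip(rows, groups)]


def anova_f(values, groups):
    """One-way ANOVA F statistic comparing the values among groups."""
    by_group = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)
    grand = sum(values) / len(values)
    k, n = len(by_group), len(values)
    ss_between = sum(len(vs) * (sum(vs) / len(vs) - grand) ** 2 for vs in by_group.values())
    ss_within = sum((v - sum(vs) / len(vs)) ** 2 for vs in by_group.values() for v in vs)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def dispersion_test(rows, groups, n_perm=999, seed=0):
    """F on distances-to-centroid, with a p-value from shuffled group labels."""
    dists = distances_to_centroids(hellinger(rows), groups)
    observed = anova_f(dists, groups)
    rng = random.Random(seed)
    labels = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if anova_f(dists, labels) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Invented family counts for two substrate-fire groups of three samples each.
counts = [[10, 0, 5], [8, 1, 6], [9, 0, 4], [1, 10, 2], [0, 12, 3], [2, 9, 1]]
treatment = ["burnt litter"] * 3 + ["unburnt soil"] * 3
f_stat, p_value = dispersion_test(counts, treatment)
```

Because the Hellinger transform puts every sample on the unit sphere, Euclidean distances between transformed rows are not dominated by a few very abundant families. This is the usual reason for pairing the transform with Euclidean distances, as done here.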