A metagenomic survey of forest soil microbial communities more than a decade after timber harvesting

The scarcity of long-term data on soil microbial communities in the decades following timber harvesting limits current understanding of the ecological problems associated with maintaining the productivity of managed forests. The high complexity of soil communities and the heterogeneity of forest and soil necessitates a comprehensive approach to understand the role of microbial processes in managed forest ecosystems. Here, we describe a curated collection of well replicated, multi-faceted data from eighteen reforested sites in six different North American ecozones within the Long-term Soil Productivity (LTSP) Study, without detailed analysis of results or discussion. The experiments were designed to contrast microbial community composition and function among forest soils from harvested treatment plots with varying intensities of organic matter removal. The collection includes 724 bacterial (16S) and 658 fungal (ITS2) amplicon libraries, 133 shotgun metagenomic libraries as well as stable isotope probing amplicon libraries capturing the effects of harvesting on hemicellulolytic and cellulolytic populations. This collection serves as a foundation for the LTSP Study and other studies of the ecology of forest soil and forest disturbance.

The scarcity of long-term data on soil microbial communities in the decades following timber harvesting limits current understanding of the ecological problems associated with maintaining the productivity of managed forests. The high complexity of soil communities and the heterogeneity of forest and soil necessitates a comprehensive approach to understand the role of microbial processes in managed forest ecosystems. Here, we describe a curated collection of well replicated, multi-faceted data from eighteen reforested sites in six different North American ecozones within the Long-term Soil Productivity (LTSP) Study, without detailed analysis of results or discussion. The experiments were designed to contrast microbial community composition and function among forest soils from harvested treatment plots with varying intensities of organic matter removal. The collection includes 724 bacterial (16S) and 658 fungal (ITS2) amplicon libraries, 133 shotgun metagenomic libraries as well as stable isotope probing amplicon libraries capturing the effects of harvesting on hemicellulolytic and cellulolytic populations. This collection serves as a foundation for the LTSP Study and other studies of the ecology of forest soil and forest disturbance.

Background & Summary
The Long-term Soil Productivity Study (LTSP) was initiated in 1989 to track changes in soil quality and productivity of managed forests. Foresters lacked data on the impact of growing trends in intensified forest management, such as shorter crop cycles, greater biomass extraction, and use of highly mechanized equipment 1 . Partners in the LTSP Study have since collected longitudinal data from a range of North America's most-productive forests (>100 sites) to assess the long-term effects of soil compaction and organic matter (OM) removal. The objective is to develop indices for monitoring soil quality based in physical, chemical, and biological properties of soil. Each LTSP site has replicated experimental plots for harvested treatments with three intensities of OM removal as well as unharvested reference plots. OM removal intensity corresponded to common harvesting strategies, such as debranching in situ (OM1) or removal of both trunks and branches (OM2), or the less common, and extreme, practice of removing trunks, branches and top soil (OM3). In the ensuing years, these treatments produced a consistent gradient in soil properties according to harvesting intensity, such as decreasing amounts of total carbon and nitrogen as well as increased dryness, mean daily temperature and temperature fluctuation. While the latest LTSP studies show that the effects of varying degrees of OM removal appear minor on net primary productivity 2-4 , the metagenomic datasets presented here show consistent changes in bacterial and fungal populations that reflected harvesting intensity [5][6][7][8][9] . We collected data on soil microbial communities from all treatment plots at eighteen LTSP sites in six different ecozones between 2008 and 2014 when reforested stands were 11 to 17 years old ( Fig. 1; Table 1). To capture the extent of harvesting impacts throughout the soil profile, corresponding samples from organic and mineral layers were included in all datasets. The goal was to identify changes in community composition, diversity, and functional potential resulting from the intensity of OM removal. We surveyed soil microbial communities in treatment plots using amplicon sequencing (Table 2 (available online only)) of bacteria (Data Citation 1) and fungi (Data Citation 2), along with shotgun metagenomes from all treatment plots at a single site within each ecozone (Data Citation 3; Table 3 (available online only)). Shotgun metagenomes revealed impacts on the functional potential of communities, such as decomposers involved in carbon cycling 5 , which led to further targeted studies of hemicellulolytic 7 (Data Citations 4,5) and cellulolytic 8 populations (Data Citation 6) using 13 C-stable isotope probing (Table 4 (available online only)). Stable isotope probing (SIP) can be used to track populations that assimilate 13 C-label into their biomass by recovering and sequencing the resultant 'heavier' 13 C-enriched DNA 10 .
Here, we provide an overview of the data collection, without detailed analysis of results or discussion, to draw attention to its unparalleled comprehensive and multi-faceted view into forest soil microbial communities. The collection is the first high-throughput sequencing-based survey of LTSP sites, and, as such, provides base-line data for future LTSP investigations at an important, stage of forest regeneration just prior to canopy closure. Researchers can revisit this collection as our understanding of the ecology of forest soils advances. This will be important given the substantial proportion of unknown taxa in these collections affected by timber harvesting 9 . SIP data offers unique insights into the effects of timber harvesting on decomposer populations, including detailed information on uncultured taxa provided by the ten partial genomes recovered from SIP-cellulose shotgun metagenomes (Data Citation 6). The consistency in experimental design, sequencing methodology and sample sources ensures the value of this collection for on-going studies of forest soil microbial communities, in particular those pertaining to biogeography, soil strata and forest disturbance.

Experimental design
Soil samples were collected from reforested experimental plots within the Long-Term Soil Productivity (LTSP) Study from eighteen different sites across six conifer-dominated North American ecozones named after the predominant tree species ( Fig. 1; Table 1): IDF BC (interior Douglas-fir), SBS BC (subboreal spruce), PP CA (ponderosa pine), BS ON (black spruce), JP ON (jack pine) and LP TX (loblolly pine). Ecozones were chosen to exemplify a broad range of climates and regions in North America where forestry is a major industry. These differed by several factors, including soil type, mean annual temperature and precipitation, tree species and bulk soil chemistry, such as carbon and nitrogen content and pH (Table 1). Each ecozone contained three sites with four treatments: REF (or OM0), a neighbouring unharvested reference plot; and three harvested treatments: OM1, where only tree boles (stems) were removed and woody debris was left in place; OM2, where whole trees including branches were removed and; OM3, where whole trees were removed and the upper organic layer of forest floor scraped away (Fig. 2). Compaction was controlled and, in all cases, plots with minimum compaction ('C0') were sampled. Moderate (C1) and severe (C2) compaction treatments were included in 16S rRNA gene and ITS pyrotag libraries in SBS BC and IDF BC (Table 2 (available online only)). Similarly, additional amplicon libraries were prepared from soils in JP ON and LP TX which had been exposed to glyphosate (Table 2 (available online only)). Samples were collected from triplicate plots at each of the three sites in BS ON and JP ON , while at sites in the other four ecozones triplicate samples were collected from a single, larger plot. Each sample corresponds to a composite from between three to five sampling points (consistent within a given ecozone) in a plot which helped account for soil heterogeneity. Organic layer  Tables 2 and 3 (available online only)), samples were collected from five soil horizons: LFH and mineral horizons (Ahe, Ae, AB and Bt), distinguished using criteria from the Canadian System of Soil Classification. Samples were stored at 4°C during transport and until each sample was sieved through 2-mm mesh to remove roots, then stored at −80°C until DNA was extracted within three months of sampling. Soil samples used in metatranscriptomics were flash frozen in liquid nitrogen, transported on dry ice and subsequently stored at −80°C until DNA and RNA was extracted within 12 months of sampling.
Amplicon and shotgun metagenome sequencing DNA was extracted from field samples (0.5 g of soil) using the manufacturer's recommended protocol for the FastDNA Spin Kit for Soil (MPBio, Santa Ana, CA). Amplicon libraries were prepared for the 16S rRNA gene (V1-V3 regions) and fungal internal transcribed spacer region (ITS2) according to the procedure of Hartmann et al. 5 . The region spanning V1-V3 was amplified using barcoded universal primers: 27F (5′-AGA GTT TGA TCM TGG CTC AG-3′) and 519R (5′-GWA TTA CCG CGG CKG CTG-3′) 11,12 and the fungal internal transcribed spacer (ITS2) region was amplified using barcoded primers: ITS3 (5′-GCA TCG ATG AAG AAC GCA GC-3′) and ITS4 (5′-TCC TCC GCT TAT TGA TAT GC-3′) 13 . Amplicons were generated via polymerase chain reaction (PCR) in triplicate for each sample and pooled prior to purification and quantification. DNA quantitation was performed using Pico-Green fluorescent dye (ThermoFisher, MA, USA). Samples were sequenced using the Roche 454 Titanium platform at the McGill University and at the Genome Québec Innovation Centre with a maximum of 40 samples multiplexed on each quarter plate. Table 2 (available online only) contains information on all amplicon libraries created from field soil samples, including 16S rRNA gene amplicon (Data Citation 1) and ITS amplicon libraries (Data Citation 2). This includes a second set of samples from Skulow Lake with a narrower focus on five soil depths in only REF and OM3 (Table 2 (available online only)). These amplicon libraries were made from primers targeting the V6-V8 region of the 16S rRNA gene and were amplified according to Gies et al. 14 using barcoded universal primers: 926F (5′-CC TAT CCC CTG TGT GCC TTG GCA GTC TCA GAA ACT YAA AKG AAT TGR CGG-3′) and Whole shotgun metagenomes were generated for a single site in each of the six ecozones resulting in three replicates for each treatment and each soil horizon for each ecozone. At Skulow Lake (SBS BC ),

Stable isotope probing amplicon and shotgun metagenome sequencing
Microcosms were prepared by adding between 0.75 and 2.0 g of 2 mm sieved organic or mineral layer soil to 30-mL serum vials. Larger quantities of mineral soil were necessary because of lower microbial biomass, requiring two DNA extractions per mineral soil to obtain the necessary 5 μg of DNA. Moisture content was standardized to 60% w/v for mineral soil, due to lower absorptive capacity, and 125% w/v for organic soil and pre-incubating at 20°C for one week. Microcosms were then amended with either 1.0% w/w of 13 C-labeled hemicellulose (97 atom %; IsoLife; U-10509, Lot: 0901-0273) or custom prepared bacterial cellulose (99 atom % 13 C). Each soil sample was paired with a microcosm containing the same amount of unlabelled (~1.1 atom % 13 C) substrate. Separation between 13 C-enriched DNA and unlabelled DNA is never complete, in part, due to variation in GC content 15 . Thus, 13 C-libraries are always compared to identically prepared sequencing libraries from samples amended with unlabelled substrate. Following either 2-day (hemicellulose), 11-day (cellulose-organic layer) or 14-day (cellulose-mineral layer) incubations, soil was lyophilized and stored at −80°C until DNA was extracted as previously described. DNA extracts from replicates within each site were pooled in equal amounts and unlabelled controls were processed identically. Harvested treatment OM2 was not included in SIP-hemicellulose libraries.
High purity 13 C-labled cellulose was necessary to ensure that only organisms possessing the necessary metabolic capability could assimilate the 13 C. Bacterial cellulose was utilized due to irremediable impurity in plant-derived cellulose from IsoLife (58% glucose+4.4% lignin+remainder of sugars from hemicellulose). Bacterial cellulose was produced by cultivating Gluconacetobacter xylinus str. KCCM 10,100 with either 13 C-labeled glucose (99 atom % 13 C, Cambridge Isotope Laboratories, MA, USA), or unlabelled glucose, as sole carbon source in Yamanaka medium 16 . Cellulose pellicules were purified by boiling in sodium hydroxide per previously described methods 17 , with the addition of a second boiling step and an increase in boiling time to 4 h. While IsoLife hemicellulose was also impure (53% hemicellulose sugars), the sources of impurity were more recalcitrant forms of carbon, such as cellulose and lignin, which were not substantially degraded during the 2-day incubation. 13 C-enriched DNA was separated and recovered using cesium chloride density gradient ultracentrifugation 10 . The atom % 13 C was measured before and after density separation using UHPLC-MS/MS 18 . Amplicon libraries were prepared and sequenced from 13 C-enriched DNA for SIPhemicellulose (Data Citations 4,5) and SIP-cellulose (Data Citation 6) experiments targeting both bacterial and fungal phylogenetic markers as previously described (Table 4 (available online only)). Four shotgun metagenome libraries were generated from 13 C-enriched DNA from SIP-cellulose incubations for REF, OM1 and OM3 treatments and one unlabelled control sample (REF) from PP CA . Sufficient DNA for metagenome library preparation was achieved by pooling the corresponding DNA extracts from mineral layer samples at all three sites in PP CA . Shotgun metagenomes were prepared from 40-50 ng of enriched DNA using the Nextera DNA Sample Preparation Kit (Illumina Inc., CA, USA) and were multiplexed on two lanes of Illumina HiSeq (2 × 100-bp), yielding 285 million paired-end reads (Data Citation 6). There was insufficient 13 C-enriched DNA to generate metagenomes from the organic layer samples. Raw sequences were quality-controlled, assembled and binned into partial genomes (Table 4 (available online only); Data Citation 6) according to methods described in Wilhelm et al. 8 .

Data Records
The raw pyrosequencing output (~400-bp reads) for all 16S rRNA gene (n = 724 samples) and ITS (n = 658 samples) amplicon libraries that were generated from field soil samples averaged 8,900 and 8,400 reads per library (post-QC), respectively, and are archived at the European Sequencing Archive (Table 2 (available online only)). The raw SFF files for 16S rRNA amplicon libraries can be found with the study accession PRJEB8599 (Data Citation 1), except for all libraries from British Columbia which were archived in study accession PRJEB12501 in 'fastq' format (Data Citation 2). The latter contains the entire collection of ITS amplicon libraries in 'fastq' format. All libraries were extracted from raw SFF files using the mothur command 'sffinfo' and either uploaded in standard flowgram format (SFF) or converted to 'fastq' format 19 using 'sffinfo' to produce paired 'fasta' and 'qual' files, which were merged into 'fastq' format using 'PairedFastaQualIterator' from the SeqIO module in BioPython 20 .
Raw shotgun metagenomic data for all ecozones can be downloaded in 'fastq' format with the study accession PRJEB8420 from the European Nucleotide Archive (Table 3 (available online only), Data Citation 3). After our quality filtering, libraries had a median count of 59.3 million sequences for 150-bp read libraries, while shorter read libraries (75-bp) had higher median counts (115 million). These numbers are provided as an estimate of the number of high quality reads obtainable from our raw data, but will change depending on the parameters selected during quality processing.
Due to cost of 13 C-labeled materials and additional labour required to process SIP samples, only a subset of ecozones were selected for SIP-hemicellulose (PP CA & IDF BC ) and SIP-Cellulose (PP CA ) characterizations. The raw pyrosequencing output (~400-bp reads) for SIP-hemicellulose 16S rRNA (Data Citation 4) and ITS (Data Citation 5) amplicon libraries, which averaged 4,200 and 3,800 reads per library (post-QC), respectively, are available in SFF format from the European Nucleotide Archive (Table 4 (available online only)). These datasets include both 13 C-and 12 C-libraries for 16S rRNA genes, n = 35 and 15, respectively, and for ITS, 18 and 2, respectively. Similarly, the raw sequencing data for all b a Figure 3. The successful enrichment and recovery of 13 C-enriched DNA in SIP experiments. The assimilation of 13 C by functional guilds during stable isotope probing experiments was evident in (a) the total 13 C-enrichment of soil DNA extract and (b) recovery of DNA from the densest CsCl gradient fractions (F 1 -F 7 ).
These differences were evident in comparing soils amended with either 12 C-(unlabelled) or 13 C-cellulose. These trends were apparent in both cellulose and hemicellulose SIP experiments. Boxplots depict the average (centre line) and spread (from 25th to 75th percentile) of data, while the whiskers extend to the extrema. A total of twelve samples were averaged for each factor. SIP-cellulose 16S rRNA amplicon and ITS amplicon libraries, averaging 11,800 and 10,300 reads per library, respectively, are available in 'fastq' format from the European Nucleotide Archive (Table 4 (available online only)). The 13 C-and 12 C-libraries from these data are more balanced, with a total of 24 16S rRNA gene libraries for both, and 24 and 23, respectively, for ITS libraries. Raw Illumina, pairedend, 100-bp shotgun metagenome sequencing libraries for pooled, SIP-cellulose mineral soil incubations were archived in 'fastq' format (Data Citation 6; Table 4 (available online only)), along with 10 partial genomes in 'fasta' format comprised of assembled scaffolds (Data Citation 6; Sample accessions: ERZ288956-ERZ288966).

Technical Validation
The recovery of 13 C-enriched DNA was validated by quantifying 13 C-content of nucleic acids 18 (Fig. 3). The completeness of draft genomes recovered from SIP-Cellulose metagenomes was assessed by scanning for essential single-copy, house-keeping genes with hidden Markov models 21 .

Usage Notes
Metagenomes for all ecozones, except IDF BC , can be also found at the DOE JGI portal (http://genome.jgi. doe.gov/) under proposal ID 543. This site provides the raw sequencing data and annotation for the assembled and unassembled metagenomes.