Metagenomes, metatranscriptomes and microbiomes of naturally decomposing deadwood

Deadwood represents a significant carbon (C) stock in temperate forests. Its decomposition and C mobilization are accomplished by decomposer microorganisms (fungi and bacteria), which also supply the food web of commensalist microbes. Because of the ecosystem-level importance of the deadwood habitat as a C and nutrient stock with significant nitrogen fixation, the composition and function of the deadwood microbiome are critical to understanding the microbial processes related to its decomposition. We present a comprehensive suite of data packages obtained through environmental DNA and RNA sequencing of natural deadwood. The data provide a complex picture of the composition and function of the microbiome on decomposing trunks of European beech (Fagus sylvatica L.) in a natural forest. The packages include deadwood metagenomes, metatranscriptomes, sequences of total RNA, bacterial genomes resolved from metagenomic data, and 16S rRNA gene and ITS2 metabarcoding markers that characterize the bacterial and fungal communities. This project will be of use to microbiologists, environmental biologists and biogeochemists interested in the microbial processes associated with the transformation of recalcitrant plant biomass.
Measurement(s): metagenomic data • metatranscriptomic data • microbiome • RNA-seq of total RNA
Technology Type(s): DNA sequencing • RNA-seq of total RNA • amplicon sequencing • RNA sequencing
Factor Type(s): time of decomposition
Sample Characteristic - Organism: Fungi • Bacteria
Sample Characteristic - Environment: wood
Sample Characteristic - Location: Narodni prirodni rezervace Zofinsky prales
Machine-accessible metadata file describing the reported data: https://doi.org/10.6084/m9.figshare.14821752


Background & Summary
Forests, and especially unmanaged natural forests, accumulate and store large amounts of carbon (C) 1 . A substantial fraction of this C stock, 73 ± 6 Pg or 8% of the total global forest C stock, is contained within deadwood 2 . This C pool is transient because, during its transformation by saprotrophic organisms, most C is liberated as CO2 into the atmosphere 3 , while the rest is sequestered in soils as dissolved organic C, within microbial biomass or as a part of the soil organic matter, along with other nutrients 4 . Fungi appear to be the major decomposers in deadwood, using extracellular enzymes to decompose recalcitrant plant biopolymers, as shown in the associated study 5 . Fungi also determine the bacterial community composition 6,7 . Bacterial fixation of atmospheric N2 was shown to contribute substantially to the nitrogen (N) increase in deadwood during decomposition 5,8,9 . In addition to bacteria and fungi, deadwood also hosts a suite of other organisms including archaea, viruses, protists, nematodes and insects, whose roles in deadwood are so far unknown. To understand deadwood as a dynamic habitat, it is necessary to describe the composition of the associated microorganisms with an emphasis on the major groups, fungi and bacteria, whose ecologies are often genus-specific 10 . Further, it is important to link deadwood-associated organisms to processes occurring at different stages of decomposition, either by characterization of isolates 7,11 or by cultivation-independent techniques.
In this Data Descriptor we present comprehensive datasets of DNA- and RNA-derived data and sample metadata to characterize deadwood organisms and their activity at various stages of decomposition (Table 1, Supplementary Table 1). The DNA-derived data, which represent community composition and genomic potential, include 16S rRNA gene sequences and ITS2 sequences, metagenomic reads, the metagenome assembly and bacterial metagenome-assembled genomes (MAGs 12 ). The RNA-derived data are represented by total RNA reads, the majority of which originate from ribosomal RNA; these reads are taxonomically assignable and can thus be used as a proxy for a PCR-unbiased view of community composition. Further, the data contain metatranscriptome raw reads and an assembly that represent the processes occurring in deadwood. The dataset characterizes decomposing trunks of European beech (Fagus sylvatica L.) in a beech-dominated natural forest in temperate Europe (Fig. 1). The metagenome was assembled from 25 DNA samples of deadwood with decomposition times ranging from young wood (<4 years since tree death) to almost completely decomposed wood (>41 years of decomposition). We resolved 58 high-quality MAGs with a total of 19.5 × 10³ contigs spanning 10 bacterial phyla, including those that are difficult to culture, such as Acidobacteria, Patescibacteria, Verrucomicrobia and Planctomycetes (Fig. 2, Supplementary Table 2). The 16S rRNA gene and ITS2 amplicon data contribute to the comparison of microbial diversity and occurrence patterns at the global scale using the public databases GlobalFungi 13 and Earth Microbiome 14 (Supplementary Fig. 1). The deadwood metatranscriptome was assembled from 10 RNA samples spanning two age classes of decomposing deadwood (between 4 and 19 years old). The amount of raw and assembled data in individual data packages is summarised in Table 2.
An overview of the data previously used to describe the complementarity of fungal and bacterial roles in deadwood 5 is given in the Data Records section.
Previous studies of deadwood have not provided such a comprehensive set of information about the associated biota. These data significantly improve the breadth and resolution of our view of deadwood-associated biota, and thus the understanding of its biodiversity and function. Given that natural forests are an essential ecosystem for C storage and nutrient cycling, the data within this Data Descriptor make it possible to better appreciate the ecosystem-level roles that deadwood plays in forests.

Methods
Study area and sampling. Deadwood was sampled in the core zone of the Žofínský prales National Nature Reserve, an unmanaged forest in the south of the Czech Republic (48°39′57″N, 14°42′24″E), as described in the associated study 5 . The core zone has never been managed, and all human intervention stopped in 1838 when it was declared a reserve. It thus represents a rare fragment of European temperate virgin forest left to spontaneous development. The reserve is situated at 730-830 m a.s.l.; the bedrock is almost homogeneous and consists of fine- to medium-grained porphyritic and biotite granite. Annual average rainfall is 866 mm and annual average temperature is 6.2 °C 15 .
Previous analysis indicated that deadwood age (time of decomposition) significantly affects both wood chemistry and the composition of microbial communities 16,17 . We thus randomly selected dead tree trunks that represented age classes 1-5, assigned based on the length of decomposition 18 . Each age class was represented by five logs of 30-100 cm diameter (Table 1). Age class 1 was <4 years since tree death, class 2 was 4-7 years, class 3 was 8-19 years, class 4 was 20-41 years and class 5 was >41 years (n = 5 per age class); only trees that were not alive and not decomposed before downing were considered. DNA was extracted from all logs. Because RNA extraction yields were sample-specific, RNA of sufficient amount and quality was obtained only from a subset of logs (age classes 2 and 3). Sampling was performed in November 2016. The length of each selected log (or the sum of the lengths of its fragments) was measured, and four samples were collected by drilling at positions of 20%, 40%, 60% and 80% of the log length. Drilling was performed vertically from the middle of the upper surface through the whole diameter using an electric drill with an auger diameter of 10 mm. The sawdust from all four drill holes in each log was pooled, immediately frozen in liquid nitrogen, transported to the laboratory on dry ice and stored at −80 °C until further processing.
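The age-class boundaries and drilling positions described above can be illustrated with a minimal Python sketch. The function names `age_class` and `drill_positions` are hypothetical and are not taken from the authors' repository, and the handling of fractional years is an assumption made only for illustration.

```python
from typing import List


def age_class(years_since_death: float) -> int:
    """Map time since tree death (years) to age classes 1-5 from the text.

    Classes: 1 (<4), 2 (4-7), 3 (8-19), 4 (20-41), 5 (>41).
    Assumption: a fractional value is placed in the class of the whole
    year it falls in (e.g. 7.5 years is treated as class 2).
    """
    if years_since_death < 0:
        raise ValueError("years_since_death must be non-negative")
    if years_since_death < 4:
        return 1
    if years_since_death < 8:
        return 2
    if years_since_death < 20:
        return 3
    if years_since_death < 42:
        return 4
    return 5


def drill_positions(log_length_m: float) -> List[float]:
    """Return drilling positions (m from one end) at 20/40/60/80% of log length."""
    fractions = (0.2, 0.4, 0.6, 0.8)
    return [round(log_length_m * f, 3) for f in fractions]


# Hypothetical example: a 10 m log that died 12 years before sampling.
example_class = age_class(12)
example_positions = drill_positions(10.0)
```

For a log broken into fragments, the summed fragment length would be passed as `log_length_m`, following the measurement rule given in the text.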
Sample processing, DNA and RNA extraction. Sample characteristics such as pH, C, N and water content were measured as described in the associated study 5 . Similarly, the workflow of nucleic acid preparation, ligation and sequencing was described previously. Briefly, wood samples (approximately 10 g of material) were homogenized using a mortar and pestle under liquid nitrogen and thoroughly mixed prior to nucleic acid extraction. Total DNA was extracted in triplicate from 200 mg batches of finely ground wood powder using a NucleoSpin Soil kit (Macherey-Nagel). Total RNA was extracted in triplicate from 200 mg batches of sample using a NucleoSpin RNA Plant kit (Macherey-Nagel) according to the manufacturer's protocol after mixing with 900 μl of the RA1 buffer and shaking on a FastPrep-24 (MP Biomedicals) at 6.5 m s−1 twice for 20 s. Triplicates were pooled and treated with a OneStep PCR Inhibitor Removal kit (Zymo Research), and DNA was removed using a DNA-free DNA Removal Kit (Thermo Fisher Scientific). The efficiency of DNA removal was confirmed by negative PCR results with the bacterial primers 515F and 806R 19 . RNA quality was assessed using a 2100 Bioanalyzer (Agilent Technologies).

Analysis of deadwood-associated organisms.
To estimate the relative representation of deadwood-associated organisms, total RNA was sequenced, since the majority of the RNA is either small subunit or large subunit ribosomal RNA, which allows organisms to be identified by BLAST searches against the curated SILVA databases 20,21 . Read abundances represent the abundances of ribosomes of each taxon and thus reflect the abundance of each taxon. Libraries for high-throughput sequencing of total RNA were prepared using the TruSeq RNA Sample Prep Kit v2 (Illumina) according to the manufacturer's instructions, omitting the initial capture of polyA tails so that total RNA could be ligated. Samples were pooled in equimolar volumes and sequenced on an Illumina HiSeq 2500 (2 × 250 bases) at Brigham Young University Sequencing Centre, USA.
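Converting taxonomically assigned rRNA read counts into relative abundances can be sketched as follows. The taxon names and counts are hypothetical placeholders, not values from the dataset, and `relative_abundance` is an illustrative helper rather than part of the authors' workflow.

```python
from typing import Dict


def relative_abundance(read_counts: Dict[str, int]) -> Dict[str, float]:
    """Convert per-taxon read counts into fractions of all assigned reads."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no assigned reads to normalise")
    return {taxon: count / total for taxon, count in read_counts.items()}


# Hypothetical counts of rRNA reads assigned to three groups in one sample.
counts = {"Fungi": 600, "Bacteria": 300, "Archaea": 100}
shares = relative_abundance(counts)
```

Because the counts reflect ribosome numbers rather than cell numbers, such fractions are best read as a proxy for the representation of each group.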

Metatranscriptomics and metagenomics.
For metatranscriptome analysis, the rRNA content of RNA samples was reduced as described previously 5,22 using a combination of the Ribo-Zero rRNA Removal Kit Human/Mouse/Rat and the Ribo-Zero rRNA Removal Kit Bacteria (Illumina). Oligonucleotide probes from both Ribo-Zero kits were mixed and added to each sample, which allowed them to anneal to rRNA for subsequent rRNA removal. The efficiency of the removal was checked using a 2100 Bioanalyzer and removal was repeated when necessary. Reverse transcription was performed with SuperScript III (Thermo Fisher Scientific). Libraries for high-throughput sequencing were prepared using the ScriptSeq v2 RNA-Seq Library Preparation Kit.
Metagenome assembly and annotation were performed as described previously 5 . Briefly, Trimmomatic 0.36 23 and FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/) were used to remove adaptor contamination, trim low-quality ends of reads and omit reads with overall low quality (<30); sequences shorter than 50 bp were also omitted. A combined assembly of all 25 samples was performed using MEGAHIT 1.1.3 24 . Metagenome sequencing yielded on average 22.5 ± 7.2 million reads per sample, which were assembled into 17,936,557 contigs over 200 bp in length.
Metatranscriptome (MT) assembly and annotation were performed as described previously 5 . Trimmomatic 0.36 23 and FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/) were used to remove adaptor contamination, trim low-quality ends of reads and omit reads with overall low quality (<30); sequences shorter than 50 bp were also omitted. mRNA reads were filtered from the files using the bbduk.sh 38.26 program in BBTools (https://sourceforge.net/projects/bbmap/). A combined assembly was performed using MEGAHIT 1.1.3 24 . Metatranscriptome sequencing yielded on average 31.3 ± 9.1 million reads per sample, which were assembled into 1,332,519 contigs over 200 bp in length.
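The read-level criteria stated above (reads with overall quality below 30 and sequences shorter than 50 bp are omitted) can be sketched in Python. This does not reproduce how Trimmomatic or FASTX-Toolkit work internally. Reading "overall low quality" as the mean Phred score of a read, the Phred+33 encoding and all function names are assumptions made for illustration.

```python
from typing import Iterable, Iterator, Tuple

Read = Tuple[str, str, str]  # (identifier, sequence, quality string)


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (assumes Phred+33 encoding)."""
    return sum(ord(char) - offset for char in quality) / len(quality)


def passes_filter(seq: str, qual: str,
                  min_len: int = 50, min_mean_q: float = 30.0) -> bool:
    """Keep a read only if it is long enough and its mean quality is >= 30."""
    if len(seq) < min_len:
        return False
    return mean_phred(qual) >= min_mean_q


def filter_reads(reads: Iterable[Read]) -> Iterator[Read]:
    """Yield only the reads that satisfy both criteria."""
    for name, seq, qual in reads:
        if passes_filter(seq, qual):
            yield (name, seq, qual)


# Hypothetical reads: 'I' encodes Phred 40 and '5' encodes Phred 20.
reads = [
    ("read_ok", "A" * 60, "I" * 60),     # long enough, high quality
    ("read_short", "A" * 40, "I" * 40),  # shorter than 50 bp
    ("read_lowq", "A" * 60, "5" * 60),   # mean quality below 30
]
kept = [name for name, _, _ in filter_reads(reads)]
```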

Identification and analysis of metagenome-assembled genomes.
Bins that represent prokaryotic taxa present in the metagenome were constructed using MetaBAT2 25 as described previously 5 , with default settings except that the minimum contig length was set to 2000 bp, which produced bins with overall better statistics than the 2500 bp minimum. CheckM 1.0.11 26 with the lineage_wf pipeline was used to assign taxonomy and statistics to bins. Bins with a completeness score greater than 50% were selected for quality improvement using RefineM according to the developers' instructions 27 . Briefly, scaffolds with genomic properties (GC content, coverage profiles and tetranucleotide signatures) whose values differed from those expected in each bin were excluded. These values were calculated based on the mean absolute error and correlation criteria. Next, the refined bins were further processed to identify and remove scaffolds with taxonomic assignments different from those assigned to the bin. Lastly, scaffolds that possessed 16S rRNA genes divergent from the taxonomic affiliation of the refined bins were removed. The taxonomy of the bins was inferred by GTDB-Tk 28 . The 58 bins with quality scores >50 (CheckM completeness value − 5 × redundancy value) were considered metagenome-assembled genomes (MAGs) as defined in ref. 27 and were deposited in the NCBI database. GToTree v1.5.39 29 together with Prodigal 30 , HMMER3 31 , Muscle 32 , trimAl 33 and FastTree2 34 were used to infer the phylogeny of the MAGs based on a set of 74 bacterial single-copy gene HMM profiles with a minimal marker share >25%.
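The quality score used to designate MAGs (completeness minus five times redundancy, with a cut-off of >50) can be written as a short Python sketch. The bin names and values below are hypothetical and do not correspond to deposited genomes; the function names are illustrative.

```python
from typing import Dict, List, Tuple


def quality_score(completeness: float, redundancy: float) -> float:
    """Quality score = completeness - 5 x redundancy (both in percent)."""
    return completeness - 5.0 * redundancy


def select_mags(bins: Dict[str, Tuple[float, float]],
                threshold: float = 50.0) -> List[str]:
    """Return names of bins whose quality score is strictly above the threshold."""
    return [
        name
        for name, (completeness, redundancy) in bins.items()
        if quality_score(completeness, redundancy) > threshold
    ]


# Hypothetical bins: (completeness %, redundancy %).
bins = {
    "bin_001": (90.0, 5.0),  # score 65 -> kept
    "bin_002": (70.0, 5.0),  # score 45 -> rejected
    "bin_003": (60.0, 2.0),  # score 50 -> rejected (not strictly above 50)
}
mags = select_mags(bins)
```

The factor of five penalises redundancy heavily, so a bin with moderate completeness can only pass if it carries very little redundancy.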
ITS2 and 16S rRNA gene amplicon sequencing and analysis. Subsamples of DNA were used to amplify the fungal ITS2 region using barcoded gITS7 and ITS4 primers 35 . Three PCR reactions were pooled, purified with the MinElute PCR Purification Kit (Qiagen) and mixed in equimolar amounts according to the concentration measured on a Qubit 2.0 Fluorometer (Thermo Fisher Scientific). Sequencing libraries were prepared using the TruSeq PCR-Free Kit (Illumina) according to the manufacturer's instructions, and sequencing was performed in-house on an Illumina MiSeq (2 × 250 bases).
The amplicon sequencing data were processed using the pipeline SEED 2.1.05 36 . Briefly, paired-end reads were merged using fastq-join 37 . Sequences with ambiguous bases and those with a mean quality score below 30 were omitted. The fungal ITS2 region was extracted using ITS Extractor 1.0.11 38 before processing. Chimeric sequences were detected using USEARCH 8.1.1861 39 and deleted, and sequences were clustered using UPARSE implemented within USEARCH 40 at a 97% similarity level. The most abundant sequence in each OTU was taken as its representative. The closest fungal hits at the species level were identified using BLASTn 2.5.0 against UNITE 8.1 41 . Where the best fungal hit showed lower similarity than 97% with 95% coverage, the best genus-level hit was identified. The closest bacterial hit from the SILVA SSU database r138 21 was found with the DECIPHER 2.18.1 package 42 using the IDTAXA algorithm with a threshold of 60 43 . Sequences identified as nonfungal and nonbacterial were discarded.
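Two of the rules in this paragraph, choosing the most abundant sequence as the OTU representative and falling back to a genus-level hit when the best fungal hit is below 97% similarity at 95% coverage, can be sketched as follows. This is an illustration only and not the SEED implementation. The function names, the example sequences and the reading of the rule as "species level requires at least 97% similarity and at least 95% coverage" are assumptions.

```python
from collections import Counter
from typing import Iterable


def representative_sequence(sequences: Iterable[str]) -> str:
    """Return the most abundant sequence among those clustered into one OTU."""
    counts = Counter(sequences)
    if not counts:
        raise ValueError("an OTU must contain at least one sequence")
    return counts.most_common(1)[0][0]


def assignment_rank(similarity_pct: float, coverage_pct: float,
                    min_similarity: float = 97.0,
                    min_coverage: float = 95.0) -> str:
    """Decide whether the best hit supports a species- or genus-level name."""
    if similarity_pct >= min_similarity and coverage_pct >= min_coverage:
        return "species"
    return "genus"


# Hypothetical OTU made of three reads, two of them identical.
otu_reads = ["ACGTACGT", "ACGTACGT", "ACGTACGA"]
rep = representative_sequence(otu_reads)

# Hypothetical best hits: (similarity %, coverage %).
rank_a = assignment_rank(98.5, 100.0)
rank_b = assignment_rank(94.0, 100.0)
```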

Data Records
Data described in this study are summarized in Supplementary Tables 1 and 2 together with the NCBI accession numbers. Raw sequencing reads (total RNA, metatranscriptomics and metagenomics), assembly files and resolved MAGs have been deposited under NCBI BioProject accession number PRJNA603240 44 . In the associated study 5 , the metatranscriptome assembly together with raw read mapping was used for annotation of microbial functions, total RNA raw reads were used to infer mainly fungal and bacterial taxonomic composition, and the metagenome assembly and raw read mapping served solely for MAG identification. Amplicon data of the bacterial 16S rRNA gene and fungal ITS2, which were not published previously, have been deposited under NCBI BioProject accession number PRJNA672674 45 .

Technical Validation
Deadwood samples were taken aseptically using sterilized equipment and sterile RNase- and DNase-free tubes. RNA and DNA were extracted in an RNase-free environment. During library preparation, the quantity and quality of the nucleic acids were measured with a Qubit 2.0 Fluorometer and a 2100 Bioanalyzer, respectively. PCR with the bacterial primers 515F and 806R, together with a negative control containing PCR-grade water and a positive control containing extracted bacterial DNA, was used to confirm the success of the DNase treatment of RNA samples. A 2100 Bioanalyzer measurement was used to confirm successful rRNA depletion. No positive or negative sequencing controls were used to obtain metagenomic and metatranscriptomic data. For 16S rRNA gene amplification, negative and positive controls in the form of PCR-grade water and bacterial DNA, respectively, were included. The concentration of the 16S rRNA gene amplicons and controls was measured with a Qubit 2.0 Fluorometer and their quality was analysed using agarose gel electrophoresis. Equimolar pooling of all barcoded sequencing libraries was done according to quantification with the KAPA Library Quantification Kit (Roche).
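Equimolar pooling amounts to taking from each library a volume inversely proportional to its measured molar concentration. The sketch below assumes concentrations expressed in nM (1 nM equals 1 fmol per µl) and a hypothetical target amount per library. The function name and all values are illustrative and do not reproduce the authors' calculations.

```python
from typing import Dict


def pooling_volumes(conc_nm: Dict[str, float],
                    target_fmol: float) -> Dict[str, float]:
    """Volume (µl) of each library that contains `target_fmol` of DNA.

    Since 1 nM = 1 fmol/µl, volume = target amount / concentration.
    """
    volumes = {}
    for name, conc in conc_nm.items():
        if conc <= 0:
            raise ValueError(f"concentration of {name} must be positive")
        volumes[name] = target_fmol / conc
    return volumes


# Hypothetical library concentrations (nM) and a 20 fmol target per library.
libraries = {"lib_A": 10.0, "lib_B": 5.0, "lib_C": 20.0}
volumes = pooling_volumes(libraries, target_fmol=20.0)
```

In this example the most dilute library (`lib_B`) contributes the largest volume, so every library adds the same number of molecules to the pool.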

Usage Notes
The metagenome and metatranscriptome data described in this Data Descriptor were used to demonstrate the complementarity of fungal and bacterial functions in carbon and nitrogen cycling in decomposing deadwood and to link them to the corresponding biogeochemical processes 5 . However, the analysis of the deposited data packages in the associated study 5 focused solely on fungi and bacteria, despite the presence of other groups of organisms in the studied deadwood. The present deposition of the metagenome assembly and total RNA sequencing data gives biologists interested in virus ecology 46 , bacterial metagenomics 47,48 and the ecology of eukaryota 49,50 the opportunity to explore the functional potential of the deadwood-associated biota through analysis of the metagenome, and to obtain a taxonomic overview of all deadwood-associated organisms using total RNA, which allows reliable taxonomic classification of taxa across the whole tree of life 51,52 . The amplicon data, described here for the first time, allow intra-comparison with the metagenomes and metatranscriptomes as well as inter-comparison with other deadwood studies 16,53,54 and analysis of cross-domain interactions 6,55 . Efforts to collect data and generalize microbial diversity patterns 13,14,56 profit from fully annotated, accessible and metadata-rich sequences such as those we present here. The Data Descriptor further provides information for ecologists, biogeochemists and conservation biologists interested in the role of deadwood in ecosystem processes and in deadwood-associated biodiversity, an important topic of current research in forest ecology 57 .

Code availability
The above methods indicate the programs used for analysis within the relevant sections. The code used to analyse individual data packages is deposited at https://github.com/TlaskalV/Deadwood-microbiome.