Background & Summary

Naturally occurring deposits containing sulfidic sediments that form acid sulfate soils (ASS) cover over 17 million hectares of coastal areas. These extreme soils occur in North America (e.g. in the South and South West of the U.S) as well as in Europe (e.g. surrounding the Baltic Sea), Asia, and Australia1. Exposure of the sulfidic materials to air initializes chemical reactions that produce sulfuric acid2, thus creating ASS with a pH < 4, while mobilizing large quantities of acidity and leachable toxic metal(loid)s (e.g. Al, As, Cd, Co, Ni, and Zn). The leached metals and acidity are ultimately transported to the surrounding waters where they cause severe environmental problems3, negative economic consequences4, and impact human health5.

Pyrite oxidation is a complex biogeochemical process that involves a series of chemical reactions aided by microbiological catalysis1. Intermediate sulfur species formed during the oxidation process are metabolized by sulfur-oxidizing bacteria6 while ferrous iron can be further oxidized at low pH by iron-oxidizing bacteria7. Ultimately, pyrite oxidation causes acidification and mobilization of trace metals that are leached to recipient waters. These processes have been extensively studied at the Risöfladan experimental field, Vaasa, Finland (Fig. 1) and there is a large amount of available geochemical and microbial 16S rRNA gene sequencing data8,9,10,11. The previous 16S rRNA gene based investigations from the oxidized zone at the site identified a mixed community of acidophilic bacteria and archaea similar to that found in acid mine drainage environments along with sulfate reducing bacteria in the underlying un-oxidized potential acid sulfate soil (PASS) zone10,11. Finally, 16S rRNA genes with similarity to low temperature adapted microbes were identified in all soil layers, reflecting the boreal environment in Finland. However, more detailed investigations into the metabolic potential and activities of the contrasting microbial communities in oxidized ASS, the transition zone containing the oxidative front, and the PASS have not been reported.

Fig. 1
figure 1

Vertical soil profile of the transition from oxidized ASS to reduced PASS at the Risöfladan experimental field. Basic geochemical information; pH, Fe2+, and total reduced sulfur (TRS); is provided for the oxidized (approximately 75–140 cm depth, n = 3), transition (approximately 140–190 cm depth, n = 3), and un-oxidized soil layers (approximately > 190 cm depth, n = 3).

In this Data Descriptor, we present triplicate biological replicate data for 16S rRNA gene amplicons, metagenomes, and metatranscriptomes from the oxidized, transition, and un-oxidized soils horizons at the Risöfladan experimental field (total nine samples) along with the accompanying geochemical metadata (Table 1). The 16S rRNA gene amplicon sequencing dataset contains on average 369 027 reads (min 83 731, max 634 072) for the nine samples (Table 2). In addition, the nine metagenomes (three biological replicates from the three soil types designated OX-MG, TR-MG, and UN-MG) contained 3.78 × 109 paired-end reads (2 × 151 bp) of raw sequence data (Table 3). Finally, the corresponding nine metatranscriptomes (designated OX-MT, TR-MT, and UN-MT) to the three soil types generated 1.11 × 1011 paired-end reads (2 × 151 bp) of raw sequence data (Table 3).

Table 1 Summary of soil samples and available data.
Table 2 Summary of 16S rRNA gene amplicon sequencing of the three soil zones.
Table 3 Summary of the metagenome (MG) and metatranscriptomes (MT) sequencing of the three soil zones.

The broader goals that motivated the study were to take advantage of multi-omics and geochemical data to gain insights into the metabolic landscape and the molecular mechanisms underlying microbial life in this extreme environment. In addition, based on 16S rRNA gene sequencing many of the microbes in the PASS and ASS are unknown and the data will characterize the structure and function of these microbial populations. Moreover, the microbial metabolic pathways in the actual ASS will be used to answer the question how they catalyze the oxidation process.

The project will be of great use to the research community, as it will generate biological data for which there is hydrological and geochemical monitoring of the site stretching back for many years. Since ASS are extreme environments, the project will be of direct use to researchers, e.g. microbiologists, environmental biologists, geochemists, and geologists. Scientists interested in the iron and sulfur cycles as well as the contaminants released from the ASS during the oxidation process will additionally have a direct use of the project. As sulfidic sediments are affected during dredging, building of infrastructure, and agriculture; researchers involved in these areas will also have use of this work as it may give an explanation in how to minimize the negative impact of these sediments. Finally, the study is relevant to bioinformaticians that study large data sets.

Methods

Soil sampling

Soil was sampled in mid-August 2017 from the Risöfladan experimental field located in Vaasa, Finland (63° 02′ 50.22″N, 21° 42′ 41.85″E; Figs 1 and 2). Ten replicate soil samples of 5 g were taken from 75 cm below ground (oxidized zone, OX), 140 cm below ground (transition zone, TR), and 190 cm below ground (un-oxidized zone, UN). The samples were placed in separate sterile tubes (30 tubes in total), immediately preserved by freezing in liquid nitrogen, and stored at −80 °C until RNA and DNA were extracted within ten days of sampling.

Fig. 2
figure 2

Workflow from potential acid sulfate soil (PASS)/acid sulfate soil (ASS) sampling to deposited sequencing data.

RNA and DNA extraction

RNA and DNA were extracted simultaneously from 3 g of soil using the RNeasy® PowerSoil Total RNA Kit (Qiagen) and the following RNeasy® PowerSoil DNA elution Kit (Qiagen). Briefly, 5 g of frozen soil was thawed and carefully mixed before 3 g of soil was transferred to a Bead Tube provided in the RNeasy® PowerSoil Total RNA Kit and RNA was extracted according to the manufacturer’s instructions. The final RNA pellet was suspended in 25 µL RNase/DNase-free water. After eluting the RNA from the RNA Capture Column provided by the RNeasy® PowerSoil Total RNA Kit, the bound DNA was eluted from the RNA Capture Column using the RNeasy® PowerSoil DNA Elution Kit according to the manufacturer’s instructions. The final DNA pellet was suspended in 50 µL RNase/DNase-Free water.

After extraction, RNA samples were treated twice with RNase-free DNase using the Turbo DNA-free™ Kit (Invitrogen) according to the manufacturer’s instructions and then stored at –80 °C. In order to obtain sufficient samples for the final three replicates per soil zone, two to three of the extracted RNA and DNA samples were pooled. The same replicate extractions were pooled for RNA and DNA in order for the metatranscriptomes to match the metagenomes and the RNA transcripts to be mapped to the metagenomes. Ribosomal RNA was depleted from the pooled RNA samples with the Ribominus™ Transcriptome Isolation Kit (Invitrogen) according to the manufacturer’s instructions. Nucleic acid concentrations were measured using a Qubit® 2.0 Fluorometer (Life Technologies). DNA and RNA samples were stored at −80 °C before being submitted to the U.S. Department of Energy Joint Genome Institute (JGI) for sequencing.

16S rRNA gene amplicon sequencing

Subsamples from the DNA extracts were used for amplifying the V3-V4 region of the microbial 16S rRNA gene by using primers 341F (CCTACGGGNGGCWGCAG) and 805R (GACTACHVGGGTATCTAATCC)12, followed by PCR amplification for Illumina sequencing13 (Fig. 2). Samples were sequenced at the Science for Life Laboratory, Sweden, on the Illumina MiSeq platform14. A summary of the 16S rRNA gene amplicon sequencing reads is shown in Table 2.

Metagenome and metatranscriptome library construction and sequencing

For metagenomes, 100 ng of DNA was sheared to 300 bp using the Covaris LE220 and size selected using SPRI beads (Beckman Coulter). The fragments were treated with end-repair, A-tailing, and ligation of Illumina compatible adapters (IDT, Inc) using the KAPA-Illumina library creation kit (KAPA Biosystems). The prepared libraries were quantified using KAPA Biosystem’s next-generation sequencing library qPCR kit and run on a Roche LightCycler 480 real-time PCR instrument. Sequencing of the flowcell was performed on the Illumina NovaSeq sequencer using NovaSeq XP V1 reagent kits, tbd-sample dependant flowcell, following a tbd-sample dependant indexed run recipe.

For metatranscriptomes, stranded cDNA libraries were generated using the Illumina Truseq Stranded RNA LT kit. Ten ng of total RNA was fragmented using divalent cations and high temperature. The fragmented RNA was reversed transcribed using random hexamers and SSII (Invitrogen) followed by second strand synthesis. The fragmented cDNA was treated with end-pair, A-tailing, adapter ligation, and eight cycles of PCR. The prepared libraries were quantified using KAPA Biosystem’s next-generation sequencing library qPCR kit and run on a Roche LightCycler 480 real-time PCR instrument. Sequencing of the flowcell was performed on the Illumina NovaSeq sequencer using NovaSeq XP V1 reagent kits, tbd-sample dependent flowcell, following a tbd-sample dependant indexed run recipe.

The generated reads and bases of all samples can be seen in Table 3.

Data Records

A summary of the data records included in the study is given in Table 4. The raw Illumina MiSeq 16S rRNA gene sequencing reads are available from the NCBI Sequence Read Archive under the NCBI BioProject accession number PRJNA52414415. The raw Illumina NovaSeq sequencing reads for all metagenomes are available from the NCBI Sequence Read Archive under accession numbers SRP18508216, SRP18508417, SRP18508618, SRP18508719, SRP18508820, SRP18508921, SRP18509222, SRP18509123 and SRP18522324. The raw Illumina NovaSeq sequencing reads for all metatranscriptomes are also available from the NCBI Sequence Read Archive under accession numbers SRP18522225, SRP18509426, SRP18509327, SRP18522528, SRP18522429, SRP18522630, SRP18522731, SRP18522832 and SRP18522933.

Table 4 Summary of the 16S rRNA gene, metagenome, and metatranscriptome data records.

Technical Validation

Soil samples were taken aseptically by using sterilized equipment and sterile RNase and DNase-free tubes. RNA and DNA were extracted in an RNase free environment and the quantity of the extracted nucleic acids were measured between every step with a Qubit® 2.0 Fluorometer. The quality of the extracted nucleic acids was analyzed using both a NanoDrop™ 2000 Spectrophotometer and agarose gel electrophoresis using a 1% agarose gel with 0.01% SYBR™ Safe DNA Gel Stain (Invitrogen) and a Thermo Scientific™ GeneRuler™ Ready-to-use 1 kb Plus DNA Ladder.

For 16S rRNA gene amplification, negative and positive controls in the form of RNase/DNase-free water and DNA extracted from Pseudoateromonas citrea, respectively were included in the first amplification step. In addition, negative controls in the form of mastermix with primer-barcodes without templates were included in the second PCR amplification to confirm that both the primers and the barcodes identifying the sequences functioned correctly and that no contamination occurred. The concentration of the 16S rRNA gene amplicons and controls was measured with a Qubit® 2.0 Fluorometer and their quality were analyzed using agarose gel electrophoresis (Fig. 3).

Fig. 3
figure 3

Agarose gel electrophoresis and Qubit® 2.0 Fluorometer DNA concentrations of the V3-V4 region of 16S rRNA gene amplicons from the triplicate samples of the three soil zones. Thermo Scientific™ GeneRuler™ Ready-to-use 1 kb Plus DNA Ladder was used in the gel electrophoresis and can be seen to the left side of the gel. TR-1 to TR-3 are triplicate samples from the transition zone, UN-1 to UN-3 are triplicate samples from the un-oxidized zone, and OX-1 to OX-3 are triplicate samples from the oxidized zone. PCR-1 stands for templates after the first amplification step by using primers 341F and 805R. The positive (DNA from Pseudoateromonas citrea) and negative (RNase/DNase-free water) controls used in the first amplification are named “Positive” and “Negative” in the figure. PCR-2 stands for templates after the second amplification step that attaches the barcodes identifying the sequences in the MiSeq Illumina sequencing. Negative PCR-2 stands for negative controls in the second amplifications step, i.e. mastermix with primer-barcodes without templates.