A metagenomic survey of soil microbial communities along a rehabilitation chronosequence after iron ore mining

Gastauer, Markus; Vera, Mabel Patricia Ortiz; de Souza, Kleber Padovani; Pires, Eder Soares; Alves, Ronnie; Caldeira, Cecílio Frois; Ramos, Silvio Junio; Oliveira, Guilherme

doi:10.1038/sdata.2019.8


Data Descriptor
Open access
Published: 12 February 2019

Markus Gastauer¹, Mabel Patricia Ortiz Vera¹,², Kleber Padovani de Souza¹,³, Eder Soares Pires¹, Ronnie Alves¹,³, Cecílio Frois Caldeira¹, Silvio Junio Ramos¹ & Guilherme Oliveira¹ (ORCID: orcid.org/0000-0003-0054-3438)

Scientific Data volume 6, Article number: 190008 (2019)

Abstract

Microorganisms are useful environmental indicators that can deliver essential insights into the processes underlying mine land rehabilitation. To compare microbial communities along a chronosequence of mine land rehabilitation with pre-disturbance levels at reference sites covered by native vegetation, we sampled non-rehabilitated, rehabilitating and reference study sites in the Urucum Massif, Southwestern Brazil. From each study site, three composite soil samples were collected for chemical, physical, and metagenomic analysis. We used paired-end library sequencing (Illumina NextSeq 500); the reads were assembled using MEGAHIT. Coding DNA sequences (CDS) were taxonomically classified using Kaiju in combination with non-redundant NCBI BLAST reference sequences containing archaea, bacteria, and viruses. Additionally, a functional classification was performed with EMG v2.3.2. Here, we provide the raw data and assembly (reads and contigs), followed by initial functional and taxonomic analysis, as a baseline for further studies of this kind. Further investigation is needed to fully understand the mechanisms of environmental rehabilitation in tropical regions, and we encourage researchers to explore this collection for hypothesis testing.

Design Type(s)	observation design • biodiversity assessment objective • sequence assembly objective
Measurement Type(s)	microbial community
Technology Type(s)	DNA sequencing
Factor Type(s)	experimental condition
Sample Characteristic(s)	Municipality of Corumba • soil

Machine-accessible metadata file describing the reported data (ISA-Tab format)

Background & Summary

In many countries, the environmental rehabilitation of mine lands to a state as close as possible to pre-disturbance levels is a legal requirement to reduce net losses of biodiversity and ecosystem functions^1,2. Rehabilitating sites must be monitored to meet the targets of environmental licensing agencies². To date, there is no consensus on the best scientific indices for evaluating rehabilitation progress³. Therefore, multidisciplinary approaches aiming to provide such parameters have been proposed recently^4,5.

Besides vegetation or fauna surveys⁶, the examination of microbial communities can detect environmental alterations over short time scales⁷ and can thus deliver insights about the fulfillment of rehabilitation targets^8,9. Metagenomic approaches provide insights into environmental variations^10–12, detecting the diversity of microorganisms in rehabilitating habitats¹³. Comparing the composition of microbial communities from rehabilitating sites with that of preserved reference sites may thus contribute to the evaluation of rehabilitation success in mine lands^14,15.

In Brazil, one of the world’s leading raw iron export nations¹⁶, iron ore deposits are exploited in open-cast mines in different regions. Ferriferous savanna ecosystems named ‘cangas’^17,18 cover the deposits in the Iron Quadrangle (Minas Gerais), the Carajás mountains (Pará), the Caetité region (Bahia), and the Urucum Massif (Mato Grosso do Sul). Due to particular environmental conditions such as high concentrations of metal ions, especially iron, high radiation, elevated temperatures, and large seasonal amplitudes in rainfall, these diverse and endemic ecosystems^19–21 are considered hotspots of biodiversity^17,22. Besides storing unique genetic resources for therapeutic purposes²³ or the remediation of contaminated areas^24,25, rupestrian canga ecosystems provide many ecosystem services²⁶.

Iron ore extraction²⁷ impacts these ecosystems and reshapes entire landscapes²⁸ through the removal of ore and mining wastes. Environmental rehabilitation of the impacted ecosystems is therefore desired in order to preserve biotic resources and ecosystem services for future generations. Insights into the composition, diversity and functional characterization of soil microbial communities along environmental rehabilitation gradients are useful for measuring rehabilitation success and can provide valuable feedback to improve rehabilitation practice.

The goal of this study was to identify changes in microbial community composition, diversity and functional processes resulting from mine land rehabilitation and to compare them with pre-disturbance levels at reference sites covered by native canga vegetation. We sampled three study sites before rehabilitation efforts, seven study sites spanning different rehabilitation stages and three reference canga sites associated with two iron ore mines from Corumbá (Urucum Massif). Environmental rehabilitation comprises topographic reformulation after removal of the iron ore, liming, fertilization and the application of biomass before native canga species are seeded or planted.

At each study site, we installed three plots of 10 × 10 m; in each plot, a composite soil sample was collected (depth 0–2 cm) for metagenomic analysis. An additional sample (depth 0–10 cm) was collected for physical and chemical analysis. We applied paired-end sequencing (Illumina NextSeq 500) after DNA extraction, purification and amplification to construct metagenomic libraries. The Illumina reads were assembled using MEGAHIT. Subsequently, nucleotide sequences coding for proteins (CDS) were extracted from the assemblies. Functional and taxonomic classification of the CDS was performed using EMG and Kaiju, respectively.

Here, we provide the complete metagenomic data set without detailed analysis or discussion of results, in order to highlight the comprehensive view it offers into soil microbial communities during the rehabilitation of a canga ecosystem in Southwestern Brazil. We furthermore present the annotated metagenome assembly, containing taxonomic and functional classification, as well as chemical soil properties (i.e., pH, cation exchange capacity, organic matter content, micro- and macronutrient as well as aluminum availability) and soil texture. The present collection is the first high-throughput sequencing-based survey from non-rehabilitated and reference sites as well as sites under rehabilitation after iron ore mining from a tropical region, thus representing baseline data for further studies of this kind. With its publication, researchers can explore this collection for hypothesis testing related to environmental rehabilitation in tropical regions, especially after mining activities. The consistency in experimental design, sequencing methodology and sample sources ensures the value of this collection for ongoing studies about environmental rehabilitation after anthropogenic impacts, in particular those about mine land rehabilitation.

Methods

Experimental design

Data were collected in October 2016 at 13 study sites from open-cast iron ore mines situated in the Urucum Massif, Mato Grosso do Sul, Brazil (Fig. 1). The altitude of the massif varies between 600 and 1,065 m a.s.l. With a mean annual temperature of 25 °C and mean annual precipitation of 1,070 mm²⁹, the climate of the region corresponds to a tropical savanna climate (Aw in the Köppen classification), characterized by dry winters and rainy summers. The natural vegetation is a mosaic of seasonal deciduous and semi-deciduous forests on slopes and near watercourses. Furthermore, different savanna formations, ranging from arborized physiognomies to treeless grasslands, occupy the upper parts of the massif³⁰.

**Figure 1: Map of geographical position of the study sites in the Urucum Massif, Corumbá, Mato Grosso do Sul, Brazil.**

Iron ore mining in the region is restricted to the outcrops of ferruginous jaspilites and fixed hematites from the Santa Cruz Formation³¹ and begins with the suppression of vegetation and removal of topsoil layers. Environmental rehabilitation after mining includes topographic reformulation, topsoil application, liming and fertilization of mine soils. Organic matter originating from suppressed areas is added. The rehabilitation targets are native open savanna formations, i.e., pre-mining formations on ironstone outcrops. Thus, plants rescued from suppressed areas and seedlings of native species produced in a tree nursery are planted to trigger environmental rehabilitation of mine lands. Additionally, seed mixtures of native species collected in the vegetation remnants were applied. On demand, further activities such as replanting of seedlings, further applications of seeds, and control of invasive alien species were carried out.

Study sites comprise three bare soil areas immediately before rehabilitation activities are carried out, seven sites from different rehabilitating stages (two-, three- and six-year-old stands) as well as three reference sites covered by native vegetation, i.e., open savanna formations (Table 1). At each study site, three plots (10 × 10 m) were installed in homogeneous vegetation without signs of external disturbances.

Table 1 Site information for all 13 sampling locations utilized in this study.

Two mixed soil samples were collected from each plot. For each sample, the substrate from five homogeneously distributed sampling points within the plot was mixed. The first sample, collected at a depth of 0–10 cm, was air-dried and submitted to analysis of chemical properties and texture. The pH in water (pH(H2O)) and in potassium chloride (pH(KCl)), organic matter (OM), available phosphorus (P), potassium (K), sulfur (S), calcium (Ca), magnesium (Mg), aluminum (Al), boron (B), zinc (Zn), iron (Fe), manganese (Mn) and copper (Cu) concentrations as well as effective cation exchange capacity (ECEC) of the samples were determined following standardized protocols³². Soil texture was determined by particle-size distribution analysis using the pipette method.

A mixed superficial soil sample (depth 0–2 cm) was collected for metagenomic analysis from each plot. Immediately after collection, soil samples were cooled in a fridge to avoid DNA degradation. At the lab, the samples were stored in a freezer at −80 °C until analysis.

DNA extraction and shotgun sequencing

Total DNA was extracted from 250 mg of soil per sample using the PowerSoil DNA Isolation Kit (Mobio Laboratories, USA) following the manufacturer’s instructions. DNA samples were quantified using a Qubit 3.0 fluorometer (Thermo Fisher Scientific Inc.).

Shotgun metagenomic paired-end libraries were then constructed from 50 ng of pure DNA. For that, samples were subjected to a random enzymatic fragmentation in which the DNA was simultaneously fragmented and bound to adapters using the QXT SureSelect kit (Agilent Technologies). The fragmented DNA was purified using AMPure XP beads (Beckman Coulter) and subjected to an amplification reaction using primers complementary to the Illumina flowcell adapters. Amplified libraries were again purified using AMPure XP beads (Beckman Coulter), quantified using the Qubit 3.0 fluorometer (Thermo Fisher Scientific Inc.) and checked for fragment size on the 2100 Bioanalyzer (Agilent Technologies) using a High Sensitivity DNA kit (Agilent Technologies).

After that, the libraries were adjusted to a concentration of 4 nM, pooled, denatured and diluted to a running concentration of 1.8 pM. The sequencing run was performed in the NextSeq 500 Illumina platform using a NextSeq 500 v2 kit high-output with 150 cycles.

Genome assembly, taxonomic and functional classification

The Illumina paired-end reads were assembled using MEGAHIT v1.1.2³³, using default parameters (Fig. 2). Contigs were output in the fasta format.

**Figure 2: Workflow of genome assembly, functional and taxonomic classification and data validation applied in this study.**

Using a locally installed EMG v2.3.2 pipeline³⁴, coding DNA sequences (CDS) were extracted from contigs and output as .fnn files. Furthermore, the pipeline produces the functional classification, output as .ipr files. Subsequently, the taxonomic classification was performed on the CDS using Kaiju v.1.4.4 (running mode: greedy, with up to 5 substitutions; minimum match: 12; minimum match score: 70)³⁵. As reference database, we used the non-redundant NCBI BLAST protein sequences (accessed on 8 December 2016, containing 81 M protein sequences from Bacteria, Archaea, and Viruses). We estimated average coverage as the fraction of the observed microbial community covered by the NCBI BLAST protein sequences using Nonpareil v3.3.3³⁶, with forward reads with quality scores greater than Q20, as recommended by the tool.

Cluster analysis

For data validation, taxonomic and functional counting matrices were generated. Differences in overall microbial richness (i.e., based on the taxonomic matrix containing all CDS identified to genus level) between non-rehabilitated, rehabilitating and reference study sites were tested using one-way ANOVA followed by post-hoc Tukey HSD tests after checking for normality and homogeneity of variance. Diversity was estimated as Shannon’s diversity index H’ using the package vegan v2.5-2³⁷ in the R environment.
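As an illustration of the diversity measure named above, the following minimal Python sketch computes Shannon's index H' = −Σ pᵢ ln pᵢ from one row of a genus counting matrix. It is not the authors' script (they used vegan in R); the function name `shannon_index` and the example counts are hypothetical, and the natural logarithm is used because that is vegan's default base.

```python
import math
from typing import Iterable


def shannon_index(counts: Iterable[float]) -> float:
    """Return Shannon's H' = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    positive = [c for c in counts if c > 0]
    total = sum(positive)
    if total == 0:
        return 0.0  # an empty sample carries no diversity information
    return -sum((c / total) * math.log(c / total) for c in positive)


# Hypothetical genus counts for one sample (not taken from the dataset)
example_counts = [50, 30, 15, 5]
print(round(shannon_index(example_counts), 3))
```

For a perfectly even community of S genera the index equals ln S, which gives a quick sanity check when applying the function to a real counting matrix.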

We used the package pvclust v2.0³⁸ in R Environment v3.4.1³⁹ to compute the clusters from the taxonomic counting matrix, considering genus-level predictions from Kaiju. Cluster consistency was tested using the approximately unbiased (au) and the bootstrap probability (bp) statistics⁴⁰. Both statistics return p-values ranging from 0 to 1, where 0 represents a weak consistency and 1 represents a strong consistency for all formed clusters. As au is a better approximation to an unbiased p-value than bp, we considered only clusters with au values larger than 0.95, which represents a strong similarity between the grouped samples.

Finally, an integrated analysis of taxonomy was performed with MGCOMP⁴¹ to observe the relationship among sample profiles and sites. In order to reduce the influence of rare organisms in this analysis, we considered only the top 30 most abundant genera for each sample, which corresponds to the smallest number of genera covering 50% of the analyzed sequences. Based on these top 30 genera, we performed a two-level clustering of all identified genera for this analysis. In the first level, the samples that showed similar genus abundances were grouped; in the second level, a second grouping was carried out within each cluster, considering only the samples belonging to the respective group. After the grouping, the genera present in all first-level groupings (denominated core taxa), the genera present exclusively in one of the first-level groupings (denominated exclusive taxa) and the other genera (denominated neutral taxa) were identified.
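The split into core, exclusive and neutral taxa can be sketched with plain set operations. The Python example below is only an illustration under stated assumptions, not MGCOMP's implementation: the function names `top_genera` and `classify_taxa` are hypothetical, the first-level groups are taken as given, and a genus is assumed to be "present" in a group if it is among the top genera of at least one sample in that group.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def top_genera(counts: Dict[str, int], n: int = 30) -> Set[str]:
    """Return the n most abundant genera of one sample."""
    return {genus for genus, _ in Counter(counts).most_common(n)}


def classify_taxa(
    groups: Dict[str, List[Set[str]]]
) -> Tuple[Set[str], Dict[str, Set[str]], Set[str]]:
    """Split genera into core, exclusive (per group) and neutral taxa.

    `groups` maps a first-level group name to the top-genera sets of its samples.
    Assumption: a genus is present in a group if any sample of the group has it.
    """
    per_group = {name: set().union(*samples) for name, samples in groups.items()}
    all_genera = set().union(*per_group.values())
    # Core taxa: genera found in every first-level group
    core = set.intersection(*per_group.values()) if per_group else set()
    # Exclusive taxa: genera found in exactly one first-level group
    exclusive: Dict[str, Set[str]] = {}
    for name, genera in per_group.items():
        others = set().union(*(g for other, g in per_group.items() if other != name))
        exclusive[name] = genera - others
    # Neutral taxa: everything that is neither core nor exclusive
    neutral = all_genera - core - set().union(*exclusive.values())
    return core, exclusive, neutral
```

With three hypothetical groups, a genus shared by two of them but absent from the third ends up in the neutral set, which matches the residual definition given in the text.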

Data Records

The raw nucleotide sequences of 1,192,347,558 reads and 2,608,990 contigs extracted from 34 soil samples were deposited as fastq and fasta files at NCBI (Data Citation 1 and Table 2 (available online only)). As required, fastq files contain four lines for each read, namely an identifier of the read, the nucleotide sequence, the placeholder ‘+’ for optional annotations (not used here) and the Phred quality score of each nucleotide. fasta files are composed of two elements for each contig, an identifier and the sequence of the contig.
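The four-line fastq layout described above can be read with a few lines of Python. This is a minimal sketch rather than a full parser: the names `FastqRecord` and `read_fastq` are hypothetical, and the quality string is assumed to use the Phred+33 encoding common to recent Illumina output, which should be verified against the deposited files.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    identifier: str
    sequence: str
    quality: List[int]  # one Phred score per nucleotide


def read_fastq(handle: TextIO, phred_offset: int = 33) -> Iterator[FastqRecord]:
    """Yield one record per four-line fastq block (Phred+33 is an assumption)."""
    while True:
        header = handle.readline()
        if not header:          # end of file
            return
        header = header.rstrip("\r\n")
        if not header:          # tolerate blank lines between records
            continue
        sequence = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed fastq record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        scores = [ord(ch) - phred_offset for ch in quality]
        yield FastqRecord(header[1:], sequence, scores)


# Usage with an in-memory example; real files would be opened with open() or gzip.open()
example = io.StringIO("@read1\nACGT\n+\nIIII\n")
for record in read_fastq(example):
    print(record.identifier, record.sequence, min(record.quality))
```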

Table 2 Sequencing and assembly data from metagenomic libraries of 34 soil samples from non-rehabilitated, rehabilitating and reference sites from two iron-ore mines, Corumbá, Mato Grosso do Sul, Brazil.

Further data were deposited in the Open Science Framework (Data Citation 2). Here, the “supplementary” folder contains quality reports for forward and reverse reads from each sample as well as chemical and physical soil properties. Soil properties are provided as a comma-delimited .csv file named SoilSamples.csv. Read quality reports contain 12 sections entitled 1) Basic Statistics, 2) Per base sequence quality, 3) Per tile sequence quality, 4) Per sequence quality scores, 5) Per base sequence content, 6) Per sequence GC content, 7) Per base N content, 8) Sequence Length Distribution, 9) Sequence Duplication Levels, 10) Overrepresented sequences, 11) Adapter Content and 12) Kmer Content. The file README.txt, available in the same folder, contains a brief explanation of each section.

Additionally, the “cluster_analysis” folder contains three subfolders. The “inputs” folder contains files regarding CDS detected within assembled contigs, whereas the “output” folder contains the taxonomic and functional counting matrices that were generated from these classifications by the corresponding scripts, deposited in the “script” folder.

The “inputs” folder contains three compressed archives. First, kaiju_input.tar.gz contains a file for each sample with all identified CDS. The file lists CDS identifiers and their sequences. Second, kaiju_output.tar.gz contains the taxonomic classification for each CDS, stored as individual, tab-delimited files for each sample. Each line starts with an upper-case letter indicating the success of taxonomic classification (U is unclassified, C is classified), followed by the CDS identifier, the NCBI taxonomy ID of the identified taxon and a string showing the taxonomic identification, containing domain, phylum, class, order, family, genus and species, separated by semicolons. The CDS identifier is composed of the contig ID as well as the initial and final nucleotide positions of the CDS within the contig, all of them joined by underscores into a single string. Third, interpro_output.tar.gz contains the functional classification. Individual comma-delimited files (.csv) contain the list of enzymes detected within each sample. Each file is composed of three columns containing an identifier, the name of the protein and the number of occurrences within the analyzed sample.
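The per-sample classification files described above can be parsed with a short Python sketch like the one below. It relies only on the layout stated in the text; the names `KaijuHit` and `parse_kaiju_line` are hypothetical, and two points are assumptions to check against the deposited files: that all seven ranks appear in order in the lineage string, and that missing ranks are written as "NA". Because contig IDs may themselves contain underscores, the start and end positions are split off from the right.

```python
from typing import Dict, NamedTuple

RANKS = ("domain", "phylum", "class", "order", "family", "genus", "species")


class KaijuHit(NamedTuple):
    classified: bool
    contig_id: str
    start: int
    end: int
    taxon_id: int
    lineage: Dict[str, str]


def parse_kaiju_line(line: str) -> KaijuHit:
    """Parse one tab-delimited line: status, CDS identifier, taxon ID, lineage."""
    fields = line.rstrip("\r\n").split("\t")
    status, cds_id, taxon_id = fields[0], fields[1], fields[2]
    lineage_text = fields[3] if len(fields) > 3 else ""
    # CDS identifier = contig ID + start + end, joined by underscores
    contig_id, start, end = cds_id.rsplit("_", 2)
    names = [name.strip() for name in lineage_text.split(";")]
    # Assumption: ranks are listed in RANKS order; empty or "NA" entries are dropped
    lineage = {rank: name for rank, name in zip(RANKS, names) if name and name != "NA"}
    return KaijuHit(status == "C", contig_id, int(start), int(end), int(taxon_id), lineage)


# Hypothetical example line (not copied from the dataset)
hit = parse_kaiju_line(
    "C\tk141_12_5_310\t1883\tBacteria; Actinobacteria; Actinobacteria; "
    "Streptomycetales; Streptomycetaceae; Streptomyces; NA;"
)
print(hit.contig_id, hit.lineage.get("genus"))
```

Counting `hit.lineage.get("genus")` over all classified lines of a sample would give one row of a genus-level counting matrix.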

The “output” folder contains three comma-separated files within a compressed archive (output.tar.gz): the taxonomic matrix (taxa.csv), the functional matrix (functions.csv) and taxa_30.csv, which contains the taxonomic matrix for the top 30 genera only.

Furthermore, five R scripts used to produce the taxonomic matrix (taxonomic_analysis.R), plot samples clustered by taxonomic composition (taxonomic_cluster_plot.R), plot taxonomic composition of each sample (taxonomic_stacked_plot.R), produce the functional matrix (functional_analysis.R) and to plot samples clustered by functions (functional_cluster_plot.R) are available in the “scripts” folder.

Technical Validation

Altogether, 2,166,372 CDS were detected. A total of 2,064 genera were present in 1,290,491 CDS, among them 127 archaea, 1,853 bacteria, and 84 virus genera. Richness varies from 739 to 1,894 within samples (Table 3). 273,799 CDS (12.64% of all CDS) remain completely unclassified, and for an additional 875,881 CDS (40.43% of all CDS), only partial matches are available. Functional classification of identified contigs distinguished 10,913 proteins.

Table 3 Taxonomic and functional classification of communities from metagenomic libraries of 34 soil samples from non-rehabilitated, rehabilitating and reference sites from two iron-ore mines, Corumbá, Mato Grosso do Sul, Brazil.

Microbial diversity within samples (measured at genus level) varied from 4.5 to 5.5 (Fig. 3) and was significantly higher in non-rehabilitated than in rehabilitating study sites (ANOVA, F = 4.137, p = 0.0255). Significant differences in community composition were also detected. First, the cluster analysis separated the samples into two clusters. The larger cluster groups samples from rehabilitating and reference sites, whereas samples from non-rehabilitated sites were grouped outside (Fig. 4).

**Figure 3: Shannon diversity of each of the 34 samples (left) and boxplot of species richness, separated by non-rehabilitated (NR), rehabilitating (RH) and reference study sites (REF).**

**Figure 4: Clustering of samples from non-rehabilitated (NR), rehabilitating (RH) and reference study sites (REF) from Corumbá iron ore mines, Mato Grosso do Sul, Brazil, based on taxonomic counting matrix.**

Additionally, the complete analysis of taxonomy separated the dataset into four groups by taxonomic profile, three of them divided into subgroups (Fig. 5). As shown in Table 4, samples from all three treatments (non-rehabilitated, rehabilitating and reference sites) were clustered in groups A and B, while a single reference sample forms group D and group C is composed exclusively of non-rehabilitated samples. All analyses carried out here show that the taxonomic composition of microbial communities from rehabilitating and reference sites is highly similar, indicating that rehabilitation activities after iron ore mining in the Urucum Massif can successfully restore soil microbial communities.

**Figure 5: Graphical representations of integrated taxonomy analysis performed by MGCOMP, containing a two-level grouping of all identified genera.**

Table 4 Exclusive and core taxa for each sample cluster built with MGCOMP.

Usage Notes

Contigs and the taxonomic and functional classifications have been generated using an automated process without manual assessment, i.e., they represent a draft assembly only. As such, all downstream research should independently assess the accuracy of reads, contigs, and taxonomic and functional assignments for organisms of interest. Nevertheless, this study presents a baseline for further studies of this kind.

The dataset contains a large number of previously identified taxa and functions, but the high proportion of unclassified or incompletely classified CDS indicates the presence of sizable unseen biodiversity within soils along the sampled rehabilitation chronosequence. The identification of this unseen biodiversity may require additional alignments, possibly using different genome assemblers as well as combinations with further reference databases. Furthermore, the quality of functional and taxonomic classification needs manual assessment in some cases. Analysis of the seen and unseen biodiversity within this dataset is expected to produce helpful insights into microbial community ecology along rehabilitation chronosequences after iron ore mining.

Additional information

How to cite this article: Gastauer, M. et al. A metagenomic survey of soil microbial communities along a rehabilitation chronosequence after iron ore mining. Sci. Data. 6:190008 https://doi.org/10.1038/sdata.2019.8 (2019).

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

Society for Ecological Restoration International Science & Policy Working Group, S. The SER International primer on ecological restoration (www.ser.org & Tucson 2004).
Gastauer, M. et al. Mine land rehabilitation in Brazil: Goals and techniques in the context of legal requirements. Ambio 48, 74–88 (2019).
Kollmann, J. et al. Integrating ecosystem functions into restoration ecology—recent advances and future directions. Restor. Ecol. 24, 722–730 (2016).
Perring, M. P. et al. Advances in restoration ecology: rising to the challenges of the coming decades. Ecosphere 6, art131 (2015).
Gastauer, M. et al. Mine land rehabilitation: Modern ecological approaches for more sustainable mining. J. Clean. Prod. 172, 1409–1422 (2018).
Audino, L. D., Louzada, J. & Comita, L. Dung beetles as indicators of tropical forest restoration success: Is it possible to recover species and functional diversity? Biol. Conserv. 169, 248–257 (2014).
Valentini, A. et al. Next-generation monitoring of aquatic biodiversity using environmental DNA metabarcoding. Mol. Ecol. 25, 929–942 (2015).
Eaton, W. D., Shokralla, S., McGee, K. M. & Hajibabaei, M. Using metagenomics to show the efficacy of forest restoration in the New Jersey Pine Barrens. Genome 60, 825–836 (2017).
Hamonts, K. et al. Effects of ecological restoration on soil microbial diversity in a temperate grassy woodland. Appl. Soil Ecol. 117–118, 117–128 (2017).
Bruno, A. et al. One step forwards for the routine use of high‐throughput DNA sequencing in environmental monitoring. An efficient and standardizable method to maximize the detection of environmental bacteria. Microbiologyopen 6, e00421 (2017).
Navarrete, A. A. et al. Soil microbiome responses to the short-term effects of Amazonian deforestation. Mol. Ecol. 24, 2433–2448 (2015).
Techtmann, S. M. & Hazen, T. C. Metagenomic applications in environmental monitoring and bioremediation. J. Ind. Microbiol. Biotechnol. 43, 1345–1354 (2016).
Garris, H. W., Baldwin, S. A., Van Hamme, J. D., Gardner, W. C. & Fraser, L. H. Genomics to assist mine reclamation: a review. Restor. Ecol. 24, 165–173 (2016).
Li, Y. et al. Ecological restoration alters microbial communities in mine tailings profiles. Sci. Rep 6, 25193 (2016).
Thavamani, P. et al. Microbes from mined sites: Harnessing their potential for reclamation of derelict mine sites. Environ. Pollut. 230, 495–505 (2017).
Yellishetty, M. & Mudd, G. M. Substance flow analysis of steel and long term sustainability of iron ore resources in Australia, Brazil, China and India. J. Clean. Prod. 84, 400–410 (2014).
Jacobi, C. M., Do Carmo, F. F., Vincent, R. C. & Stehmann, J. R. Plant communities on ironstone outcrops: A diverse and endangered Brazilian ecosystem. Biodivers. Conserv. 16, 2185–2200 (2007).
Schaefer, C. E. G. R. et al. In (ed. Fernandes, G. W. ) 15–53 (Springer International Publishing 2016).
Costa, W. F., Ribeiro, M., Saraiva, A. M., Imperatriz-Fonseca, V. L. & Giannini, T. C. Bat diversity in Carajás National Forest (Eastern Amazon) and potential impacts on ecosystem services under climate change. Biol. Conserv. 218, 200–210 (2018).
Jaffé, R. et al. Reconciling Mining with the Conservation of Cave Biodiversity: A Quantitative Baseline to Help Establish Conservation Priorities. PLoS One 11, e0168348 (2016).
Mota, N. F. de O., Silva, L. V. C., Martins, F. D. & Viana, P. L. in Geossistemas Ferruginosos do Brasil: Áreas prioritárias para conservação da diversidade geológica e biológica, patrimônio cultural e serviços ambientais (eds. Carmo, F. F. do & Kamino, L. H. Y.) 289–315 (2015).
Silveira, F. A. O. et al. Ecology and evolution of plant diversity in the endangered campo rupestre: a neglected conservation priority. Plant Soil 403, 129–152 (2016).
Caldeira, C. F. et al. Sustainability of Jaborandi in the eastern Brazilian Amazon. Perspect. Ecol. Conserv 15, 161–171 (2017).
Schettini, A. T. et al. Exploring Al, Mn and Fe phytoextraction in 27 ferruginous rocky outcrops plant species. Flora 238, 175–182 (2018).
Jacobi, C. M., do Carmo, F. F. & de Campos, I. C. Soaring Extinction Threats to Endemic Plants in Brazilian Metal-Rich Regions. Ambio 40, 540–543 (2011).
Resende, F. M., Fernandes, G. W. & Coelho, M. S. Economic valuation of plant diversity storage service provided by Brazilian rupestrian grassland ecosystems. Brazilian Journal of Biology 73, 709–716 (2013).
Skirycz, A. et al. Canga biodiversity, a matter of mining. Front. Plant Sci. 5, 1–9 (2014).
Wang, K., Lin, Z. & Zhang, R. Impact of phosphate mining and separation of mined materials on the hydrology and water environment of the Huangbai River basin, China. Sci. Total Environ. 543, 347–356 (2016).
Soriano, B. M. A. in Zoneamento ambiental da borda oeste do Pantanal: maciço do Urucum e Adjacências 211 (2000).
Urbanetz, C., Lehn, C. R., Salis, S. M. & Bueno, M. L. Composição E Distribuição De Espécies Arbóreas Em Gradiente Altitudinal, Morraria do Urucum, Brasil, Oecologia Australis 16, 859–877 (2012).
Anjos, C. E. & Okida, R. in Zoneamento ambiental da borda oeste do Pantanal: maciço do Urucum e Adjacências 47–54 (2000).
Teixeira, P. C., Donagema, G. K., Fontana, A. & Teixeira, W. G. M. (Eds.). Manual de Métodos de Análise de Solo. 3rd edn. (Embrapa, 2017).
Li, D., Liu, C.-M., Luo, R., Sadakane, K. & Lam, T.-W. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics 31, 1674–1676 (2015).
Mitchell, A. L. et al. EBI Metagenomics in 2017: enriching the analysis of microbial communities, from sequence reads to assemblies. Nucleic Acids Res 46, D726–D735 (2018).
Menzel, P., Ng, K. L. & Krogh, A. Fast and sensitive taxonomic classification for metagenomics with Kaiju. Nat. Commun. 7, 11257 (2016).
Rodriguez-R, L. M., Gunturu, S., Tiedje, J. M., Cole, J. R. & Konstantinidis, K. T. Nonpareil 3: Fast Estimation of Metagenomic Coverage and Sequence Diversity. mSystems 3, e00039–18 (2018).
Oksanen, J. et al. vegan: Community Ecology Package. R package https://cran.r-project.org/package=vegan (2017).
Suzuki, R. & Shimodaira, H. Pvclust: an R package for assessing the uncertainty in hierarchical clustering. Bioinformatics 22, 1540–1542 (2006).
R Core Team. R: A language and environment for statistical computing (2017).
Shimodaira, H. Approximately Unbiased Tests of Regions Using Multistep-Multiscale Bootstrap Resampling. Ann. Stat 32, 2616–2641 (2004).
Santos, V. C. A., Correa, L., Meiguins, B., Oliveira, G. & Alves, R. Metagenomics-based signature pattern cluster and interactive visualization analysis. IEEE IJCNN Conf. Proc (2018).
Olson, N. D. et al. Metagenomic assembly through the lens of validation: recent advances in assessing and improving the quality of genomes assembled from metagenomes. Brief. Bioinform bbx098–bbx098 (2017).

Data Citations

NCBI Sequence Read Archive SRP153411 (2018)
Gastauer, M. et al. Open Science Framework https://doi.org/10.17605/OSF.IO/EZNMH (2018)

Acknowledgements

We are grateful to Felipe Tadashi Asoa Coelho and Rosilene Silva from DIFC/Vale for their support during the field campaign. The authors would like to thank Vale for financial support. SJR and GO are grateful for productivity scholarship grants from CNPq.

Author information

Authors and Affiliations

Instituto Tecnológico Vale, Rua Boaventura da Silva, 955, Bairro: Nazaré, CEP 66055-090, Belém, PA, Brazil
Markus Gastauer, Mabel Patricia Ortiz Vera, Kleber Padovani de Souza, Eder Soares Pires, Ronnie Alves, Cecílio Frois Caldeira, Silvio Junio Ramos & Guilherme Oliveira
Universidade Federal do Pará, Programa de Pós-Graduação em Genética e Biologia Molecular, Rua Augusto Corrêa, 01, Guamá, CEP 66075-110, Belém, PA, Brazil
Mabel Patricia Ortiz Vera
Universidade Federal do Pará, Programa de Pós-Graduação em Ciência da Computação, Rua Augusto Corrêa, 01, Guamá, CEP 66075-110, Belém, PA, Brazil
Kleber Padovani de Souza & Ronnie Alves

Contributions

M.G., C.F.C., S.J.R. and G.O. designed the study. M.G. collected the samples; E.S.P. extracted, amplified and sequenced the metagenomic libraries; M.P.O.V. prepared the libraries and assembled the contigs, with important contributions from K.P.S. K.P.S., M.P.O.V. and R.A. carried out the taxonomic and functional classification as well as the data validation. M.G. wrote the paper; all authors contributed to the final version.

Corresponding author

Correspondence to Guilherme Oliveira.

Ethics declarations

Competing interests

Vale provided financial support for the research activities but had no additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. No environmental licensing activities used the results of this work. There are no patents, products in development or marketed products to declare.

Rights and permissions

Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the metadata files made available in this article.

About this article

Cite this article

Gastauer, M., Vera, M., de Souza, K. et al. A metagenomic survey of soil microbial communities along a rehabilitation chronosequence after iron ore mining. Sci Data 6, 190008 (2019). https://doi.org/10.1038/sdata.2019.8

Received: 22 August 2018
Accepted: 11 December 2018
Published: 12 February 2019
DOI: https://doi.org/10.1038/sdata.2019.8
