Introduction

Soil bacteria, involved in the decomposition of organic residues, nutrient cycling, aggregation, and formation of humic substances, can reach up to 10 billion cells per gram1,2,3. However, little was known in the past about the composition and diversity of bacterial communities until culture-independent community-profiling approaches became available4,5,6. While only 1% to 10% of the bacteria present in a soil sample can be readily cultured under laboratory conditions, metagenomic methods are potentially suitable to distinguish all the species of a natural microbial community by the detection of specific DNA sequences7. The analysis of ribosomal RNA genes (rRNA), by using the latest advances in next-generation sequencing (NGS) technology, has shown that soil bacterial diversity in a natural setting is extremely high8,9,10,11,12, but the mechanisms controlling such diversity are poorly understood13.

Although a plethora of research has documented the influence of soil on bacterial genotypic diversity, most of these studies have been concerned with single characteristics such as pH14,15,16, texture17, or pollutants18,19,20, whereas the joint influence of the factors that are lumped together in a soil survey21 remains unclear. Most soil microbiology studies have reported little pedological information about the profile, typology, or genesis of the soils from which the samples were taken22. However, laboratory tests have shown that microbial taxa could be locally adapted to soil23. Likewise, several field studies have recently found significant differences in bacterial diversity between agricultural soils located in northern and southern China9, among six forest soils of North and Central America ranging from Colorado to Panama13, and at 12 sampling points between 1500 and 2600 m a.s.l. in the Pyrenees range24. Although these studies sought to demonstrate the effect of historical contingencies and climatic gradients, examining the different soil-formation environments within each area according to its latitudinal and elevational variation, the inherent differences between soils may also have contributed to the genetic divergence of bacterial assemblages. Indeed, when the effects of land use on the distribution patterns of microbial communities have been evaluated12,25,26, the researchers have compared areas within the same soil typology. Given the complex interrelationships among soil microbes and site conditions, it is unlikely that their variation stems from a single influence or a small set of factors. Taken together, these findings suggest that bacteria are adapted to the soil they inhabit.

Accordingly, an improved understanding of bacterial relationships with the soil, considering the latter as an entity resulting from the interaction of environmental factors (parent material, vegetation, topography, and climate) over time, may help solve some outstanding challenges in soil microbiology. Such a pedological approach could be useful in studies aimed at establishing unifying principles in soil microbial ecology27, determining the response of microbial communities to long-term environmental change28, and clarifying the relative significance of chance, environmental factors, and past legacies in prokaryotic biogeography and diversification29. However, metagenomic bacterial exploration linked to a pedological study is still uncommon, and the role of pedogenesis in shaping the bacterial community structure remains unknown.

To address this issue, we undertook a local-scale survey of soils that are widespread and ecologically significant within a Mediterranean mountain area on limestone. Here, topography controls microclimate, vegetation, water redistribution, and surface stability, which in turn leads to the formation of soils at different stages of development. We searched for soils representing a pedogenic gradient in the Mediterranean region. To minimize biological variation due to distance, we chose soils with different genesis and development that were as close together as possible. Our study was aimed at analysing the profile, typology, and physicochemical parameters of each soil, as well as bacterial genotypic diversity, in order to help elucidate the influence of pedogenesis on the bacterial assemblages.

Results

Soil development and quality

We studied eight soils on limestone, each with different vegetation, altitude, slope, and topographic orientation (Supplementary Table S1), in the Sierra de María Natural Park (SE Spain, Fig. 1). Calcimorphic features from the accumulation of secondary carbonate, such as filaments, nodules, and coatings, were common in the profiles. The soil pH ranged between slightly and moderately alkaline (7.5–8.5) and the contents of equivalent CaCO3 usually increased towards the profile bottom (Table 1). The two least calcareous soils (S7 and S8) had horizon sequences Ah-Bt-R and Ap-Btk-Ck, reddish colours in accord with the amount of dithionite-extractable iron (Fed), a high clay content, and strong angular-blocky aggregates. Harden’s profile development index, which quantifies the degree to which the soil-horizon properties differ from the parent-material properties (C or R layer), was 25.2 for S7 and 25.7 for S8, soils classified as Leptic Luvisol and Luvic Calcisol21, respectively. They are broadly called red Mediterranean soils, including a typical terra rossa on hard limestone (S7) and a red soil developed on Quaternary colluvial glacis later degraded by tillage (S8). Because their pedogenic processes, including leaching of carbonate minerals, rubefaction, and clay accumulation, require a long time period, they must be the oldest soils in the Sierra de María area. In other words, they had undergone the greatest pedogenic change with respect to the soil parent material.

Figure 1

Sampling strategy and location. At each site (S1 to S8), we surveyed a soil profile (red points) and four composite topsoil samples were taken at 20 m from the soil profile (blue points). Each topsoil sample was composited from nine samples of the top 20 cm of soil collected in a plot of 3 × 3 m. Maps were created using ArcGIS v.10.2 software (http://www.esri.com/arcgis/about-arcgis).

Table 1 Modal soil profiles of the pedogenic gradient studied in Sierra de María (SE Spain).

The soils S3, S4, S5, and S6 exhibited a Bw horizon. The formation of an Ah surface organic horizon was also an outstanding pedogenic feature in S3 and S6, two Calcic Chernozems located on north-facing slopes at about 1500 m a.s.l., and in S5, a Calcic Kastanozem on a south-west-facing slope at 990 m a.s.l. The profile development indices in S5 and S6 were akin to those of the soils having a Bt horizon, because the soil thickness (83–85 cm, Table 1) offsets the lesser change in the Bw characteristics with respect to the parent material. However, their lower redness and contents of clay and Fed reflected a lower degree of development than in soils with a Bt horizon.

Finally, the soils S1 (Rendzic Leptosol) and S2 (Calcic Kastanozem) had horizon sequences of less pedogenic development (Ah-R and Ah-AC-C) and the lowest profile development indices (14.5 and 4.9), in accord with a microclimate, vegetation, and topography more restrictive to soil formation. Their pedoenvironments were south-facing slopes with sparse scrub, resulting in thin soils with brown colours and low contents of clay, Fed, and organic C. The gain of organic matter in the A horizon was the only noticeable feature in the genesis of these soils, whereas water availability and erosion were the main constraints on pedogenesis. Thus, the soils ordered by their increasing profile-development index in Table 1 from S1 to S8 showed the studied pedogenic gradient. Consistent with an increase in weathering over the course of pedogenesis, the variation in clay and Fed contents of the subsurface horizon (AC or B) also allowed this gradient to be visualized and statistically represented (Supplementary Fig. S1a,b). However, the soil-profile development in S6 was due more to the accumulation of organic matter than to weathering.

Table 2 lists the physicochemical indicators of soil quality determined in topsoil samples. Soil moisture, macroporosity, cation-exchange capacity, N, and K were positively correlated with organic carbon (organic C) content (r = 0.49 to 0.79, P < 0.05, n = 32). Soil moisture, available water, cation-exchange capacity, N, P, and K were also positively correlated with water-stable aggregates (r = 0.40 to 0.80). The pH and microporosity both correlated inversely with organic C and water-stable aggregates (r = −0.45 to −0.78). Consequently, the two latter parameters, which were also positively correlated with each other (r = 0.63, P < 0.0001, n = 32), were the most informative of the soil-quality conditions. The amount of water-stable aggregates (mass of particles less than 250 µm in size forming stable aggregates) progressively increased in the soil sequence S1, S2, S4, S3, S5, S6, S7, and S8, and the organic C followed a similar sequence although truncated in S4 and S8 (Supplementary Fig. S1c,d), suggesting that soil quality improved with increasing soil development but that organic C declined with tillage.

Table 2 Soil quality along the pedogenic gradient of Sierra de María. Parameters measured (mean and standard deviation, n = 4) at the topsoil (20 cm).

Composition and structure of the soil bacterial communities

Sequencing of 16S rRNA gene (V4-V5) amplicons with Illumina MiSeq and paired-read merging resulted in a total of 1,696,973 raw sequences. After eliminating the nonaligned ones and chimeras, we had a final number of 951,483 sequences (Supplementary Table S2) with an average length of 408 bp. High-quality reads from each soil were subsampled to 13,386 sequences (the lowest number obtained in sample S6B) prior to calculating alpha-diversity parameters. Rarefaction curves (Supplementary Fig. S2) did not show apparent saturation at the current surveying effort, which covered between 86% and 91% of the within-community (alpha) diversity (Supplementary Table S2). The indices Chao1, Inverse Simpson, Shannon, and Pielou tended to be lower in managed soils (S4 and S8) than in natural soils, but they were not related to the pedogenic gradient.
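The subsampling step can be illustrated with a minimal sketch, assuming that reads are drawn at random without replacement down to the depth of the smallest library (13,386 sequences here). The function name `rarefy` and the OTU counts are hypothetical and are not taken from the study's pipeline.

```python
import random
from collections import Counter

def rarefy(otu_counts, depth, seed=0):
    """Randomly subsample reads without replacement to a fixed depth."""
    total = sum(otu_counts.values())
    if depth > total:
        raise ValueError("depth exceeds the number of reads in the sample")
    # Expand the counts into one label per read, then draw without replacement.
    reads = [otu for otu, n in otu_counts.items() for _ in range(n)]
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return Counter(rng.sample(reads, depth))

# Hypothetical OTU table for one sample (not data from the study).
sample = {"OTU1": 9000, "OTU2": 6000, "OTU3": 4000, "OTU4": 1000}
rarefied = rarefy(sample, depth=13386)
```

Every library then holds the same number of reads, so richness and diversity estimates are compared at equal sequencing effort.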

The RDP classifier identified 14 phyla, with Acidobacteria, Bacteroidetes, Proteobacteria, Actinobacteria, and Candidate division WPS-1 being the most abundant (Table 3). A principal component analysis accounted for 74.1% of the bacterial variation at the phylum level in a solution of four principal components (Supplementary Table S3). The phyla (variables) with the largest positive loadings on PC1 (Acidobacteria, Candidate division WPS-1, and Armatimonadetes) influenced the scores mostly of soils S1, S2, and S4 with their spatial variability (replicates A, B, C, and D in Fig. 2a), whereas the variables with negative loadings (Bacteroidetes, Proteobacteria, Actinobacteria, Planctomycetes, Verrucomicrobia, etc.) influenced mainly the scores of S7 and S8. The alignment of soil scores along PC1 in Fig. 2a indicated changes in the content of the above phyla across the pedogenic gradient, from S1, S2, and S4 (richer in the phyla represented on the right), to S7 and S8 (richer in the phyla represented on the left), going through S3, S5, and S6 (in the middle, with intermediate contents). The progressive increase in the phyla represented on the right led to a decrease in those on the left and vice versa (Table 3). Likewise, the phyla with loadings of opposite signs (inverse correlation) in PC2 (Fig. 2a) showed differences between the soil group S1, S2, S4, and S8 and the group S3, S5, S6, and S7. Two statistical comparison tests confirmed significant differences in the bacterial community structure of soils at the phylum level (Table 3).
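As an illustration of how loadings, scores, and the share of explained variation arise, the sketch below extracts the first principal component of a small abundance table from its correlation matrix by power iteration. It assumes standardized variables; the data, the function names, and the choice of power iteration are ours for illustration and do not reproduce the Statgraphics computation.

```python
import math

def standardize(rows):
    """Centre each column and scale it to unit sample variance."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(p)]
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]

def correlation_matrix(z):
    """Correlation matrix of already standardized data."""
    n, p = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(p)]
            for i in range(p)]

def first_component(corr, iterations=500):
    """Leading eigenvector (loadings) and eigenvalue by power iteration."""
    p = len(corr)
    # Uneven start vector, so it is unlikely to be orthogonal to the solution.
    v = [1.0 / (j + 1) for j in range(p)]
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(corr[i][j] * v[j] for j in range(p))
                     for i in range(p))
    return v, eigenvalue

# Hypothetical relative abundances (%) of three phyla in five samples.
abund = [
    [30.0, 10.0, 5.0],
    [28.0, 12.0, 6.0],
    [20.0, 18.0, 4.0],
    [15.0, 22.0, 7.0],
    [10.0, 27.0, 5.0],
]
z = standardize(abund)
loadings, eigenvalue = first_component(correlation_matrix(z))
explained = eigenvalue / len(loadings)  # share of total variance on PC1
# One score per sample: linear combination of the standardized variables.
scores = [sum(row[j] * loadings[j] for j in range(len(loadings))) for row in z]
```

Variables whose loadings have opposite signs, like the first two columns here, pull the sample scores towards opposite ends of the component, which is how a biplot axis separates groups of soils.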

Table 3 Abundance of bacterial phyla in the soils S1 to S8 (mean percentage and standard deviation, n = 4).
Figure 2

Principal component analysis. Biplots for the relative abundance of bacterial phyla (a), and selected bacterial phyla, organic C, and water-stable aggregates (b). The black markers are the soil scores S1 to S8 (four spatial replicates A, B, C, and D in (a), and mean values in (b)) and the blue vectors represent the loadings of variables.

A total of 139 taxa with relative abundance higher than 0.05% were identified at the lowest classification level (subgroup to genus). A principal component structure of fan-shaped vectors (variables) pointing to different soil scores (Supplementary Fig. S3) showed changes in the bacterial communities of soils. Acidobacteria Gp4 and Armatimonadetes_gp4 were especially abundant in S1, S2, and S4; Nocardioides and Terrimonas progressively increased from S3 to S5, S6, and S7; whereas Solirubrobacter and Microvirga prevailed in S8. Statistical tests comparing soils revealed homoscedasticity (Levene’s test > 1.33, P > 0.265) in the first three variables, which had significantly different means (Supplementary Fig. S4), whereas the last three displayed significant differences in the medians (Kruskal-Wallis H test > 20.55, P < 0.004). Closer vectors in Supplementary Fig. S3 also identified other bacterial taxa strongly and positively correlated in abundance with each one of the above and, therefore, with a tendency to appear together in the same soil community as bacterial consortia (Supplementary Table S4).

Relationships between bacteria and soil-quality parameters

Some bacteria such as Cyanobacteria, Firmicutes, and Gemmatimonadetes hardly correlated with the soil-quality parameters. However, a number of correlation coefficients in Candidate division WPS-1 and Armatimonadetes (Table 4) indicated that they tend to be more abundant with decreasing soil moisture, water-stable aggregates, macroporosity (increased microporosity), organic C, CEC, N, and K. In other words, these bacteria predominated in the soils of lower quality, also having the most alkaline pH. Just the opposite soil conditions were related to Planctomycetes. Acidobacteria and Actinobacteria also showed opposite behaviour (coefficients of the opposite sign) with respect to the amount of water-stable aggregates, P, and K. This was consistent with the strong inverse correlation between Acidobacteria and Actinobacteria (r = −0.82, P < 0.0001, n = 32).
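The significance of a correlation such as the one between Acidobacteria and Actinobacteria can be checked with the usual t transformation of Pearson's r, which has n − 2 degrees of freedom. The sketch below is illustrative only: the function names are hypothetical, and only the values r = −0.82 and n = 32 come from the text.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t value for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))

# Reported correlation between Acidobacteria and Actinobacteria (n = 32).
t_acido_actino = t_statistic(-0.82, 32)
```

A |t| this large on 30 degrees of freedom lies far beyond the conventional critical values, in line with the reported P < 0.0001.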

Table 4 Pearson’s correlation coefficients (P < 0.05, n = 32) between soil-quality parameters and abundance of bacteria representative of the different soil communities.

Integrating these correlations into a principal component analysis with bacterial phyla representative of the different communities and the two most informative soil-quality parameters (water-stable aggregates and organic C), the PC1 showed that the decrease in Armatimonadetes, Candidate division WPS-1, and Acidobacteria from S1, S2, and S4 to the remaining soils, and the consequent relative increase in other phyla such as Proteobacteria, Actinobacteria, and Bacteroidetes, proved to be related mainly to the increase in water-stable aggregates (Fig. 2b). This discrimination of soil bacterial communities was also related to the organic C content, which had a certain loading on PC1. On the contrary, the separation of soil bacterial communities along the PC2 appeared to be related only to the organic C content, which was greater in S7 than S8 and also greater in S2 than S4.

Several taxa of lower taxonomic categories also correlated with soil-quality parameters (Table 4), which may indicate specific soil requirements for certain bacterial taxa. Thus, Acidobacteria Gp4 and Armatimonadetes_gp4, being representative of the bacterial consortia in S1, S2, and S4, were correlated with lower contents of water-stable aggregates and organic C, as well as with unfavourable conditions of other soil parameters co-varying with them. Although Solirubrobacter, Microvirga, Nocardioides, and Terrimonas flourished mainly in soils having more stable aggregates, the two latter genera were also correlated with a lower pH and higher organic C content. Some relationships can be fitted to regression models (Fig. 3), indicating that the influence of soil-quality parameters on the composition of bacterial populations is highly significant. Usually, the abundance of each taxon depended on several soil variables (multiple regressions) and not on a single parameter. Like the representative taxa of the different soil bacterial populations (Armatimonadetes_gp4 and Terrimonas), other taxa of their consortia (Acidobacteria Gp3 and Segetibacter, Supplementary Table S4) also varied in abundance depending on soil-quality status.

Figure 3

Multiple linear-regression models. Observed abundance (%) of Armatimonadetes_gp4 (A_gp4), Terrimonas (Te), Acidobacteria Gp3 (Ac_Gp3), and Segetibacter (Se) versus predicted abundance from physicochemical soil-quality parameters (Table 2). The entry of variables into the models by forward stepwise analysis was controlled by an F-ratio criterion of 4.

Discussion

The statistical analysis of the sequencing data showed variations in the distribution patterns of bacteria across the soils. The soils S1 and S2 were found to be richer in Acidobacteria, Candidate division WPS-1, and Armatimonadetes, while soils S7 and S8 had a significantly greater content of Actinobacteria, Proteobacteria, and Bacteroidetes (Fig. 2a, Table 3). Between these two soil groups, another with S3, S5, and S6 showed an intermediate bacterial composition. The soil S4 had a high abundance of both Acidobacteria and Actinobacteria. Therefore, although it is postulated that microorganisms are globally dispersed and that the soil environment is constantly invaded by unspecialized taxa, which, having a high functional redundancy30, are randomly selected or acquire the necessary functionalities by horizontal gene transfer29,31,32, the different bacterial community structures in our soils ordered along a pedogenic gradient can hardly be attributed exclusively to chance.

The acidobacterial population decreased with increasing contents of water-stable aggregates, organic C, soil moisture, and nutrients (Table 4, Fig. 2b), i.e. when the soil-quality conditions improved. Unlike the work by Liu et al.25 in Mollisols, the abundance of Acidobacteria in our Mediterranean soils did not increase with organic C content. However, in agreement with these authors, different phylotypes were present in soils with different pH values (inversely related to organic C content). In particular, the changes in the bacterial consortia from Gp3 and Gp4 to Gp5, Gp10, and Gp18 (Supplementary Table S4) occurred as the soil became less alkaline and its content of organic C increased. With an improvement in soil quality, Actinobacteria, Proteobacteria, and Bacteroidetes flourished, the first of these being especially well established in the red soils S7 and S8, which were also richer in water-stable aggregates cemented by organic C and/or secondary Fe oxides. All the above, together with the relations of Acidobacteria Gp4, Armatimonadetes_gp4, Solirubrobacter, Nocardioides, Terrimonas, and Microvirga with physicochemical parameters (Table 4), indicates that the soil bacterial community parallels the soil-quality status. The abundance of certain bacteria depends on a combination of physicochemical parameters rather than on a single parameter (Fig. 3), which means that it is influenced by the soil as a whole.

Two major latent factors, statistically referred to as dimensions or components, accounted for between 50% and 80% of the total bacterial variability in the eight soils (Fig. 2 and Supplementary Fig. S3). According to the distribution of soil scores, the first factor at the phylum level separated the poorly developed soils (S1 and S2) from the well-developed (S7 and S8), and between them the remaining soils of intermediate development (S3, S5, and S6), as reflected by their morphological and analytical characteristics including the profile development indices (Table 1). Thus, soil development could be a major driver in shaping the soil bacterial communities. Changes in the metagenomic abundance of bacteria along the studied pedogenic gradient can be explained pedologically in two senses. Firstly, this is because the soil is the result of the environmental factors: parent material, vegetation, topography, and (micro)climate33, and, secondly, because the development of soil changes the physicochemical parameters of inherent soil quality34. Environmental and physicochemical influences on the bacterial diversity have been reported from the earliest9,14 to the most recent studies13,24, but never before has the integrative influence of both been shown through soil development. In addition, this interpretation is consistent with the hypothesis that a large part of the genetic divergence of microbial assemblages can result from environmental factors29. Soil genesis summarizes these factors.

Bacterial communities linked to the most developed soils (communities represented by Terrimonas, Nocardioides, Solirubrobacter, and Microvirga in the Supplementary Table S4) also had a greater number of bacterial taxa positively correlated to each other than in the poorly developed soil profiles (represented by Acidobacteria Gp4 and Armatimonadetes_gp4). This indicates a greater interrelation within the bacterial assemblages of well-developed soils, suggesting that it is not due to chance, but presumably the result of their stabilization and adaptation through time. In addition, the preferential accumulation of Solirubrobacter and its consortium Aciditerrimonas in the most developed soils (Supplementary Table S4), which have been previously found, respectively, in reddish Saharan dust intrusions reaching Europe35 and linked to ferrous-ferric redox reactions36, may be connected with the partially aeolian origin proposed for red Mediterranean soils and their typical rubefaction process37. In this way, well-developed soils may help us to recognize the lingering effects of past evolutionary and ecological events on bacterial diversity.

A second latent factor (PC2 in Fig. 2), which at the phylum level separated the bacterial communities of forest soils S3, S5, S6, and S7 from the rest (S1, S2, S4, and S8), appeared to be related in part to land use and management. Tillage has been identified as a major cause of soil compaction, loss of organic matter, and re-carbonation38,39, in addition to disturbing microbial communities12,26,40. Tillage can therefore explain the separation of the soils S4 (reforestation) and S8 (agriculture) from the rest by the second factor, but not that of native soils S1 and S2. However, S1, S2, S4, and S8 shared a low organic C content, which may be the main reason for this factor (Fig. 2), regardless of whether the cause is natural (scant soil development) or anthropogenic (soil degradation). This organic factor was the first to distinguish soil bacterial communities at taxonomic levels lower than phylum (Supplementary Fig. S3), whereas exclusively pedogenic effects still linger in the soil communities as a second factor.

The soil relationships at the phylum level could be indicative of an apparent evolutionary DNA sequence stability in the bacteria from our soils because of intermittent periods of dominance28. The stages of soil formation and degradation could be interpreted as stressed situations for the bacteria, which must be adapted to new environmental and soil-quality conditions. This stability could promote the abundance of certain phylogenetic groups according to the pedogenic environment, although we cannot disregard the presence of any bacterial taxon in a system as diversified as the soil41.

In conclusion, we have shown differences in bacterial community structures along the studied pedogenic gradient related to soil development and quality. Consequently, soil bacteria change with pedogenesis, which in turn depends on environmental factors and time. Thus, soil is a window into the evolutionary and environmental history of the soil bacterial communities.

Methods

Setting

Sierra de María was part of a marine carbonate platform along the palaeomargin of the Iberian Massif, subsequently affected by the Alpine folding, and is today included among the eastern mountains of the External Zones of the Betic Cordilleras in the Mediterranean region. Across an elevational range from 800 m to 2045 m and a substrate of limestone, the landscape consists of forestlands with extensive natural tree masses (Pinus halepensis Mill., Quercus ilex L., Pinus nigra Arnold), areas of degraded shrubs (Quercus coccifera L., Juniperus phoenicea L., Vella spinosa Boiss), and some reforested pine stands (Pinus halepensis Mill.). In addition, there are extensive rock outcrops and rangelands with scrub (e.g. Stipa tenacissima L., Lygeum spartum L., and Rosmarinus officinalis L.), besides some patches of cropland in the low parts of slopes and valleys with cereals, almond trees, and olive trees. The climate is Mediterranean, somewhat semi-arid, with an elevational gradient of mean annual rainfall and temperature ranging from 250 mm and 17 °C at the lowest point to 600 mm and 6 °C at the highest peak.

Soil sampling

Following previous soil surveys42, a field judgement sampling was performed in order to select sites of the climax pedoenvironments with well-developed soils such as Luvisols, Chernozems, and Kastanozems as well as sites of the young and poorly developed soil units with Calcisols and Leptosols. Finally, we selected eight sites within a maximum distance of about 20 km, including natural and managed soils. At each site a soil pit was dug to inspect the soil profile and carry out a discrete depth sampling by natural horizons. In addition, after removing the litter O layer, we collected composite topsoil samples in four plots of 3 × 3 m laid out in a cross pattern at a distance of 20 m from the soil profile (Fig. 1). Each consisted of bulk soil material (250 cm3) of the upper 20 cm from each corner, from the midpoint of the sides, and from the centre of the square plot, which was then thoroughly mixed to make a composite topsoil sample. A subsample was transported to the laboratory in isothermal bags, sieved through a 4 mm screen, and stored in polythene containers at −80 °C for subsequent next-generation sequencing (NGS) analysis. The other topsoil subsample and each of the soil-horizon samples were air dried, crumbled, and sieved to 2 mm for subsequent physicochemical analyses. We also took intact cores in stainless-steel cylinders to measure soil bulk density and moisture with respect to dry soil weight at 105 °C. Samples were taken in early June, the period of highest biological activity after the spring rains.

Physicochemical analysis

Soil structure and consistency were determined according to FAO guidelines for soil description43 and the Munsell colour was measured with a Konica Minolta CM-2600d spectrophotometer (Minolta Co. Tokyo). In fine earth (<2 mm) and following standard procedures44,45, we analysed the particle-size distribution by sieving (sand) and by the pipette method (silt and clay) after removing organic matter with H2O2 and dispersion with sodium hexametaphosphate. In addition, the soil-water release at −33 kPa and −1500 kPa was measured on a Richards membrane, particle density with a pycnometer, and water-stable aggregates using a wet sieving apparatus with a mesh size of 250 µm (Eijkelkamp Co., Giesbeek, The Netherlands). We also determined the pH by potentiometry in a 1:1 soil:water suspension, cation-exchange capacity and exchangeable bases using ammonium and sodium displacement solutions, as well as the contents of organic C by dichromate oxidation, total N by the Kjeldahl method, available P by ammonium acetate extraction followed by colorimetry, equivalent CaCO3 with a Bernard’s calcimeter, and dithionite-extractable Fe as described in Mehra and Jackson (1960)46.

Using the data compiled from the above analyses, we calculated the Harden’s profile development index47 and, on a fine-earth volume basis in the top 20 cm of soil48,49, the contents of available water, water-stable aggregates, organic C, total N, extractable P, exchangeable K, total porosity from the particle and bulk density, microporosity estimated as water volume at field capacity (−33 kPa), and macroporosity from total porosity less microporosity.
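The porosity terms follow from simple arithmetic, sketched below under stated assumptions: total porosity is taken as 1 − bulk density / particle density, microporosity as the volumetric water content at −33 kPa, and macroporosity as their difference. The conversion from gravimetric to volumetric water (multiplying by bulk density, with water density taken as 1 g cm−3), the function names, and the input values are assumptions for illustration, not figures from Table 2.

```python
def volumetric_water(gravimetric_water, bulk_density, water_density=1.0):
    """Convert g water per g dry soil to cm3 water per cm3 soil.

    Assumption: water density of 1.0 g cm-3 unless stated otherwise.
    """
    return gravimetric_water * bulk_density / water_density

def porosity_fractions(bulk_density, particle_density, water_at_fc):
    """Return (total, micro, macro) porosity as volume fractions.

    water_at_fc is the volumetric water content at field capacity (-33 kPa).
    """
    total = 1.0 - bulk_density / particle_density
    micro = water_at_fc
    macro = total - micro
    return total, micro, macro

# Hypothetical topsoil: densities in g cm-3, water at -33 kPa in g g-1.
theta_fc = volumetric_water(0.20, bulk_density=1.30)
total, micro, macro = porosity_fractions(1.30, 2.60, theta_fc)
```

With these hypothetical inputs, half of the soil volume is pore space, about 26% of the volume holds water at field capacity, and the remaining 24% is macroporosity.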

DNA extraction and PCR amplification

Total genomic DNA was extracted from 0.25 g of each individual topsoil sample following the manufacturer’s protocol of the PowerSoil™ DNA Extraction Kit (MoBio Laboratories Inc., Carlsbad, CA, USA). DNA size and quality were checked by electrophoresis in 1.5% (w/v) agarose gel. Ten ng of total DNA were used as a template for the PCR amplification of the 16S rRNA gene hypervariable V4-V5 regions in the topsoil samples. PCR amplification conditions consisted of 25 cycles using 55 °C as the annealing temperature, as described by Aguirre-Garrido et al.50.

Illumina-Sequencing of 16S V4-V5 amplicons

PCR amplification products of the V4-V5 variable regions of the 16S rRNA gene were obtained using fusion universal primers 515F (Illumina adaptors + 5′GTGYCAGCMGCCGCGGTAA3′) and 926R (Illumina adaptors + 5′CCGYCAATTYMTTTRAGTTT3′). Amplicon multiplexing and sequencing were carried out with a dual indexing tag-tailed design using 8-nt indices from the Nextera XT Index Kit v2 (Illumina, San Diego, CA, USA). Paired-end sequencing of 16S PCR amplicon libraries was performed using the Illumina MiSeq instrument with 300 + 300 v3 kit chemistry at the Centre for Comparative Genomics and Evolutionary Bioinformatics (CGEB)-Dalhousie University, Canada.

Bioinformatic analysis

The 16S rRNA data were processed with MOTHUR software v. 1.39.551 following the MiSeq SOP52. After demultiplexing, allowing a 1 bp mismatch in the barcodes and 2 bp mismatches in the primer, chimeric reads were identified and excluded using Chimera UCHIME53. Diversity was examined by operational taxonomic units (OTUs) considering quality reads at 3% dissimilarity and the distance-based greedy clustering algorithm (dgc), as well as rarefaction curves computed at 97% similarity with the MOTHUR alpha diversity pipeline. The number of observed OTUs and the indices Chao1 richness, Inverse Simpson diversity, Shannon diversity, and Pielou evenness were calculated by the equations described and implemented in MOTHUR. Finally, we determined the composition of bacterial communities with the RDP Bayesian classifier Trainset 14, fixrank classification54. Only the sequences that could be classified at the lowest classification level (subgroup to genus) were used for further analysis. Abundance was expressed as a percentage with respect to the total number of sequences in each sample. Taxa with relative abundance higher than 0.05% were retained for statistical analysis.
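For orientation, the alpha-diversity indices named above can be computed from a vector of OTU counts as sketched below. These are the standard textbook forms (Shannon with natural logarithms, the unbiased Simpson estimator, and the bias-corrected Chao1); we assume, without having verified it against the software, that they match the equations implemented in the pipeline. Function names and counts are hypothetical.

```python
import math

def shannon(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i)."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)

def inverse_simpson(counts):
    """1 / D, with D = sum(n_i * (n_i - 1)) / (N * (N - 1)).

    Undefined (division by zero) when every OTU is a singleton.
    """
    n = sum(counts)
    d = sum(c * (c - 1) for c in counts) / (n * (n - 1))
    return 1.0 / d

def pielou(counts):
    """Pielou evenness J' = H' / ln(S); requires at least two observed OTUs."""
    s = sum(1 for c in counts if c > 0)
    return shannon(counts) / math.log(s)

def chao1(counts):
    """Bias-corrected Chao1: S_obs + f1 * (f1 - 1) / (2 * (f2 + 1))."""
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)  # singletons
    f2 = sum(1 for c in counts if c == 2)  # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))

# Hypothetical OTU counts for one rarefied sample.
otu_counts = [5, 3, 1, 1, 2]
summary = {
    "shannon": shannon(otu_counts),
    "inverse_simpson": inverse_simpson(otu_counts),
    "pielou": pielou(otu_counts),
    "chao1": chao1(otu_counts),
}
```

Chao1 adds to the observed richness an estimate of unseen OTUs based on singletons and doubletons, whereas Shannon, Inverse Simpson, and Pielou also weigh how evenly the reads are spread among OTUs.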

Statistical analysis

Principal component analysis, performed with Statgraphics Centurion XVI (Statpoint Technologies, Inc., Warrenton), was applied to the data in order to reduce a large amount of information to a small number of orthogonal dimensions in such a way that they account for as much variation of the data set as possible55. This analysis transforms a number of possibly correlated variables (e.g. bacterial taxa) into a limited number of uncorrelated variables called principal components, which are linear combinations of the original variables. The resulting equation with the loadings (coefficients) of variables in each component allows a score to be calculated for each soil sample. In graphical terms, a variable is denoted by a vector, while a soil score is represented by a point. Correlation and regression analyses were also occasionally applied to the data. Finally, we compared the data of each of the eight soils (4 replicates) on each variable using one-way ANOVA. The F-ratio determined whether there were significant differences between means, and Fisher’s least significant difference multiple-range test showed which means were significantly different from others. Alternatively, when the assumptions for ANOVA were not met (such as the assumption of homoscedasticity), we used the Kruskal-Wallis test to determine whether the medians differed according to the H statistic56.
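The two test statistics used to compare soils can be written out in a few lines. The sketch below computes the one-way ANOVA F-ratio and the Kruskal-Wallis H statistic for groups of replicates; it gives tied values their average rank but applies no tie-correction factor to H, and it returns statistics only, not P values. The data and function names are hypothetical.

```python
def anova_f(groups):
    """One-way ANOVA F-ratio: between-group over within-group mean square."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H on pooled ranks (average ranks, no tie correction)."""
    pooled = sorted(x for g in groups for x in g)
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        # Positions i+1 .. j share the same value: assign their mean rank.
        rank_of[pooled[i]] = (i + 1 + j) / 2
        i = j
    n = len(pooled)
    rank_term = sum(sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups)
    return 12.0 / (n * (n + 1)) * rank_term - 3 * (n + 1)

# Hypothetical abundances (%) of one taxon in three soils, four plots each.
soils = [
    [1.2, 1.5, 1.1, 1.4],
    [2.3, 2.8, 2.5, 2.6],
    [4.1, 3.9, 4.4, 4.0],
]
f_ratio = anova_f(soils)
h_stat = kruskal_wallis_h(soils)
```

The F-ratio is referred to an F distribution with k − 1 and N − k degrees of freedom, and H to a chi-square distribution with k − 1 degrees of freedom, where k is the number of soils and N the total number of samples.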

Data availability

Sequence data were deposited in the Sequence Read Archive (SRA) of the National Center for Biotechnology Information (NCBI) under the BioProject number PRJNA414389.