Soil parameters, land use, and geographical distance drive soil bacterial communities along a European transect

To better understand the relationship between soil bacterial communities, soil physicochemical properties, land use and geographical distance, we considered for the first time ever a European transect running from Sweden down to Portugal and from France to Slovenia. We investigated 71 sites based on their range of variation in soil properties (pH, texture and organic matter), climatic conditions (Atlantic, alpine, boreal, continental, Mediterranean) and land uses (arable, forest and grassland). 16S rRNA gene amplicon pyrosequencing revealed that bacterial communities highly varied in diversity, richness, and structure according to environmental factors. At the European scale, taxa area relationship (TAR) was significant, supporting spatial structuration of bacterial communities. Spatial variations in community diversity and structure were mainly driven by soil physicochemical parameters. Within soil clusters (k-means approach) corresponding to similar edaphic and climatic properties, but to multiple land uses, land use was a major driver of the bacterial communities. Our analyses identified specific indicators of land use (arable, forest, grasslands) or soil conditions (pH, organic C, texture). These findings provide unprecedented information on soil bacterial communities at the European scale and on the drivers involved; possible applications for sustainable soil management are discussed.

Scientific Reports | (2019) 9:605 | DOI: 10.1038/s41598-018-36867-2

[...] use on microbial communities specific to the soil type. Furthermore, specific indicators of land use vs. soil type are needed. So far, most of the biogeographic surveys have focused on broad-scale patterns and drivers, and few attempts have been made to disentangle the effects of different land use on similar soils, and vice versa. A key issue with broad landscape surveys is that human land management is strongly determined by the prevailing soil and climatic properties of the environment. With the onset of agricultural development, human populations have shaped their environment locally by converting the most fertile and easily manageable soils to agriculture and grassland, leaving infertile and rocky soils to forest, heathland and bog 26,27. At a small scale, this phenomenon confounds gradients of soil properties with land management, masking the primary drivers of soil communities. At a larger spatial scale, these parallel events have led to the distribution of given land uses on very different soil types, although certain soil types are preferentially used for arable land, grassland or forest. For example, grassland tends to occur across a broader range of soil conditions than arable land. Forests are usually developed on nutrient-poor soils, which vary from very acidic to neutral pH values. This study addresses these shortcomings by specifically determining the relative effects of land use and soil properties on bacterial diversity and structure in a large number of soil samples collected and characterised across the first European transect for soil microbes. This transect ran from Sweden down to Portugal and from France to Slovenia. The distribution of different land uses on various soil conditions across this transect allowed us to test the respective effects of these two categories of drivers on soil bacterial communities.
The sampled sites represented three major land uses (arable, forest, grassland) across a range of soils chosen for their contrasting pH values, textures, and organic matter contents, and covering a broad range of climatic conditions. We analysed these soil samples for their physicochemical properties, and for their bacterial diversity and community composition by 16S rRNA gene amplicon pyrosequencing. We used a k-means classification analysis to group the soils on the basis of their physicochemical properties and land use into homogeneous clusters, and then tested whether a land use effect was observable within these clusters. Finally, we performed an indicator species analysis to identify the genera indicative of particular soil conditions and/or land uses.

Results and Discussion
Soil properties. According to physicochemical analyses, the 71 soil samples collected across Europe differed strongly in terms of pH, texture, density, and C or N contents (Table 1). pH ranged from 3.7 to 8.2, textures varied from very fine (70% clay in a French arable soil) to coarse (94% sand in a Danish grassland), and organic C content ranged from 0.4% (arable soil, France) to 33% (forest soil, Sweden). The sites were all distinguishable from one another based on soil properties. Studies on the effect of land use or soil properties on bacterial communities had already been performed at a continental scale 7,9,15, but our study characterises bacterial communities at the European scale for the first time 28. The sampling design was representative of the variations in soil properties, i.e. organic matter content, pH and texture, across different geographical/climatic zones in Europe, and it covered three land uses (arable, forest, grassland; Table 1). According to the multivariate analyses, this extensive sampling strategy consistently covered the different environmental characteristics (Fig. 1A-F). Such a strategy minimises possible biases related to land use. Highly fertile soils are indeed commonly dedicated to agriculture, while nutrient-poor and rocky soils are usually associated with forest, as highlighted in previous studies 7,9,12,15. Thus, our sampling strategy was adequate for disentangling the effects of land use from those of soil properties on bacterial diversity across Europe.
Diversity and composition of the bacterial communities. We obtained more than 3 × 10^6 16S rRNA sequences from the 71 samples we collected, and a total of 8,085 sequences per soil sample after bioinformatics processing. The rarefaction curves of bacterial OTUs followed a logarithmic model (data not shown) without reaching a plateau. We detected a total of 34,190 OTUs among all the soil samples. At the European scale, the number of OTUs per soil sample ranged from 653 to 1,860 (mean: 1,307 OTUs; standard deviation (sd): 244; Fig. 2A). The Shannon index ranged from 4.1 to 6.2 (mean: 5.5; sd: 0.4; Fig. 2B), and evenness from 0.62 to 0.82 (mean: 0.76; sd: 0.04; Fig. 2C). The number of detected OTUs ranged from 1,022 to 1,860, 653 to 1,437, and 987 to 1,722 in soils dedicated to arable, forest, and grassland uses, respectively. The Shannon index varied from 4.1 to 5.6, 4.9 to 6.0, and 5.3 to 6.2 in forests, grasslands, and arable lands, respectively, and evenness from 0.63 to 0.78, 0.71 to 0.81, and 0.76 to 0.82. Altogether, the variations of richness, evenness, and of the Shannon index were in the range of those observed in the literature for a similar sequencing effort 29-32. Nevertheless, these indices revealed only a non-significant trend of lower bacterial richness and diversity from arable to forest soils. This is not in agreement with the literature, since other studies demonstrated that microbial indices were affected by land use at the plot scale 33, the landscape scale 34, and at more global scales 29,31,35. The much greater variability of soil physicochemical properties relative to land use in our study may account for this discrepancy, since soil physicochemical characteristics are known to frequently be the most influential factor for soil microbial indices, followed by land use 29,31,34,36.
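As an illustration of the three indices reported above, the following minimal sketch computes richness, the Shannon index and evenness for a single sample. It assumes the natural-logarithm Shannon index and Pielou's evenness (H'/ln S); the source does not state which formulas were used, although the reported means are roughly consistent with this choice (5.5 / ln(1,307) ≈ 0.77). The function name and the toy counts are hypothetical.

```python
import math

def diversity_indices(otu_counts):
    """Return (richness, Shannon index, Pielou evenness) for one sample.

    otu_counts: iterable of non-negative read counts, one value per OTU.
    Assumption: natural-log Shannon index and Pielou evenness H'/ln(S).
    """
    counts = [c for c in otu_counts if c > 0]   # ignore absent OTUs
    total = sum(counts)
    richness = len(counts)
    # Shannon index: H' = -sum(p_i * ln(p_i)) over observed OTUs
    shannon = -sum((c / total) * math.log(c / total) for c in counts)
    # Evenness is H' divided by its maximum ln(S); set to 0 when S <= 1
    evenness = shannon / math.log(richness) if richness > 1 else 0.0
    return richness, shannon, evenness

# Toy sample (not from the study): four equally abundant OTUs, one absent OTU
print(diversity_indices([10, 10, 10, 10, 0]))
```

Because all samples were homogenised to the same number of sequences before the indices were computed, values obtained this way would be comparable across sites.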
We detected a total of 27 phyla after taxonomic assignment (Supplementary Table 1). For each sample representing a specific environmental condition (i.e. a combination of a given soil property and a land use), the most represented bacterial phyla were Proteobacteria (average relative abundance 56.4%), Actinobacteria (15.7%), and Acidobacteria (7.1%) (Supplementary Table 1). The dominance of these phyla is in agreement with Lauber et al. 10 and Delgado-Baquerizo et al. 30. Other phyla, i.e. Bacteroidetes, Firmicutes, Planctomycetes, Chloroflexi, and Nitrospirae, were also represented (above 1% on average), but their relative abundance varied greatly according to the environmental conditions. We detected a total of 1,033 bacterial genera (data not shown), whose number varied according to environmental conditions from 157 genera in a Swedish forest soil up to 397 in a Dutch arable soil. A total of 7.23% of the sequences remained unclassified at the genus level. [...] the range of those reported in the literature 30,32,36. The absence of a relationship between the number of genera and land use type may be related to the high number of genera represented at very low relative abundances, which we did not take into account. Similarly, Karimi et al. 36 identified a total of 1,355 genera, out of which only 47 had a relative abundance higher than 0.5%. Similar trends have also been observed at a global scale 30,32.
Spatial distance, soil properties and land use as drivers of bacterial communities. We performed further analyses to better evaluate the spatial structuration of soil bacterial communities and better specify the environmental and spatial drivers of the variations of bacterial communities across Europe. First, we evaluated TAR for soil bacterial communities at the European scale (Fig. 3). This relationship was significant since the similarity of bacterial community composition across sites significantly decreased with increasing geographical distance, as shown by a significant community composition turnover (z = 0.0532, b = −0.0034; r² = 0.07; P < 0.001).
Although the turnover intensity was low, its level was in the range (0.1 to 0.23) of those reported in several studies applying the same methodology at different spatial scales and in different environmental matrices 37-42. Furthermore, even lower turnover intensity values, ranging from 0.006 to 0.05, have recently been reported at the scale of France 12,13,43. Altogether, the significant TAR observed at the scale of Europe in our study supports that soil bacterial community composition is non-randomly distributed and spatially structured. This result is in agreement with other studies focusing on soil microbial biogeography 12,44,45. This spatial structuration of soil bacterial communities may be related to environmental filters (i.e. environmental selection) and to limited dispersal. We applied two complementary approaches to better understand the factors involved in the spatial structuration of soil bacterial communities and to assess the support for environmental selection and dispersal limitation. A descriptive approach based on NMDS applied to Unifrac distance (Supplementary Fig. 1) showed that pH (R = 0.87, P < 0.001, ANOSIM, 1,000 permutations), texture (R = 0.64, P < 0.001, ANOSIM, 1,000 permutations) and organic carbon content (R = 0.58, P < 0.001, ANOSIM) were major drivers of bacterial community structure. Other drivers such as the nitrogen content and CEC were strongly correlated with the organic carbon content; moreover, assimilable phosphorus was not significantly correlated with soil bacterial community variation (P > 0.24). Similarly, we recorded significant differences in the community structures of the three types of land use (grassland, arable, forest) (ANOSIM-land use: R = 0.32, P < 0.001). Even if soil properties and land use both had discriminating effects on soil bacterial communities, cross-effects must be considered together with the effect of distance between sites.
To disentangle potential cross-effects and take the effect of distance between sites into account, we applied a parsimonious variance partitioning approach to the three diversity indices (richness, Shannon index, and evenness) and to soil bacterial community structure to determine the relative importance of soil properties and/or land use and/or climate conditions and/or distance between sites on bacterial communities. This analysis notably showed that soil properties explained a large part of the variance of bacterial richness (48.4% of variance, P < 0.001); that effect was mostly related to pH, texture class and total carbon content, which accounted for 13.8%, 10.8%, and 6.9% of the variance of soil bacterial richness, respectively (Fig. 4). Other environmental parameters (land use, climate, and spatial autocorrelation) did not explain significant amounts of variance during model building, suggesting that they did not affect soil bacterial richness. We observed similar trends for the Shannon index and for evenness (63.2% and 61.7% of variance, respectively; P < 0.01), except that the texture [...] for evenness). A large portion of the variance still remains unexplained, suggesting that other environmental factors may contribute to structuring bacterial communities, and confirming previous findings 15,31,43. In contrast to diversity indices, soil bacterial community structure based on Unifrac distance was only slightly explained by environmental factors (17.3% of variance, P < 0.001). Among all the environmental factors we identified, the soil pH (3.8%) ranked first, followed by land use (3.3%) and the total carbon content (1.7%). This is in agreement with other studies of soil microbial biogeography 7,11,14,31,43,46,47 and is plausibly related to the soil reactional conditions and trophic level and to biological interactions, e.g. between plants and soil bacteria. This study also identifies the climatic zone as a driving factor of soil bacterial community structure.
Nevertheless, even if its marginal effect appeared to be substantial as compared to other variables, it was only marginally significant (5.7%, P < 0.1). Its selection may be related to particular sites belonging to boreal and Mediterranean climatic zones located far from the rest of the transect. Surprisingly, neither longitude, latitude, nor PCNMs (representing the neighbouring relationships between sites) were selected when we built the model. This suggests that bacterial communities are not dispersal-limited at the scale of Europe. Nevertheless, other studies demonstrated that bacterial communities may be dispersal-limited at smaller spatial scales (France, Scotland) 12,13,44. This discrepancy may be explained by differences in terms of sampling efforts.

Soil clustering and indicator species analysis. Each sampling site exhibited different soil properties and land uses, so we further explored the relative contribution of these two types of parameters to bacterial diversity. For this purpose, we clustered sites according to the homogeneity of their soil physicochemical characteristics, land use and climate, as assessed by the k-means method. The k-means approach allowed us to delineate four clusters with contrasting soil properties and covering different land uses: the first cluster (cluster 1, n = 27 sampling sites) included the three land uses (arable, grassland, and forest) and encompassed soils with a low carbon and nitrogen content, an average pH of ca. 6 and a coarse texture; the second cluster (cluster 2, n = 24) also encompassed representative soils from the three land uses but they displayed distinct properties, with medium carbon and nitrogen contents, an average pH of ca. 6 and a medium to fine texture; the third cluster (cluster 3, n = 5) included mainly forest soils but also one grassland, and the soils exhibited high carbon and nitrogen contents, an acidic pH and an organic texture; finally the fourth cluster (cluster 4, n = 18) included all three land uses but only one forest site and encompassed soils with a medium to fine texture, a neutral pH, and a low carbon content (Table 2). As suggested by Hermans et al. 48 and Banerjee et al. 49, specific soil bacterial genera may be identified as bioindicators of the different clusters. To test this hypothesis, we performed an indicator species analysis on the four clusters to determine whether particular genera were specifically associated with a single cluster, making them bioindicators. For each bacterial genus identified in this analysis, the relative abundance and the relative coverage of a cluster condition are presented in Table 2. For example, the Blastochloris genus, whose maximum abundance reached 18.6%, is represented in 76% of the samples of cluster 3, while the Pseudolabrys genus, whose maximum abundance reached 4.2%, is represented in 50% of the samples of cluster 4. Previous reports have shown that this approach can be successfully applied to identify microbial groups (i.e. fungi or bacteria), phyla, or even genera as environmental indicators 50,51. We recorded a total of 118 genera with an average relative abundance above 0.1%. Overall, we found higher numbers of indicator genera in the clusters with the lowest variety of land uses (clusters 3 and 4), and conversely lower numbers of indicator genera in the clusters encompassing more different land uses (clusters 1 and 2), suggesting that land use is a major driver of soil biodiversity 52.
The bioindicator analysis also revealed that such an analysis must be conducted at a fine taxonomic level (i.e. the genus level). We indeed demonstrated that different indicator genera belonging to the same phylum could be indicators of different soil conditions (i.e. cluster type). For example, within the Actinobacteria phylum, the Actinospica genus was an indicator of cluster 3 and the Geodermatophilus genus an indicator of cluster 4 (Table 2). Another important feature is related to the relative abundance of these indicator genera. Indicators were represented at high relative abundance levels (maximum relative abundance above 10%), e.g. the Blastochloris, Holophaga or Steroidobacter genera, or at low relative abundance levels (maximum relative abundance below 0.5%), e.g. Myxococcus or Collimonas. The latter would be good candidates as indicators of specific environmental conditions. The Collimonas genus was indeed more specifically found in cluster 3 (forest), confirming previous work on this genus 18,25. The Myxococcus genus was more specifically found in cluster 4 (arable land: 11, grassland: 4). We then constructed models to refine and predict the contribution of the three environmental parameters (soil properties, land use, and climate) associated with these clusters for each indicator genus (Table 2). For example, the indicator genus Acidothermus in cluster 3 was mostly explained by soil pH (41.08%) and land use (5.79%), while the indicator genus Gaiella in cluster 4 was explained by the total carbon (29.09%) and nitrogen (16.7%) contents. Globally, a large share of the bioindicators (40 out of 118) identified in this study were determined by the soil pH, confirming the importance of this parameter for soil biodiversity as reported in previous studies 7,9,11,14, while C content 50, climate zone, land use 15 and texture 13 explained the relative distribution of 30, 27, 25 and 12 other bioindicators, respectively.
Only 12 bioindicator genera were explained specifically by land use exclusive of any other parameter (Table 2): in cluster 1, Nitrospira (Betaproteobacteria); in cluster 2, Elusimicrobium (Elusimicrobia); in cluster 3, Skermanella (Alphaproteobacteria), Steroidobacter (Gammaproteobacteria), Novispirillum (Alphaproteobacteria), Herminiimonas (Betaproteobacteria), Holophaga (Acidobacteria), and Aquicella (Gammaproteobacteria); in cluster 4, Methylobacterium (Alphaproteobacteria), Thermoacetogenium (Firmicutes), Dehalococcoides (Chloroflexi), and Thermincola (Firmicutes). Nevertheless, 37 of the bacterial genera identified as indicators of clusters remained unexplained by any of the parameters tested (soil properties, land use, climate), e.g. Blastochloris (max. relative abundance of 18.65%), Mycobacterium (max. relative abundance of 2.61%), Myxococcus (0.1%), or Collimonas (0.32%).
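To make the notions of specificity and sensitivity used in the indicator species analysis concrete, the sketch below computes a group-equalised indicator value for one genus and one target cluster. This is an assumption about the form of the statistic: the source names the analysis and the software, but not the exact association index or its settings. The permutation test that yields the significance levels is not reproduced, and the function name and toy data are hypothetical.

```python
import math

def indicator_value(abundances, groups, target):
    """Return (specificity A, sensitivity B, sqrt(A * B)) for one genus.

    abundances: relative abundance of the genus at each site.
    groups: cluster label of each site (same order as abundances).
    target: the cluster for which the genus is tested as an indicator.
    Assumption: group-equalised indicator value (means per cluster).
    """
    sums, sizes = {}, {}
    for a, g in zip(abundances, groups):
        sums[g] = sums.get(g, 0.0) + a
        sizes[g] = sizes.get(g, 0) + 1
    means = {g: sums[g] / sizes[g] for g in sums}
    total = sum(means.values())
    # A: share of the summed cluster means that falls in the target cluster
    specificity = means[target] / total if total > 0 else 0.0
    # B: fraction of target-cluster sites where the genus is detected
    present = sum(1 for a, g in zip(abundances, groups) if g == target and a > 0)
    sensitivity = present / sizes[target]
    return specificity, sensitivity, math.sqrt(specificity * sensitivity)

# Toy data (not from the study): a genus found only in cluster 3
print(indicator_value([0.2, 0.1, 0.0, 0.0], [3, 3, 4, 4], target=3))
```

Component B corresponds to the sensitivity reported in Table 2, i.e. the probability of finding the genus in sites belonging to the cluster.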
Land use effects within soil clusters. We further tested the relative impacts of land use on the bacterial communities under similar soil conditions using the land use distribution found in each cluster as generated by the k-means classification. In contrast with the very small cluster 3, which included only five sites (4 forests and 1 grassland), and cluster 4, which included only one forest site out of 18, clusters 1 and 2 encompassed the three land uses (i.e. forest, arable land, and grassland). We therefore analysed clusters 1 and 2 to test whether variations of bacterial communities were mostly explained by land use or soil properties. We performed multivariate analyses independently on the soil physicochemical properties and then on the distribution of the bacterial genera characterising the sites encompassed in clusters 1 and 2. The results showed that the soils were mostly grouped in terms of soil properties according to the land use, with a clear separation between forests and other land uses as confirmed by ANOSIM analysis (in cluster 1: R (soil data) = 0.60, P < 0.001; in cluster 2: R (soil data) = 0.68, P < 0.001) (Fig. 5). Furthermore, bacterial genera appeared to be distributed differentially in cluster 1 according to the land use, not to the soil properties (ANOSIM analysis: R (genus data) = 0.524, P < 0.001); the major driving effect of the land use in this cluster was also supported by the absence of a significant correlation (P > 0.05) between soil parameters and the distribution of bacterial genera (Fig. 5). This was not the case in cluster 2 (ANOSIM analysis: R (genus data) = 0.251, P = 0.055), which encompassed forest sites differing in their constitutive soil properties. Taken together, our data show that, under similar soil properties, land use is a major driver of bacterial community structure.
Compared with previous reports 6,8,53,54, this conclusion is strongly supported by the large variety of soil properties covered for each land use, whereas previous studies addressed the land use effect only on a given soil type 54-56.

Conclusions
We analysed the effect of soil properties and land use on the soil bacterial communities using an unprecedented extensive European transect allowing us to sample soils representative of a range of soil properties (organic matter content, pH, texture) from different geographical and climatic zones across Europe. Our analysis generated a broad overview of the soil bacterial diversity, richness and composition across Europe, which is strongly complementary to other intensive approaches dedicated to specific countries in Europe (the United Kingdom 11, France 31,36 and the Netherlands). Combining soil analyses with 16S rRNA gene amplicon pyrosequencing revealed that the soil bacterial communities are non-randomly distributed across Europe and are determined by environmental factors. More specifically, soil properties, notably the pH and texture, appeared to be the main drivers of soil bacterial community composition and distribution at the European scale. However, under given soil properties corresponding to delineated clusters, the influence of land use was apparent. We identified bacterial genus indicators of soil physicochemical properties through a cluster analysis; some genera were land use specific within these clusters. Taken together, our data provide an overview of the soil bacterial diversity across Europe and identify potential bacterial indicators of different soils and land uses. We confirmed at a large spatial (continental) scale the strong impact of soil properties on soil bacterial communities, but we also showed that under similar soil conditions, land use contributed to their structure. These findings and the identification of indicators open up new prospects in terms of land management for monitoring soil biodiversity and, ultimately, the services expected from soils.

Methods
Table 2. Identification of bacterial genera associated with a particular environment. All the transect site characteristics (physical, chemical, geographical, climatic and land use data) were used to organize the sites into four soil condition clusters (cluster 1: low C and N contents; cluster 2: moderately rich soils; cluster 3: nutrient-rich and acidic soils; cluster 4: low C and N contents and high pH values) using the k-means method. An indicator species analysis was performed in each of these clusters to identify the bacterial genera preferentially associated with one of these specific clusters. For each genus significantly associated with a cluster, its sensitivity (i.e. the probability of finding the genus in sites belonging to the surveyed environment) is reported, with stars denoting the significance level of the indicator species analysis (***P < 0.001; **P < 0.01; *P < 0.05). For each genus significantly associated with one cluster, minimum, median and maximum relative abundances are presented. Models predicting the abundance of each genus were built to assess the amount of variance explained by environmental factors (soil properties, land use, climate).

Site description, soil properties and sampling strategy. Soil samples were collected across 71 European sites (Fig. 6) in 11 countries (5 in Denmark, 19 in France, 9 in Germany, 6 in Ireland, 4 in Italy, 2 in The Netherlands, 2 in Portugal, 4 in Slovenia, 4 in Sweden, 5 in Switzerland, and 11 in the United Kingdom), as described in Stone et al. 28. The sampling sites were identified on the basis of EFSA spatial legacy data (Version 1.0) provided by the Joint Research Centre of the European Commission 57. We chose four parameter maps among the 52 spatial layers available: topsoil organic matter; topsoil pH in water; topsoil texture class; and EFSA Corine land cover data. The sites encompassed three types of land uses (arable, forest, grassland) (see Table 1 for details). Soil sampling was performed from early September to mid-November 2012 to avoid the influence of seasonal variation, using the same standardized optimised procedure (SOP) 28. For each site, a composite sample was made from twelve cores (5 cm in diameter and 5 cm depth) randomly taken within a 1 × 1 m surface. All soil samples were sieved to <2 mm and prepared for soil analyses by Teagasc, and DNA extraction was performed in Dijon by the GenoSol platform, UMR Agroécologie, https://www2.dijon.inra.fr/plateforme_genosol/plateforme-genosol. Soil properties were measured as described in Creamer et al. 58. Briefly, soil texture was characterised following the pipette particle-size method 59. Total C and N were measured according to the ISO 10694 standard 60; organic carbon (OC) was determined by LECO elemental analysis, conducted on 0.25 mm dry milled soil sub-samples 61.
The cation exchange capacity (CEC) was analysed using the BaCl2 extraction method 62. The pH was measured in a 1:2.5 soil-in-water suspension using a glass electrode 63. N mineralisation was analysed using the Illinois soil nitrogen test for amino sugar-N 64. Available phosphorus content was measured using the Mehlich 3 methodology 65. The sites were grouped based on their pH values (<5, between 5 and 7, >7), organic C content (less than 2%, between 2 and 15%, over 15%) and texture (fine, medium or coarse). The distribution and characteristics of the samples are presented in Table 1 and Fig. 1.
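The pH and organic C groupings described above amount to a simple threshold rule, sketched below. The class labels and the function name are hypothetical, and placing the boundary values (5, 7, 2% and 15%) in the middle classes is an assumption based on the wording "between"; texture classes are not derived here because the source gives no numeric thresholds for them.

```python
def soil_classes(ph, organic_c_percent):
    """Return the (pH class, organic C class) pair for one site.

    Assumption: boundary values belong to the middle class.
    """
    if ph < 5:
        ph_class = "pH < 5"
    elif ph <= 7:
        ph_class = "pH 5-7"
    else:
        ph_class = "pH > 7"

    if organic_c_percent < 2:
        c_class = "OC < 2%"
    elif organic_c_percent <= 15:
        c_class = "OC 2-15%"
    else:
        c_class = "OC > 15%"
    return ph_class, c_class

# Illustrative values only (not actual site records)
print(soil_classes(3.7, 33.0))
print(soil_classes(6.0, 5.0))
print(soil_classes(8.2, 0.4))
```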

Molecular characterisation of soil bacterial communities. DNA was extracted and purified from all soil samples using the improved ISO 11063 soil DNA extraction procedure as described by Plassart et al. 66 [...] g for 1 min. The supernatant was removed and then proteins were precipitated with 1/10 volume of 3 M sodium acetate prior to centrifugation (14,000 g for 5 min at 4 °C). Nucleic acids were precipitated by adding 1 volume of ice-cold isopropanol. The DNA pellets obtained after centrifugation (14,000 g for 5 min at 4 °C) were washed with 70% ethanol. Crude DNA extracts were then purified using a MinElute gel extraction kit (Qiagen, France) and quantified using a QuantiFluor staining kit (Promega, USA), prior to further investigations. The diversity, composition and structure of the bacterial communities were determined on the 71 soil DNA samples by 16S rRNA gene amplicon pyrosequencing. A 450 bp fragment of the 16S rRNA gene was amplified using the primers F479 (5′-AGCMGCYGCNGTAANAC-3′) and R888 (5′-CCGYCAATTCMTTTRAGT-3′) 43. Five ng of DNA were used in 25 μL of PCR mixture. The PCR was conducted under the following conditions: 94 °C for 2 min, 35 cycles of 30 s at 94 °C, 30 s at 52 °C, and 1 min at 72 °C, followed by 7 min at 72 °C. The PCR products were purified using a MinElute gel extraction kit (Qiagen, Courtaboeuf, France) and quantified using the PicoGreen staining kit (Molecular Probes, Paris, France) on an Infinite 200 Pro fluorimeter (Tecan, Lyon, France). A second PCR of nine cycles was then conducted for each sample under similar PCR conditions using purified PCR products as DNA template and modified F479 and R888 primers containing 10 bp multiplex identifiers added at the 5′ position to specifically identify each DNA sample. Two 25 µL reactions were performed for each DNA sample and pooled, before purification and quantification as previously described. Pyrosequencing was then carried out on a GS FLX Titanium (Roche 454 Sequencing System) by Genoscreen (Lille, France).
We performed the 16S rRNA gene sequence analysis according to Terrat et al. 43, using the GnS-PIPE 67. Briefly, all the raw reads were sorted according to the multiplex identifier sequences. The raw reads were then quality filtered and trimmed based on (i) their length (370 b), (ii) their number of ambiguities (Ns = 0) and (iii) their primer(s) sequence(s). The reads were dereplicated (i.e. clustering of strictly identical sequences) and aligned using the INFERNAL alignment tool 66, then clustered into OTUs using a PERL program that groups rare reads with abundant ones and does not count differences in homopolymer lengths 67. A filtering step was then carried out to check all singletons based on the quality of their taxonomic assignments to keep or discard them from the data sets. Finally, the retained reads were homogenised by random selection to 8,085 sequences per sample (the minimum number of reads per sample after GnS-PIPE processing) to compare the datasets and avoid biased community comparisons. These sequences were then clustered into OTUs (97% similarity threshold) prior to determining diversity and richness indices. OTUs were further identified taxonomically using the Silva database. The raw data sets are [...]

The turnover rates (z) for bacterial community composition were derived from the slope of the Taxa-Area Relationship (TAR) as described in Ranjard et al. 12, following the method described in Harte et al. 72 and by applying the equation:

$$\log_{10}(\chi_d) = (-2z)\,\log_{10}(d) + b$$

where χ_d is the observed Sørensen's similarity between two soil samples that are d meters apart from each other. χ_d and d are determined based on community structure data and geographical coordinates of sampling sites, respectively. b is the intercept of the linear relationship and z the turnover rate of the community composition. The z estimate and its 95% confidence interval were derived from the slope (−2z) of the relationship between similarity and distance by weighted linear regression.
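The estimation of the turnover rate can be sketched as follows, assuming that the linear relationship between similarity and distance is fitted on log10-transformed values, so that the slope equals −2z and the intercept equals b. Two simplifications are assumptions of this sketch and not the authors' procedure: ordinary least squares replaces the weighted regression (the weights are not given in the source), and site positions are taken as planar coordinates in metres. Function names and toy data are hypothetical, and the confidence interval is not computed.

```python
import math
from itertools import combinations

def sorensen(otus_a, otus_b):
    """Sørensen similarity between two collections of OTU identifiers."""
    a, b = set(otus_a), set(otus_b)
    return 2 * len(a & b) / (len(a) + len(b))

def turnover_rate(samples, min_distance_m=1000.0):
    """Estimate (z, b) from log10(similarity) = (-2z) * log10(distance) + b.

    samples: list of (x_m, y_m, otu_ids); planar coordinates in metres
    (hypothetical input layout). Unweighted least squares is used here.
    """
    xs, ys = [], []
    for (x1, y1, o1), (x2, y2, o2) in combinations(samples, 2):
        d = math.hypot(x1 - x2, y1 - y2)
        chi = sorensen(o1, o2)
        # Keep only pairs more than 1 km apart; log10 needs chi > 0
        if d > min_distance_m and chi > 0:
            xs.append(math.log10(d))
            ys.append(math.log10(chi))
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope / 2.0, intercept   # z = -slope / 2, b = intercept

# Toy transect (not from the study): similarity decays with distance
toy_samples = [
    (0.0, 0.0, range(0, 10)),
    (10_000.0, 0.0, range(2, 12)),
    (100_000.0, 0.0, range(6, 16)),
]
print(turnover_rate(toy_samples))
```

A positive z indicates that community similarity decreases as the distance between sites increases.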
Only distances greater than 1 km were considered. Differences in soil microbial community structure were characterized using UniFrac distances 73. Non-metric multidimensional scaling (NMDS) was then used to graphically depict differences between soil microbial communities using the metaMDS function of the vegan package. The significance of the observed clustering of samples on the ordination plot was assessed by an analysis of similarity (ANOSIM, 1,000 permutations).
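For readers unfamiliar with ANOSIM, the sketch below shows how the R statistic and its permutation P value can be obtained from a distance matrix and a grouping factor. It is a plain standard-library illustration of the general method, not the vegan implementation used in the study; function names and the toy matrix are hypothetical.

```python
import random
from itertools import combinations

def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def anosim_r(dist, groups):
    """R = (mean between-group rank - mean within-group rank) / (M / 2),
    where M is the number of site pairs."""
    pairs = list(combinations(range(len(groups)), 2))
    ranks = average_ranks([dist[i][j] for i, j in pairs])
    within = [r for r, (i, j) in zip(ranks, pairs) if groups[i] == groups[j]]
    between = [r for r, (i, j) in zip(ranks, pairs) if groups[i] != groups[j]]
    m = len(pairs)
    return (sum(between) / len(between) - sum(within) / len(within)) / (m / 2.0)

def anosim(dist, groups, permutations=1000, seed=0):
    """Return (observed R, permutation P value) for a square distance matrix."""
    rng = random.Random(seed)
    observed = anosim_r(dist, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)               # permute the group labels
        if anosim_r(dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)

# Toy distance matrix (not from the study): two well-separated land uses
toy_dist = [
    [0, 1, 10, 10],
    [1, 0, 10, 10],
    [10, 10, 0, 1],
    [10, 10, 1, 0],
]
print(anosim(toy_dist, ["forest", "forest", "arable", "arable"], permutations=99))
```

R close to 1 means that sites within a group are more similar to each other than to sites of other groups, whereas R close to 0 indicates no grouping.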
A variance partitioning approach was used to investigate the relative influences of soil physicochemical characteristics (FAO classification for soil texture, total C content, assimilable P, soil pH, and CEC), land use, climatic zone, and geographical location (represented by geographical coordinates and spatial descriptors derived from geographical coordinates by means of a principal coordinates of neighbour matrices approach, PCNMs, representing neighbourhood relationships between sites 69) on diversity indices (richness, evenness, and Shannon-Weaver index) and on soil bacterial community structure. Total nitrogen was not considered as an explanatory variable because it was highly correlated with the total soil carbon content (r² = 0.96). This approach consisted of two successive steps. First, we investigated the effect of environmental variables. Then, we tested the effect of geographical location and neighbourhood relationships on the residuals derived from the first step using the same methodology. For diversity indices, each step first consisted in evaluating each combination of explanatory variables relative to its adjusted R² (to be maximised) and its Bayesian Information Criterion (to be minimised) using the regsubsets function in the leaps package. This led to a reduced set of explanatory variables, which was submitted to an iterative selection by means of Canonical Redundancy Analysis using the rda and ordiR2step functions in the vegan package (forward selection) to identify the most parsimonious model. For bacterial community structure, each step was based on a distance-based redundancy analysis on the Unifrac distance matrix and consisted of a forward selection of the most parsimonious model by the capscale and ordiR2step functions in the vegan package. Finally, in each case, the total explained variance and the marginal effect of the selected explanatory variables were determined through an ANOVA-like approach using the anova.cca function in the vegan package.
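The first screening step, in which each combination of explanatory variables is scored by its adjusted R² and its BIC, can be illustrated with the following sketch. It is not the leaps implementation: it fits ordinary least squares models by exhaustive enumeration with a small hand-written solver, handles numeric predictors only, and computes BIC under a Gaussian likelihood up to an additive constant (which leaves the ranking unchanged). The subsequent redundancy analysis and forward selection are not reproduced. All names and the toy data are hypothetical.

```python
import math
from itertools import combinations

def solve(a, b):
    """Solve the linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def rss(columns, y):
    """Residual sum of squares of an OLS fit of y on an intercept plus columns."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return sum((yi - sum(bj * xj for bj, xj in zip(beta, row))) ** 2
               for row, yi in zip(design, y))

def rank_subsets(predictors, y):
    """Score every non-empty predictor subset; return results sorted by BIC."""
    n = len(y)
    mean_y = sum(y) / n
    tss = sum((yi - mean_y) ** 2 for yi in y)
    results = []
    names = sorted(predictors)
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            r = rss([predictors[name] for name in subset], y)
            adj_r2 = 1 - (r / (n - size - 1)) / (tss / (n - 1))
            bic = n * math.log(r / n) + (size + 1) * math.log(n)
            results.append((subset, adj_r2, bic))
    return sorted(results, key=lambda t: t[2])   # lowest BIC first

# Toy data (not from the study): y depends on x1 only
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [1, -1, 1, -1, 1, -1, 1, -1]
noise = [0.1, 0.1, -0.1, -0.1, -0.1, -0.1, 0.1, 0.1]
y = [2 * a + e for a, e in zip(x1, noise)]
for subset, adj_r2, bic in rank_subsets({"x1": x1, "x2": x2}, y):
    print(subset, round(adj_r2, 4), round(bic, 2))
```

Exhaustive enumeration is only practical for a handful of candidate variables, which matches the small set of soil, land use and climate descriptors considered here.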
In order to evaluate whether microbial groups could represent indicators of soil conditions, land use and/or climate conditions, we clustered the sites into different environment types using the k-means method 74. The k-means method is one of the most widely used techniques to establish clusters from environmental observations. First, initial groups were formed, and centroids were calculated as barycentres of the clusters. Then, the algorithm assigned each observation to the group with the closest centroid, and new centroids were calculated. These steps were repeated until the centroids no longer moved. The data used for this classification were soil physicochemical properties (pH, texture, organic C, total N) and land use (arable, forest, grassland). This classification method identified four clusters (cluster 1: nutrient-poor soils, cluster 2: moderately rich soils, cluster 3: nutrient-rich and acidic soils, and cluster 4: nutrient-poor soils with an alkaline pH). To identify the indicator genera of each cluster obtained by the k-means method, the multipatt function of the indicspecies package was used 71. Based on the microbial groups identified as environmental indicators, a variance partitioning approach was used to model the influence of soil physicochemical properties, land use, climate and geographic location on the relative abundance of each bacterial genus. The land use effect within the clusters was tested by multivariate analysis and analysis of similarity (ANOSIM).
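The iterative procedure described above (assignment of each observation to the closest centroid, followed by recalculation of the centroids until they stop moving) can be sketched as below. Several points are assumptions of this sketch: the initial centroids are randomly chosen observations rather than barycentres of initial groups, the variables are used unscaled (real soil variables would probably need standardising), and categorical data such as land use are left out. Names and toy values are hypothetical.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, max_iter=100, seed=0):
    """Basic k-means; returns (labels, centroids).

    Assumption: initial centroids are k randomly drawn observations.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each observation joins its closest centroid
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:      # assignments stable: centroids no longer move
            break
        labels = new_labels
        # Update step: each centroid becomes the barycentre of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Toy sites (not from the study): (pH, organic C %)
toy_sites = [(4.0, 30.0), (4.2, 28.0), (7.8, 1.0), (8.0, 1.5)]
print(kmeans(toy_sites, k=2))
```

The result of k-means depends on the starting centroids, so in practice several random starts would usually be compared.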