Introduction

As sequencing of barcode genes unveils the enormous bacterial diversity on Earth, it is becoming evident that what we previously thought of as rather simple biological systems may in fact depend on their bacterial components for their function. Both traditional Sanger sequencing and more powerful high throughput sequencing methods, such as 454 sequencing, have shed new light on the importance of bacteria in eukaryotic systems1,2,3,4,5. Some biologists have referred to this as a conceptual breakthrough in biology: the recognition that higher organisms create a shared living space with a specific set of beneficial microorganisms6. Evidence that bacteria constitute an integrated part of both simple and more complex biological systems is accumulating. Multiple well-known symbioses harbor rich and diverse bacterial populations. These include lichens1,2,7, fungal endophytes8 and mycorrhizal fungi9,10,11,12,13.

The ectomycorrhizal (EcM) symbiosis between plant roots and fungal hyphae is of major ecological importance in the extensive boreal and temperate forests of the Northern Hemisphere. Moreover, recent molecular surveys have revealed a high diversity of EcM fungi in alpine and arctic ecosystems14,15,16. Several studies have shown that EcM fungal communities show some structuring and non-stochastic patterns along the horizontal soil plane17. Spatial autocorrelation of EcM fungal communities has been detected at distances below 3–4 m18,19,20, indicating that EcM communities are spatially aggregated at small scales. Furthermore, Bahram et al21 found that even though EcM species richness showed a stochastic pattern not directly related to distance from the host tree base, the EcM fungal communities were structured and non-stochastic at lineage level. The spatial autocorrelation of EcM fungi at small spatial scales has been explained by common mycelial networks (CMNs) that connect different plants belowground22.

The EcM mantle that covers the plant root tips has long been known to harbor bacteria7,23,24 from a range of phyla such as Bacteroidetes, Chlorobi, Actinobacteria, Cyanobacteria, Proteobacteria and Firmicutes13,25. Their association with the mycorrhizal symbiosis ranges from loose to tight, and several bacteria have been reported to promote EcM formation11,26. The bacterial diversity within ectomycorrhizae has been explored with various techniques9,10,23,27. However, as far as we know, only one study has explored bacteria associated with ectomycorrhizae by means of high throughput sequencing techniques23. Even though this study provided a first insight into the bacterial communities associated with EcM fungi, only a few root tips were studied, with a correspondingly limited fungal taxonomic coverage, as the bacteria of only two EcM fungi were investigated. Hence, the richness and composition of bacterial communities associated with whole EcM root systems remain largely unexplored. Furthermore, it is unknown whether the bacterial communities of EcM forming plants are, like the EcM fungal communities, spatially aggregated and thereby spatially autocorrelated. It has been shown that bacteria can disperse along fungal mycelia28,29,30. If this is also the case along common mycelial networks, spatial autocorrelation of the bacterial communities at fine spatial scales would be expected.

The core–satellite hypothesis developed by Hanski31 applies where the species of a habitat fall into a clear dichotomy of two sets. The core species are locally abundant and regionally common, whereas satellite species are locally and regionally rare. Furthermore, a core microbiome comprises the members common to two or more microbial assemblages associated with a habitat32. The presence of core species and core microbiomes has been detected in several microbial community assemblages4,33, including bacteria in ectomycorrhizaspheres23, and such species may be essential within the habitat in which they are found.

As most EcM forming plants are shrubs and trees, a large fraction of the conducted studies on EcM associated bacteria have focused only on tiny fractions of these plants' root systems23. The present study is, to our knowledge, the first investigation of bacterial communities associated with entire EcM root systems using high throughput sequencing (HTS).

We analyzed the relatively small root systems of the EcM herbaceous plant Bistorta vivipara (L.) Delarbre within a 2 m × 2 m plot situated in an alpine area in Norway. In total, 120 individual plant root systems were found within the plot and analyzed. B. vivipara has a circumpolar distribution in arctic-alpine habitats. The plant is one of a few herbs that form EcM symbiosis. EcM associations on the roots of B. vivipara were described by Hesselman34 and have later been confirmed in numerous studies35,36,37,38,39,40. HTS of the entire fungal communities of individual B. vivipara plants has previously been conducted successfully40,41,42. In this study we analyze, in the same manner, the overall bacterial communities associated with whole root systems of B. vivipara plants.

Our main aims are to (1) explore the bacterial community composition of the EcM root systems of B. vivipara, (2) investigate whether the richness and structure of the bacterial communities associated with the roots differ significantly from those of the adjacent soil, (3) explore whether the bacterial communities in soil and plant roots are spatially autocorrelated and (4) assess whether a core microbiome, with tentatively important functions, can be found across the analyzed plant roots.

Results

Richness

A total of 246,306 16S raw reads were obtained for the 120 B. vivipara plant root samples, of which 1.7% were initially removed due to tag-switching. After additional filtering, denoising, removal of OTUs with < 6 reads, removal of OTUs that appeared in < 2 samples, removal of samples containing < 95 OTUs and removal of reads that appeared in the negative control (Table S2), 181,858 sequences remained, which clustered into a total of 4,419 OTUs. Of these, 3,571 OTUs were found in plant roots and 3,476 in the soil; 2,628 were shared. Thus, 943 OTUs were exclusive to roots and 848 OTUs were exclusive to soil. The average number of OTUs detected per sample was 325 (range: 144–589), with a mean of 1,022 reads generated per sample (range: 427–1,927) across all samples. Slightly fewer OTUs and reads were found for the plant root systems than for the soil samples (Fig. S1).

Both the estimated species-accumulation curves (Fig. 1) and the species richness estimators (Table 1) indicated that much of the diversity within the plot had been recovered. The extrapolated richness estimates (Chao and first-order jackknife) for total bacterial richness were similar for soil and root systems. However, the two estimators disagreed on which of the two habitats, soil or plant roots, was the more species rich. Chao estimated the species richness to be highest in soil, whereas the first-order jackknife estimated the species richness to be highest in plant roots.
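The two extrapolated richness estimators named above can be illustrated with a minimal sketch. It assumes the incidence-based forms (based on the number of OTUs found in exactly one or exactly two samples), which is our reading of how such estimates are typically computed from sample-by-OTU presence data; the function names and the toy data are hypothetical and are not taken from the study.

```python
def incidence_counts(samples):
    """Map each OTU to the number of samples in which it occurs."""
    counts = {}
    for sample in samples:
        for otu in sample:
            counts[otu] = counts.get(otu, 0) + 1
    return counts


def chao_incidence(samples):
    """Incidence-based Chao estimate with a small-sample correction factor."""
    n = len(samples)
    counts = incidence_counts(samples)
    s_obs = len(counts)
    q1 = sum(1 for c in counts.values() if c == 1)  # OTUs seen in one sample
    q2 = sum(1 for c in counts.values() if c == 2)  # OTUs seen in two samples
    scale = (n - 1) / n
    if q2 > 0:
        return s_obs + scale * q1 * q1 / (2 * q2)
    # Bias-corrected form used when no OTU occurs in exactly two samples.
    return s_obs + scale * q1 * (q1 - 1) / 2


def jackknife1(samples):
    """First-order jackknife estimate of total richness."""
    n = len(samples)
    counts = incidence_counts(samples)
    q1 = sum(1 for c in counts.values() if c == 1)
    return len(counts) + q1 * (n - 1) / n


# Toy data: four samples, each a set of OTU identifiers (hypothetical).
toy = [{"a", "b", "c"}, {"a", "b", "d"}, {"a", "c", "e"}, {"a", "b"}]
print(chao_incidence(toy), jackknife1(toy))
```

Because the Chao estimate depends on the ratio of singletons to doubletons whereas the jackknife depends on singletons only, the two can rank habitats differently, as observed here for soil and plant roots.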

Table 1 Observed number of bacterial OTUs and reads, as well as estimated species (OTU) richness by means of Chao and first-order jackknife estimates along with the standard deviation of the estimate, in all samples, plant root samples of B. vivipara (115) and adjacent soil samples (63)
Figure 1

Species (OTU) accumulation curves calculated using the analytical species-accumulation method for the 115 plant root samples (green) and 63 soil samples (brown).

Taxonomic coverage

The 3,459 OTUs with significant hits to the Greengenes database were classified (the 960 OTUs without significant hits remained unclassified) and represented 27 bacterial phyla, 64 classes, 79 orders and 80 families (Table 2). All of the phyla, classes and orders and 74 families were found in the soil samples, whereas 25 phyla, 56 classes, 74 orders and 78 families were found in the plant roots. At phylum level, Chlorobi and NKB19 were only detected in the soil samples. Acidobacteria, Chlorobi, Firmicutes and Planctomycetes were significantly more frequently represented in soil samples than in plant roots (p < 0.05, chi-square test), while Actinobacteria, Armatimonadetes, Chloroflexi and OTUs that were unclassified at phylum level were significantly more abundant in the plant roots (p < 0.05, chi-square test). At class level, seven of the eight taxonomic groups present in the soil samples but absent from the plant root samples belonged to classes of uncultured bacteria (028H05-P-BN-P5, C6, MVS-40, S035, SJA-176, SJA-28, TSBW08). The only class of cultured bacteria that was present in soil, but not in plant roots, was Bacilli. At family level, representatives of A4b, Caulobacteraceae, Corynebacteriaceae, Cystobacterineae, Flexibacteraceae, Microbacteriaceae and Xanthobacteraceae were only found in the plant root samples. At phylum level, a higher proportion of the OTUs from the plant root samples remained unclassified (21.2%) than of the OTUs from the soil samples (16.6%). The similarity to 16S rRNA GenBank sequences was generally low; the mean sequence similarity to 16S rRNA sequences of known taxonomy was 95.3% for the soil samples (range: 84.1%–100%) and 95.0% for the plant root samples (range: 84.9%–100%, Fig. 2). The mean sequence similarity for the 848 OTUs that were exclusively found in soil was 94.5%, whereas the 943 OTUs that were exclusively found in the plant root systems had an even lower sequence similarity (92.8%).

Table 2 Summary of the distribution of operational taxonomic units (OTUs) among bacterial lineages. RS OTUs gives the OTUs found in the 115 root systems. S OTUs gives the OTUs found in the 63 soil samples. Only taxonomic groups that contained >1% of the OTUs are included in the table. Numbers in bold and italics represent taxa that are significantly different at phylum level (p < 0.05, chi-square test)
Figure 2

Density plot showing the obtained 16S sequence similarity to known GenBank 16S rRNA accessions.

The solid green line gives the density curve for all representative sequences found in plant root samples (3,571 OTUs), whereas the green dotted line gives the density curve for the representative sequences found uniquely in plant root samples (943 OTUs). The brown solid line gives the density of sequence similarity of the representative sequence of all the soil samples (3,476 OTUs), whereas the brown dotted line gives the density curve for the OTUs that were uniquely found in the soil samples (848 OTUs).

Community composition

No OTU was observed in all samples and only one OTU, in phylum AD3, occurred in > 95% of all samples. Three OTUs occurred in > 95% of the plant root samples, whereas one occurred in > 95% of all soil samples (Table 3). The G-test revealed that 58 OTUs were significantly associated with the plant root samples, whereas 36 OTUs were significantly associated with soil samples (Table S1). These OTUs represented 14 phyla, 11 of which were found in both sample types. Among the OTUs significantly associated with plant roots, Chloroflexi comprised 31% and Actinobacteria 16%. Sixteen percent of the OTUs did not have any significant hit to the Greengenes database and were thus classified as unknown. OTUs significantly associated with soil were distributed among 11 phyla. Planctomycetes dominated among these, with 47% of the OTUs, followed by Acidobacteria (22% of the OTUs). Only 2.7% of the OTUs significantly associated with soil were not affiliated with known sequences in the Greengenes database.
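As an illustration of the kind of test behind these OTU-to-habitat associations, the following minimal sketch computes a G-test of independence on a 2 × 2 presence/absence table (habitat × OTU present or absent) and applies a Bonferroni correction. It is an assumption that presence/absence counts are the right input; the exact table construction used by the QIIME implementation may differ, and the function names and example counts are hypothetical.

```python
import math


def g_test_2x2(present_a, n_a, present_b, n_b):
    """G-test of independence for one OTU in two habitats.

    present_a / present_b: number of samples containing the OTU in each habitat.
    n_a / n_b: total number of samples in each habitat.
    Returns (G statistic, p-value for 1 degree of freedom).
    """
    observed = [[present_a, n_a - present_a], [present_b, n_b - present_b]]
    total = n_a + n_b
    row_sums = [n_a, n_b]
    col_sums = [present_a + present_b, total - present_a - present_b]
    g = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # empty cells contribute nothing to G
                expected = row_sums[i] * col_sums[j] / total
                g += 2 * o * math.log(o / expected)
    g = max(g, 0.0)  # guard against tiny negative rounding errors
    # Chi-square survival function with 1 degree of freedom.
    p = math.erfc(math.sqrt(g / 2))
    return g, p


def bonferroni(p, n_tests):
    """Bonferroni-corrected p-value, capped at 1."""
    return min(1.0, p * n_tests)


# Hypothetical OTU found in 100 of 115 root samples and 5 of 63 soil samples.
g, p = g_test_2x2(100, 115, 5, 63)
print(g, bonferroni(p, 4419))
```

Correcting by the total number of OTUs tested (4,419 in this sketch) is conservative, which is consistent with only a small fraction of OTUs being reported as significantly associated with either habitat.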

Table 3 Identity of the bacterial OTUs that appeared in >95% of either the 115 plant root systems of B. vivipara or the 63 soil samples and their taxonomical affiliation

The two-dimensional GNMDS ordination (Fig. 3) of the Hellinger transformed data revealed a distinct compositional difference of the bacterial communities between plant roots and soil samples. The structure of this GNMDS ordination was corroborated by the high pair-wise similarity between GNMDS and DCA axes 1 (Kendall's τ = 0.8977; p < 1·10−15) and axes 2 (Kendall's τ = 0.7753; p < 1·10−15).
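Two computations mentioned in this paragraph, the Hellinger transformation of the OTU table and Kendall's rank correlation between ordination axes, can be sketched as follows. This is a minimal illustration in plain Python rather than the R code used in the study; the function names are hypothetical, and the correlation is computed in its simplest form, which assumes no tied scores.

```python
import math


def hellinger(table):
    """Row-wise Hellinger transform of a samples-by-OTUs count table.

    Each count is divided by its sample total and square-rooted, which
    down-weights highly abundant OTUs.
    """
    transformed = []
    for row in table:
        total = sum(row)
        transformed.append([math.sqrt(x / total) if total else 0.0 for x in row])
    return transformed


def kendall_tau(x, y):
    """Kendall's tau between two score vectors, assuming no ties."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (x[i] - x[j]) * (y[i] - y[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical axis scores for four samples from two ordination methods.
print(kendall_tau([1, 2, 3, 4], [1, 3, 2, 4]))
print(hellinger([[1, 3], [0, 0]]))
```

A τ close to 1 between the corresponding axes of two ordination methods indicates that both methods order the samples in nearly the same way along that axis.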

Figure 3

Global non-metric multidimensional scaling (GNMDS) ordination diagram of the bacterial operational taxonomic unit (OTU) composition for the 115 plant root samples (green solid circles) and 63 soil samples (brown open circles).

The semi-variogram analyses revealed that the bacterial communities associated with the plant roots were spatially structured; pairs of samples situated more closely than the approximated range of 60 cm in general were more similar with respect to OTU composition than samples collected further apart (Fig. 4). No spatial structure was observed for the soil bacterial communities.

Figure 4

Standardized semi-variogram for the bacterial communities of plant roots (green solid line) and soil (brown solid line) and their corresponding confidence intervals (dotted lines) against GNMDS axis one.

Bacterial communities from plant root samples closer than ca. 60 cm show evidence of spatial structuring as the semi-variance is outside the confidence envelope.

Discussion

Almost 4,500 bacterial OTUs comprising 27 phyla were recovered from the EcM plant roots of Bistorta vivipara and adjacent soil in a plot situated in the mid-alpine zone. The bacterial richness detected is not surprising, as soil is known to harbor diverse bacterial communities due to its many micro-niches43. Bacterial OTU richness in soil has been shown to diverge from the typical declining trend with altitude seen in many groups of organisms. Rather, bacterial OTU richness has been found to increase with altitude up to a mid-elevation bulge44. The species-accumulation curves for both the sampled root systems and adjacent soil approached an asymptote, indicating that the sampling effort was sufficient to capture much of the bacterial diversity within the plot. Moreover, bacterial richness in terms of the number of OTUs detected was similar in plant roots and surrounding soil.

No studies have previously addressed the bacterial OTU richness across entire root systems of EcM plants and adjacent soil. However, some studies have compared the bacterial richness of plant roots and root systems with richness in soil. Burke et al27 analyzed the EcM roots of Douglas fir using T-RFLP and reported that bacterial communities associated with the EcM roots resembled the rhizosphere communities of other conifers. Similarly, Lundberg et al4 reported no significant differences in bacterial OTU richness between the soil, rhizosphere and endophytic compartments of the non-mycorrhizal plant Arabidopsis thaliana. This may indicate that the micro-niches available for bacteria in plant roots are similar to those found in soil.

We detected twenty-seven bacterial phyla across the soil and plant root samples. This number is higher than what has been found in previous studies on bacteria associated with EcM using HTS23,45, T-RFLP27, RFLP9 and DGGE analysis9. Several of the bacterial phyla detected in our study are known to be relatively abundant and ubiquitous in soil, such as Acidobacteria, Actinobacteria, Bacteroidetes, Proteobacteria and Verrucomicrobia46,47. These phyla comprised almost 28% of the OTUs in our study.

Significant compositional differences in the bacterial communities were found between the two substrates (Fig. 2 and 3). The phyla Actinobacteria, Armatimonadetes and Chloroflexi were significantly more frequent in the plant root systems than in the soil samples. Conversely, Acidobacteria, Chlorobi, Firmicutes and Planctomycetes were significantly more frequent in the soil samples than in the plant root samples.

The dominating phyla in the plant root systems (Acidobacteria, Actinobacteria, Armatimonadetes, Chloroflexi, Planctomycetes and Proteobacteria) comprised over 60% of the OTUs. In addition, we detected several phyla that have previously been shown to harbor bacteria associated with mycorrhizal fungi13,25, such as Actinobacteria, Bacteroidetes, Chlorobi, Cyanobacteria, Proteobacteria and Firmicutes, in the root systems of B. vivipara. However, only a few of these were abundant.

Each phylum in our study was dominated by only a few classes and orders. This pattern of phyla being represented by only a few dominating classes, orders or families has also been observed in other studies of bacterial communities3,48. In macroecology, competition between closely related species has been found to be an important structuring force in species communities, as closely related species often compete for the same resources49. The dominance of a few families, orders and classes in our study could be the result of competition between closely related species, leading to dominance by some species. Several of the phyla and classes detected in our study have previously been found to dominate EcM bacterial communities. Alphaproteobacteria and Bacteroidetes were found to largely dominate the bacterial communities of Douglas fir EcM root tips27, whereas Proteobacteria, Acidobacteria, Actinobacteria and Bacteroidetes dominated the ectomycorrhizasphere of the two EcM fungi Xerocomus pruinatus and Scleroderma citrinum collected from an oak forest23. Even though members of the phylum Firmicutes were identified in the plant root systems of B. vivipara, none of these belonged to the class Bacilli, which has previously been identified as a group containing mycorrhizosphere bacteria25. Seven families (A4b, Caulobacteraceae, Corynebacteriaceae, Cystobacterineae, Flexibacteraceae, Microbacteriaceae and Xanthobacteraceae) were only present in the plant root systems, several of which have previously been identified from soil or root systems4,50. Corynebacteriaceae have been shown to stimulate basidiospore germination50. The bacterial communities found in the EcM plant roots of B. vivipara were also similar at phylum level to those found in non-mycorrhizal roots of A. thaliana3,4. However, the dominating families within each phylum in A. thaliana roots differ from the dominating families observed within the EcM plant roots of B. vivipara. A likely reason for this discrepancy at family level is that A. thaliana does not possess any form of mycorrhizal symbiosis, which may favor bacteria other than those associated with EcM symbioses. Furthermore, as the soil type strongly influences the composition of the rhizosphere microbiome3,4, the different environments in which these studies were carried out could further explain the taxonomic differences found between the studies.

In this study, more than 25% of the bacterial OTUs associated with roots were not found in the surrounding soil, while 24% of the OTUs found in soil were unique to soil. According to the G-test, most of the OTUs affiliated with the plant root samples either represented the six large phyla that comprised > 60% of the OTUs within the root systems or were unclassified at any taxonomic level. Several OTUs affiliated with the plant root systems belong to families that have previously been reported from the root system of the non-mycorrhizal plant A. thaliana, such as Micromonosporaceae and Streptomycetaceae4. Members of the latter family have been found associated with EcM fungi and postulated to be modulators of plant symbiosis51.

On average, OTUs recovered from the plant-root systems had a lower similarity to known 16S rRNA sequences than OTUs found in the surrounding soil. This especially holds true for those OTUs that are exclusively found in the plant root systems. This larger presence of unclassified bacteria within the EcM root systems may be due to the fact that the bacterial diversity in this environment is poorly known. Moreover, the root associated bacteria may be difficult to grow in culture, because they are adapted to this specialized niche52,53.

We did not detect any core microbiota in the EcM plant roots of B. vivipara or in the adjacent bulk soil. Even though several OTUs were significantly affiliated with either the B. vivipara plant roots or the surrounding soil according to the G-test, only three OTUs were found in more than 95% (which was used as an arbitrary cut-off) of the plant root samples. All three OTUs that were found in most plant root samples were also present in the soil, but only in very few samples. Furthermore, no dichotomy in the frequency distribution of species was found. Rather, most species occurred in very few samples. Our findings thus contrast with both the core-satellite hypothesis of Hanski31 and the core microbiota of Turnbaugh et al32. Our results also contrast with the findings of Lundberg et al4 and Bulgarelli et al3, who detected a core microbiome, recruited from the soil bacteria, in the roots of A. thaliana. Core species have been proposed to be critical to the function of the community from which they have been sampled54. Instead, the EcM root systems of B. vivipara investigated in our study were highly dominated by rare species.

Bacterial communities in soil commonly do not show any spatial structure, which may be due to the multiplicity of micro-ecological niches within soil43. In contrast, in our data, the composition of plant root associated bacteria is spatially structured up to a distance of approximately 60 cm. This means that, within this range, bacterial communities in the plant roots are more similar to each other than expected if the communities were randomly distributed. Several mechanisms may account for this pattern. Mycorrhizal plants may be linked by mycelia that radiate from plant roots and colonize other plants, forming common mycelial networks (CMNs)22,55. It has also been shown that fungal hyphae can act as vectors for bacterial transport56. Bacteria moving in the common mycelial network may therefore explain why a spatial structure of bacteria is seen in the plant root samples, but is lacking in the soil samples.

Our study provides a first glimpse into the bacterial communities associated with the EcM symbiosis using universal 16S rRNA primers. It adds to an increasing number of studies in which bacteria are detected in large numbers in what were considered bi-eukaryotic symbioses. The bacterial communities in the EcM root systems of B. vivipara were clearly different from the surrounding soil. More than 25% of the bacteria found in the root systems were not found in the surrounding soil and several of the families were only found within the root systems. In addition, sequence similarities to known 16S rRNA sequences obtained from GenBank were poor for the OTUs that were exclusively found in the plant root systems. However, none of the OTUs that appeared in most root systems were absent in the soil samples, meaning that we did not detect a distinct core microbiome within the EcM root systems of B. vivipara.

This study is an investigation of the association between EcM roots and bacteria at whole root system level. To make a more direct link, investigations at root tip level would be more appropriate. However, our results indicate a close association between the plant roots of B. vivipara and bacteria. In line with this, the presence of a spatial structure in the bacterial community associated with plant roots could be due to bacteria making use of the “fungal highway”, the common mycelial networks, as transport vectors. The lack of a highly exclusive microbiota within the plant root samples may indicate that the bacterial community composition plays a less significant role than the community structure (i.e. the abundance of the taxa) in the differentiation between the soil and plant root microbiota.

Methods

Location and sampling

A 2 m × 2 m plot located in the middle alpine bioclimatic zone in Southern Norway (N 60° 33″, E 7° 25″), split into 64 sub-plots, was used for this study. All 120 Bistorta vivipara plants found in the plot as of July 2009 were mapped and later dug up with their root systems intact. The root systems were to a large extent dominated by EcM fungi and root endophytes belonging to the fungal phyla Ascomycota and Basidiomycota, with fewer fungal OTUs belonging to Zygomycota, Chytridiomycota and Glomeromycota (Blaalid et al, submitted). Pictures of B. vivipara roots are provided in Fig. S2, showing typical EcM root tips.

Fixed amounts of soil (about 28 cm3) were collected in the mid-point of each sub-square using a soil core sampler (see Fig. S3). To prevent degradation of DNA before extraction, plant samples were stored at 4°C. Further processing took place no later than 24 h after sampling. Soil samples were stored at −20°C no later than 3 hours after sampling. The plant root systems were rinsed in tap water to remove visible soil and plant debris and all roots not attached to the plant rhizome were removed. Subsequently, the plant root systems were rinsed thoroughly in milliQ water for at least 5 min. All root systems were transferred to Eppendorf tubes containing Cetyl Trimethyl Ammonium Bromide (CTAB) lysis buffer and stored at −20°C until DNA extraction. Thawed soil samples were homogenized in milliQ water and then sieved to remove small stones and plant roots before DNA extraction.

Molecular analyses

DNA was extracted from the entire plant root systems using CTAB extraction57,58. The DNA extracts were further cleaned using an E.Z.N.A soil cleanup kit (Omega Biotek, Norcross, Georgia, USA), following the manufacturer's protocol. We used the MO BIO soil extraction kit (MO BIO Laboratories Inc., Carlsbad, California, USA) to extract DNA from 2 ml of homogenized soil slurry following the manufacturer's protocol. Both plant and soil samples were prepared for 454 pyrosequencing by amplifying the V4 16S rRNA region. Plant root DNA extracts contained considerable amounts of eukaryotic DNA, which might cause problems with co-amplification of eukaryotic DNA due to homology between conserved regions of bacterial 16S rRNA and eukaryotic 18S rRNA59. PCR primers were therefore designed to target bacterial 16S rRNA sequences as broadly as possible while still avoiding complete homology to eukaryotic 18S rRNA sequences. Conserved regions of the bacterial 16S rRNA gene were identified by aligning diverse bacterial sequences (see Table S2 for accession numbers). Potential homology to eukaryotic 18S rRNA gene sequences was identified by manual inspection of an alignment of diverse eukaryotic sequences (Table S2). For further optimization of primer sequences to improve coverage, we used the Probematch tool of the Ribosomal Database Project together with sequence alignments including numerous rRNA gene sequences for relevant bacterial taxa. A semi-nested PCR protocol was applied, with different forward primers, but the same reverse primer, in the two PCR reactions. The primers 329F (5′-ACKGBCCARACWCCTACG-3′) and 802R (5′-TACCRGGGTATCTAAKCCYGT-3′) were used for the external PCR. For the internal PCR cycles, fusion primers were constructed by adding 24 different 10 bp unique tags (Roche technical bulletin 005-2009) and 454 pyrosequencing adaptors A and B to the 16S target sequences 518F (5′-CAGCAGCCGCGGTAAKAC-3′) and 802R, respectively. 
Final concentrations in the PCR were 0.16 mM dNTP mix, 0.2 μM of each primer and 0.4 units Finnzymes Hotstart II Phusion polymerase (Vantaa, Finland). The amplification program for both steps of the semi-nested PCR was as follows: 30 s at 98°C, followed by 30 cycles of 10 s at 98°C, 20 s at 55°C and 20 s at 72°C, with a final extension step at 72°C for 7 min before storage at 4°C. PCR products were cleaned with Wizard® SV Gel and PCR Clean-Up System (Promega, Madison, Wisconsin, USA), quantified using a Sequalprep™ Normalization kit (Invitrogen, Paisley, United Kingdom) and pooled into 16 equimolar amplicon libraries. 454 Titanium sequencing of the tagged amplicons was performed at the Norwegian Sequencing Centre at the University of Oslo using 8/16 lanes. One non-template control from the DNA extraction step was included in all steps and finally run together with the other samples in the 454 run. The raw data have been deposited in the NCBI Short Read Archive, accession no. SRP029429.
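The primers given in this section contain IUPAC degenerate bases (K, B, R, W, Y), so each primer name stands for a small pool of concrete oligonucleotides. The following sketch, which is illustrative only and not part of the original primer design workflow, counts and expands those variants; the helper names are hypothetical.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct sequences encoded by a degenerate primer."""
    count = 1
    for base in primer:
        count *= len(IUPAC[base])
    return count


def expand(primer):
    """List every concrete sequence encoded by a degenerate primer."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in primer))]


primers = {
    "329F": "ACKGBCCARACWCCTACG",
    "518F": "CAGCAGCCGCGGTAAKAC",
    "802R": "TACCRGGGTATCTAAKCCYGT",
}
for name, seq in primers.items():
    print(name, degeneracy(seq))
```

Higher degeneracy broadens taxonomic coverage but dilutes the concentration of each individual variant in the reaction, which is one reason degenerate positions are usually kept to a minimum.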

Bioinformatics analyses

A total of 246,306 16S rRNA sequence reads were filtered, denoised and processed using QIIME v. 1.5.060 on the Abel cluster at the University of Oslo. Tagging the reads at both ends enabled us to check for tag-switching, which may lead to false positives in HTS amplicon datasets61. Tag-switching was checked using a Python script available upon request. Reads with a length of < 200 bp or > 550 bp, an average phred quality score of < 25, or mismatches in the tags were discarded. Sequences with homopolymers exceeding 6 bp, more than one ambiguous base call or more than one mismatch in the forward primer sequence were removed from the data set. In addition, a 50 bp sliding window was used to identify regions of low sequence quality, and sequences were truncated at the end of the last good-quality window (that is, a window with phred > 25). Truncated sequences that still met the minimum length requirement were retained in the data set.
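The filtering thresholds described above can be summarized in a minimal sketch. It is not the QIIME implementation: the function names are hypothetical, the tag and primer mismatch checks are omitted, and the truncation rule reflects one reading of "the end of the last good-quality window" (the window is advanced one base at a time, and the read is cut at the end of the window preceding the first low-quality one).

```python
import re

MIN_LEN, MAX_LEN = 200, 550   # accepted read length in bp
MIN_MEAN_Q = 25               # minimum mean phred score of a read
MAX_HOMOPOLYMER = 6           # longest homopolymer run allowed
MAX_AMBIGUOUS = 1             # maximum number of ambiguous (N) base calls
WINDOW = 50                   # sliding window size in bp


def longest_homopolymer(seq):
    """Length of the longest run of a single repeated base."""
    return max((len(m.group(0)) for m in re.finditer(r"(.)\1*", seq)), default=0)


def truncate_at_low_quality(seq, quals, window=WINDOW, min_q=MIN_MEAN_Q):
    """Cut the read at the end of the last window whose mean phred exceeds min_q."""
    if len(seq) < window:
        return seq, quals
    window_sum = sum(quals[:window])
    for start in range(len(seq) - window + 1):
        if start > 0:
            # Slide the window by one base: add the new score, drop the oldest.
            window_sum += quals[start + window - 1] - quals[start - 1]
        if window_sum / window <= min_q:
            end = 0 if start == 0 else start - 1 + window
            return seq[:end], quals[:end]
    return seq, quals


def passes_filters(seq, quals):
    """Apply the length, mean quality, homopolymer and ambiguity thresholds."""
    if not MIN_LEN <= len(seq) <= MAX_LEN:
        return False
    if sum(quals) / len(quals) < MIN_MEAN_Q:
        return False
    if longest_homopolymer(seq) > MAX_HOMOPOLYMER:
        return False
    return seq.upper().count("N") <= MAX_AMBIGUOUS
```

In such a scheme, a read would typically be truncated first and then passed through `passes_filters`, so that reads shortened below 200 bp are dropped.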

The sequences remaining after quality control were denoised using Denoiser v. 0.9162 as implemented in QIIME v. 1.5.0. The denoised reads were clustered into operational taxonomic units (OTUs) at 97% sequence similarity using UCLUST v. 1.2.22 with the parameters maxaccepts = 20 and maxrejects = 500. Only OTUs with six or more reads that appeared in two or more samples were considered in downstream analyses. OTU representative sequences were aligned against the Greengenes core set63, and chimeric sequences were identified using ChimeraSlayer as implemented in Mothur v. 1.2664 and subsequently removed from the dataset. OTUs that occurred in the negative control were discarded (Table S3), as were five samples that contained fewer than 95 OTUs.
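The abundance and occurrence thresholds described above can be expressed as a short sketch operating on an OTU table. The data structure (a dictionary mapping each OTU to its per-sample read counts), the function name and the order in which the filters are applied are assumptions made for illustration; the original pipeline may have applied the steps in a different order.

```python
def filter_otu_table(table, negative_control, min_reads=6, min_samples=2,
                     min_otus_per_sample=95):
    """Filter an OTU table given as {otu: {sample: read_count}}.

    Drops OTUs seen in the negative control, OTUs with too few reads or
    samples, and finally samples that retain too few OTUs.
    """
    kept = {}
    for otu, counts in table.items():
        if counts.get(negative_control, 0) > 0:
            continue  # OTU detected in the negative control
        present = {s: c for s, c in counts.items() if c > 0}
        if sum(present.values()) < min_reads or len(present) < min_samples:
            continue
        kept[otu] = present

    # Count the OTUs remaining in each sample and keep sufficiently rich samples.
    richness = {}
    for counts in kept.values():
        for sample in counts:
            richness[sample] = richness.get(sample, 0) + 1
    good = {s for s, n in richness.items() if n >= min_otus_per_sample}

    filtered = {}
    for otu, counts in kept.items():
        remaining = {s: c for s, c in counts.items() if s in good}
        if remaining:
            filtered[otu] = remaining
    return filtered
```

The defaults mirror the thresholds stated in the text (six reads, two samples, 95 OTUs per sample); for a toy table the last threshold has to be lowered, for example `filter_otu_table(table, "neg", min_otus_per_sample=2)`.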

OTU representative sequences were initially classified using the RDP classifier65. The one OTU classified as plastid was discarded. Taxonomy was also assigned in QIIME by BLASTing the representative sequences against the Greengenes database66 and this dataset was used in the further analyses.

To assess the amount of sequence novelty in our dataset, the parallel implementation of BLAST, mpiBLAST, was used to run blastn with the OTU representative sequences against the reference database NCBI-nr (January 2013). mpiBLAST was run with 100 processors on the supercomputer MareNostrum located at the Barcelona Supercomputing Center.

Statistical analyses

A total of 181,858 sequences that clustered into 4,419 OTUs remained for the 115 plant root and 63 soil samples for further analyses. OTUs that occurred in > 95% of the soil or plant root samples were regarded as core microbiota for the respective environment; the cut-off value was arbitrarily chosen. Chi-square tests67 were used to determine whether the representation of taxa at different taxonomic levels differed significantly between the two sample types.
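The core microbiota criterion stated above (occurrence in more than 95% of the samples of one environment) amounts to a simple occupancy filter. The following sketch assumes that each sample is represented as a set of OTU identifiers; the function name and the toy data are hypothetical.

```python
def core_otus(samples, threshold=0.95):
    """Return the OTUs present in more than `threshold` of the samples."""
    n = len(samples)
    occupancy = {}
    for sample in samples:
        for otu in sample:
            occupancy[otu] = occupancy.get(otu, 0) + 1
    return {otu for otu, count in occupancy.items() if count / n > threshold}


# Toy example: 20 samples; "a" occurs in all, "b" in 18 and "c" in 5 of them.
toy = [{"a"} | ({"b"} if i < 18 else set()) | ({"c"} if i < 5 else set())
       for i in range(20)]
print(core_otus(toy))
```

Applying the function separately to the root samples and to the soil samples yields one candidate core set per environment, which can then be compared.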

A G-test of independence was run with Bonferroni correction of the probability in QIIME 1.6.0 to test whether any of the OTUs were significantly associated with either plant root samples or soil samples. The dataset was Hellinger transformed to account for ‘blind sampling’ and large numbers of absences, as suggested by Ramette68. Downstream statistical analyses were performed in R version 2.15.169. The vegan package was used for all multivariate and species richness analyses. Species accumulation curves were calculated using the analytical species-accumulation method developed by Ugland et al70. Extrapolated species richness was calculated using both Chao's method and the first-order jackknife estimate. Global non-metric multidimensional scaling (GNMDS) ordinations were obtained from the matrix of Hellinger transformed data for the OTUs with the following specifications: distance measure = Bray-Curtis distance, dimensions = 2 and 3, initial configurations = 100, maximum iterations = 1000. Axes were rotated to principal components. Because the underlying gradients in our study are unknown, detrended correspondence analysis (DCA)71,72 was run on the data in R with the vegan and MASS73 packages with default options, and the ordinations were inspected for known artifacts such as the arch effect, tongue effect and other patterns74. Obtaining similar results using the two methods, along with the absence of visual artifacts, was interpreted as a strong indication that a reliable gradient structure had been found75,76. This was confirmed by calculating Kendall's rank correlation coefficient τ between DCA and GNMDS axes for GNMDS ordinations with two as well as three dimensions. The package geoR77 was used to infer the spatial structure of all ordination axes and explanatory variables by evaluation of empirical semi-variance analyses. The semi-variance analysis was based on the assumption that the variables have direction-independent (isotropic) patterns of spatial variability, with a mean and variance that do not vary with location in space (ergodicity). Standardized isotropic semi-variance, i.e., half the standardized variance of differences between plot pairs78, was calculated for each variable for each of the eight classes into which the range of distances between the studied samples was divided. Envelopes for each standardized semi-variance value were obtained by permutation, using the variog.mc.env function of geoR, by which data values were randomly allocated to spatial locations 999 times and new variograms were calculated for each permutation. The envelopes represent, for each lag, the minimum and maximum of the standardized semi-variance values for the permuted data.
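The standardized semi-variance and its permutation envelope, as described above, can be sketched in plain Python. This is a simplified stand-in for the geoR functions used in the study, not a reimplementation of them: the function names, the equal-width distance classes and the fixed random seed are assumptions made for illustration.

```python
import math
import random


def semivariogram(coords, values, n_bins=8, max_dist=None):
    """Standardized isotropic semi-variance for equal-width distance classes.

    Returns one value per class: the mean of 0.5 * (z_i - z_j)^2 over all
    pairs in the class, divided by the sample variance (None if no pairs).
    """
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    pairs = [(math.dist(coords[i], coords[j]), 0.5 * (values[i] - values[j]) ** 2)
             for i in range(n) for j in range(i + 1, n)]
    if max_dist is None:
        max_dist = max(d for d, _ in pairs)
    width = max_dist / n_bins
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for d, g in pairs:
        if d > max_dist:
            continue
        k = min(int(d / width), n_bins - 1)  # the largest distance joins the last class
        sums[k] += g
        counts[k] += 1
    return [sums[k] / counts[k] / variance if counts[k] else None
            for k in range(n_bins)]


def permutation_envelope(coords, values, n_perm=999, n_bins=8, seed=0):
    """Minimum and maximum semi-variance per class over random reallocations."""
    rng = random.Random(seed)
    lows, highs = [math.inf] * n_bins, [-math.inf] * n_bins
    shuffled = list(values)
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # randomly allocate values to spatial locations
        for k, g in enumerate(semivariogram(coords, shuffled, n_bins)):
            if g is not None:
                lows[k] = min(lows[k], g)
                highs[k] = max(highs[k], g)
    return lows, highs
```

A distance class in which the observed semi-variance falls below the lower envelope indicates that samples separated by such distances are more similar than expected under random allocation, which is the criterion used here to infer spatial structure.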

No definitive quantitative criteria exist for assessing either the degree to which a variable is spatially structured or the range of influence of a spatial process (cf. Palmer79). Spatial structure was therefore inferred by visual inspection of semi-variograms (see ref. 80). A spatial structure was considered to be present in the distance intervals at which the standardized semi-variance was below the lower bound of the envelope.