The forest timberline, a distinctive boundary between the continuous closed forest canopy to the treeless alpine zone1,2, has solicited substantial attention because it responds quickly and markedly to climate change3. Formation of the forest timberline is typically associated with mean air temperature between 5.5 and 7.5°C during the growth season4, which is a major limiting factor for tree growth at high altitudes2. As it has been recognized that global warming induces upshift of the timberline towards higher altitudes worldwide5, the sensitivity of the timberline to temperature makes the timberline a ready indicator of climate change6.

Several studies of the forest timberline have explored the underlying mechanisms forming spatial distribution patterns and ecological processes involved in biogeochemical cycles3,7. At the global scale, temperature is a determinative factor controlling the timberline2,7. At the local scale, environmental parameters and anthropogenic perturbation of the landscape affect formation of the timberline8. However, a majority of these studies only analyze plant communities and their responses to climatic dynamics9,10,11. Only a few studies have analyzed microbial activities in regard to microbial biomass, carbon efflux and decomposition rates12,13,14, neglecting microbial community composition and structure. Due to vast complexity of soil microbial communities and inadequacy of scientific tools that accurately profile microbial community structure and function15, it remains unclear whether and how the taxonomic composition and metabolic activity of microbial community are affected by timberline upshift.

The rapid development of high-throughput 16s rRNA gene sequencing has enabled fine-tuning assessment of microbial taxonomic composition16. Meanwhile, development of a functional gene microarray (GeoChip) has enabled quantitative, accurate and rapid detection of a large number of microbial genes17, making it possible to link the metabolic potentials of microbial communities with ecosystem processes18. Aiming to obtain insights in predicting changes in microbial communities and microbe-mediated soil biogeochemical cycles for timberline upshift, we adopted 16s rRNA gene Illumina sequencing and GeoChip to analyze soil microbial communities of the coniferous forest and the shrubland, which are two vegetation belts forming the timberline in the Shennongjia national natural reserve (SNNR) located in Northwest Hubei province, China. SNNR is chosen because it is a sensitive region vulnerable to climate changes19. It is one of the most biologically diverse areas in China and a member of UNESCO's World Network of Biosphere Reserves, often known as “The Oriental Botanic Garden” and the natural bank of biological species.

Specifically, we are interested in the following scientific questions: (1) which environmental parameter best explained the formation of microbial community? and (2) were there significant linkages between microbial functional genes and soil nutrient storages, given the important role of microbes in driving biogeochemical cycles?


Vegetation and soil geochemical parameters

The distance among the sampling plots within each vegetation zone ranged from 0.042–0.322 kilometers. Given the typical high soil heterogeneity, it was unsurprising to detect considerable variations in the vegetation and soil geochemical parameters, which prompted us to test for general differences between the shrubland and the coniferous forest (Table 1). We found that the number of plant species and plant diversity, calculated by Shannon index, were significantly (P < 0.001) higher in the coniferous forest (2.64) than in the shrubland (1.64). Fargesia murielae of the Poaceae family and Rhododendron oreodoxa of the Ericstarae family was the dominant species in the shrubland, while Abies fargesii of the Pinaceae family was the dominant species in the coniferous forest.

Table 1 Summary of environmental parameters analyzed by two-tailed t-test

Many soil geochemical parameters, including soil organic carbon, labile organic carbon, dissolved organic carbon, total nitrogen, alkaline hydrolysis nitrogen, NO3-N, total potassium and rapidly available phosphorus, were similar in both habitats. Soil temperature, a major limiting factor for tree growth at the forest timberline20, was significantly higher (P = 0.001) in the shrubland than the coniferous forest. This was consistent with the annual average air temperature, which was higher in the shrubland (4.62°C) than in the coniferous forest (4.00°C). For others, total phosphorus, total sulfur, NH4+-N and δ15N were significantly (P < 0.010) higher in the shrubland, while moisture, pH, δ13C, Al and Fe were significantly (P < 0.010) higher in the coniferous forest. These results suggested that shrubland soil was relatively richer in nutrients than the coniferous forest.

Soil microbial community of the coniferous forest and the shrubland

Microbial biomass, judged by soil microbial biomass carbon and nitrogen, were ~40% higher at the significant level of P < 0.002 in the shrubland than the coniferous forest (Table 1). At the taxonomic level, a total of 795,134 paired-end sequences, ranging from 14,928 to 50,849 reads per sample, were obtained from 24 samples of these two sampling sites. Finally, random re-sampling was performed with 20,000 sequences per sample, resulting in 55,440 OTUs at the 97% similarity level (Supplementary Fig. S1). The microbial taxonomic diversity, calculated by the Simpson index, was significantly (P = 0.035) higher in the shrubland (Table 1). At the functional gene level, 45,357 genes were detected in the shrubland samples, whereas 39,984 genes were detected in the coniferous forest samples, resulting in a significantly (P = 0.001) higher functional diversity in the shrubland.

To further examine differences between these two sites, Principal coordinates analysis was performed with the high-throughput sequencing and GeoChip data, respectively. Samples of the coniferous forest and the shrubland were well separated from each other in both datasets (Fig. 1), which was also verified by three non-parametric multivariate statistical tests of dissimilarity (adonis, MRPP and anosim, P < 0.010) (data not shown).

Figure 1
figure 1

Principal coordinates analysis (PCoA) of microbial community based on (A) high-throughput sequencing data and (B) GeoChip data.

The values for Axes 1, 2 and 3 are percentages of variation attributed to the corresponding axis.

The linkage between the microbial community and environmental parameters

Pearson correlation analyses were carried out to identify environmental parameters linking to soil microbial biomass. Soil temperature was positively correlated with soil microbial biomass carbon (smbc) (r = 0.54, P = 0.006) and soil microbial biomass nitrogen (smbn) (r = 0.75, P = 0.001) (Fig. 2A), implying that temperature might be a key factor in determining soil microbial biomass. Soil NH4+-N content was also positively correlated (P = 0.002) with smbc and smbn (Supplementary Fig. S2A). But most other environmental parameters, including soil pH and plant diversity, were not significantly (P < 0.050) correlated.

Figure 2
figure 2

Correlations between (A) microbial biomass (indicated by soil microbial biomass carbon and soil microbial biomass nitrogen) and soil temperature and (B) microbial α-diversity (taxonomic and functional diversity) and soil temperature.

White and black dots represent samples from the shrubland and the coniferous forest, respectively.

We proceeded further to identify key environmental parameters linking to microbial diversity. At the taxonomic level, soil temperature, but not NH4+-N, was positively correlated (r = 0.53, P = 0.008) with microbial diversity (Fig. 2B). At the functional gene level, both soil temperature (Fig. 2B) and NH4+-N (Supplementary Fig. S2A) were positively correlated (P < 0.050) with microbial functional diversity. Meanwhile, microbial taxonomic and functional diversity were positively correlated (P < 0.050) with soil microbial biomass nitrogen (Supplementary Fig. S2B), showing a linkage between microbial biomass and diversity.

Canonical correspondence analysis (CCA) was performed to identify major environmental parameters linking to soil microbial community at the taxonomic or functional gene level. A total of 12 environmental parameters (pH, moisture, soil temperature, NH4+-N, total sulfur, total phosphorus, Al, Fe, altitude, plant diversity, δ13C and δ15N) were selected as independent environmental parameters based on variance inflation factor analysis, resulting in significant (P < 0.050) CCA models (Fig. 3). Soil temperature, moisture, pH and altitude were the most important parameters shaping the microbial community since they aligned well with the first axis, which represented the major variations among microbial communities.

Figure 3
figure 3

Canonical correspondence analysis (CCA) of (A) high-throughput sequencing data and (B) GeoChip data with environmental parameters.

TS, total sulfur; TP, total phosphorus.

To single out the contribution of temperature to microbial community variations, partial Mantel test was performed to correlate the microbial community with temperature, soil parameters and plant community. For taxonomic composition of microbial community, temperature was the most important control factor with the highest R-value (Table 2). For functional gene composition of microbial community, temperature was the only factor that was significantly correlated.

Table 2 The relationships of microbial community structure to environmental parameters revealed by partial Mantel tests

Bacterial taxa groups

All of the detected OTUs were classified as 26 bacterial phyla. Among them, Proteobacteria, Acidobacteria, Actinobacteria and Verrucomicrobia were the most abundant phyla. Acidobacteria, Actinobacteria, Chloroflexi and Planctomycetes were significantly higher in the shrubland, while Proteobacteria was significantly higher in the coniferous forest (Supplementary Fig. S3). Proteobacteria, as the most abundant phylum, was composed of α-Proteobacteria, β-Proteobacteria, δ-Proteobacteria and γ- Proteobacteria. Among all of the detected Proteobacteria, β-Proteobacteria and α-Proteobacteria were the most abundant, which was as high as 72.62% of all Proteobacteria. β-Proteobacteria of the coniferous forest was significantly (P = 0.001) and 192% higher than that of the shrubland, while α-Proteobacteria was significantly (P = 0.001) and 23% lower in the coniferous forest.

Selected gene categories of the shrubland and the coniferous forest

The response ratio analysis showed that genes associated with functional categories of organic remediation and soil benefit were significantly more abundant in the coniferous forest at the 95% confidence level (Supplementary Fig. S4). Genes associated with carbon cycle, sulfur cycle, bacterial phage, energy process and soil borne pathogen were significantly less abundant, whereas genes associated with nitrogen and phosphorus cycle remained similar. Detailed information of selected gene categories is described below.

Nitrogen (N) cycle

The total abundance of amoA genes associated with nitrification was higher in the coniferous forest (Fig. 4). Most of these amoA genes were derived from the archaea phylum of Crenarchaeota, known to be highly abundant in temperate acidic forest soil21. By contrast, total abundances of nasA and nrfA genes associated with N reduction and nirS/nirK genes associated with denitrification were significantly lower, suggesting a shift of microbial functional potential from ammonium biosynthesis towards nitrate biosynthesis. Consistently, the NO3-N content was 67.2% higher in the coniferous forest, albeit insignificantly and the NH4+-N content was significantly and 20.9% lower (Table 1).

Figure 4
figure 4

The fold change of N cycle genes between the shrubland and the coniferous forest.

The percentage of each gene indicated the relative fold changes, with red and green colors indicating the higher and lower signal intensity of each detected gene in the coniferous forest, respectively. This percentage was calculated by the sum of the signal intensity of each detected gene divided by the total sum of signal intensity of all detected N cycle genes and weighted by the fold change of each gene for the coniferous forest over the shrubland. ***P < 0.010; **P < 0.050.

We explored possible linkages between the NO3-N or NH4+-N content and N cycle genes associated with NO3-N or NH4+-N metabolism. The results showed that total abundances of nitrification genes (amoA and hao) were negatively correlated (r = −0.46, P = 0.023) with NH4+-N and total abundances of denitrification genes (nirS and nirK) were also negatively correlated (r = −0.54, P = 0.008) with NO3-N (Supplementary Fig. S5), demonstrating that gene abundances associated with N cycle, representing microbial functional potentials, were closely linked to soil N contents.

Ammonium oxidization was controlled by a narrow group of ammonium-oxidizing archaea and bacteria22. Thus, we examined ammonium oxidizer OTUs from the sequencing data. Total abundance of the ammonium oxidizer was positively correlated to NO3-N (r = 0.49, P = 0.019) and negatively correlated to NH4+-N (r = −0.41, P = 0.049) (Fig. 5A), supporting the important role of ammonium oxidizer in regulating soil N cycle. To verify it, we examined bacterial amoA gene, which is the key functional gene of ammonium oxidizers. Similar to the observations from OTU data, total abundance of bacterial amoA gene was also positively correlated to NO3-N (r = 0.40, P = 0.057) and negatively correlated to NH4+-N (r = −0.44, P = 0.033) (Fig. 5B). Furthermore, total abundance of ammonium oxidizer was positively correlated with bacterial amoA gene abundance (r = 0.38, P = 0.064) (Fig. 5C).

Figure 5
figure 5

Correlations between (A) NH4+-N, NO3-N and total abundance of ammonium oxidizer, (B) NH4+-N, NO3-N and total abundance of bacterial amoA gene and (C) abundance of ammonium oxidizer and bacterial amoA gene.

White and black dots represent samples from the shrubland and the coniferous forest, respectively.

Carbon (C) cycle

A large number of functional genes associated with fixation and degradation were detected. CODH (CO dehydrogenase) was significantly (P = 0.050) more abundant in the coniferous forest but rubisco (ribulose-1, 5-bisphosphate carboxylase/oxygenase) was less abundant, suggesting that the CODH route was a preferred C fixation route in the coniferous forest. Most of the abundant CODH genes were derived from bacteria except a few (Sulfolobus acidocaldarius, Methanosarcina mazei and uncultured Crenarchaeote). The most abundant CODH genes were derived from Bradyrhizobium sp., Silicibacter sp., Aminobacter sp., Burkholderia sp., Ralstonia eutropha Solibacter usitatus, Mycobacterium smegmatis, Nocardioides sp. and uncultured bacteria, which were known to have a wide distribution23.

The total abundance of genes associated with C degradation was significantly lower in the coniferous forest samples (P = 0.050). Total abundances of ara associated with hemicellulose degradation, acetylglucosaminidase associated with chitin degradation and glx associated with lignin degradation were higher in the coniferous forest, while the other genes were similar or lower (Supplementary Fig. S6A). The most abundant glx genes in the coniferous forest were derived from Phanerochaete chrysosporium, which has been well documented for its important role in lignin degradation24. Despite these intrinsic differences of C fixation and degradation pathways, the net effect of C storage may be similar, as reflected in the observation that the levels of soil organic carbon and dissolved organic carbon were similar between the coniferous forest and the shrubland (Table 1).

Phosphorus (P) cycle

GeoChip 4.0 contained probes to target three P cycle genes, ppk (polyphosphate kinase) for polyphosphate biosynthesis, phytase for phytate degradation and ppx (exopolyphosphatase) for inorganic polyphosphate degradation. The total abundance of ppx genes was significantly higher in the coniferous forest, while that of ppk genes was significantly lower (P < 0.050) (Supplementary Fig. S6B). No significant difference was found for phytase genes. Thus, microbial functional potentials are inclined towards polyphosphate degradation, which was consistent with the higher level of rapidly available phosphorus and the lower level of total phosphorus in the coniferous forest (Table 1).

Organic remediation

Abundant organic remediation genes in the coniferous forest included genes associated with biodegradation of aromatics, chlorinated solvents, herbicides related compound, pesticides related compounds or other hydrocarbons. Among these, aromatic remediation genes were the most abundant. Consistently, more abundant aromatic components could be detected in conifer-rich sites25, which might induce the aromatics remediation genes in the coniferous forest.

Correlation Networks of OTUs and functional genes

To explore possible ecological interactions of microbial communities, we used a random matrix theory-based algorithm26 to reconstruct correlation networks from OTUs or selected GeoChip data. All of the resulting networks showed general features of ecological networks, such as scale free, small world and modular (Supplementary Table S1). At both taxonomic and functional gene levels, the coniferous forest networks had higher average connectivity and transitivity, indicative of tighter interactions/coupling within microbial communities. By contrast, the modularity, indicative of the resistance of systems to disturbance27, was lower in coniferous forest networks than that of shrubland. More positive interactions, which implicated cooperation28, were detected in the coniferous forest networks (98.19% in the OTU network and 92.44% in the GeoChip network) than those in the shrubland networks (93.24% in the OTU network and 74.79% in the GeoChip network).

Only one hub of OTU networks was detected in the shrubland and the coniferous forest, which was derived from the phylum of Acidobacteria and Verrucomicrobia, respectively (Supplementary Fig. S7A). For GeoChip networks, 25 out of 34 hub nodes of the shrubland network were C cycle genes whereas 12 out of 20 hub nodes of the coniferous forest network were N cycle genes (Supplementary Fig. S7B), which was consistent with the finding that relative abundances of C and N cycle genes were higher and lower in the shrubland samples, respectively (Supplementary Fig. S4).

To explore possible correlation between environmental factors and microbial community, we included environmental factors in our correlation network based on random matrix theory. We found that temperature was a hub node, that is, temperature was one of the nodes with the highest connectivity with other nodes (Fig. 6). Some of the directly connected nodes to temperature were among the top 10 nodes with the highest connectivity, suggesting that temperature and its directly connected nodes play an important role in maintaining the network structure and topology.

Figure 6
figure 6

Network interactions between environmental parameters and microbial taxonomic community.

The red node represents temperature. The green ones represent the nodes directly connected to temperature, bigger diameter of which indicates higher connectivity. The purple ones were indirectly connected OTUs.


Investigating soil microbial communities is important for understanding microbe-mediated biogeochemical cycle and ecosystem functioning29. A main goal of this study is to identify the environmental factor(s) that best explain microbial community variation. It has been proposed that above-ground plant communities drive below-ground microbial diversity30. Previous studies have indicated that plants affect microbial communities via rhizodeposits and exudates and provide organic C and N to the soil microbial community31,32. The different vegetation types could select for distinct soil microorganisms due to formation of a variety of microhabitats that support a diverse collection of species33,34. Thus, we examined whether soil microbial functional diversity was affected by plant diversity. However, significantly higher microbial taxonomic and functional diversity were detected in the shrubland than the coniferous forest (Table 1). Considering higher plant diversity, species number and biomass in the coniferous forest, this finding suggested that plant diversity might not be the determinant of microbial diversity, which was consistent with several previous studies showing that soil microbial community was uncorrelated with plant diversity35,36,37. The inconsistency between plant diversity and microbes could be ascribed to different ecophysiological traits of plant species, which would exert strong effects on soil biological properties38,39. Thus, the plant community composition is more likely to affect soil microbial community composition than plant diversity35,40,41. In addition, this could also be explained by a chance effect that a keystone species present in the plant community results in greater effects on soil processing than the contribution of the total plant diversity42. Notably, the dominant species were different between the shrubland and the coniferous forest. The plant litter of Abies fargesii in the coniferous forest with relatively higher C/N ratios provides poor nutrients for microbial growth43 and this may have a tendency to reduce microbial diversity and activity. On the contrary, Fargesia murielae, one of the dominate plants in the shrubland, would buffer temperature extremes and modify the local microenvironments and soil quality44, which might be helpful for microbial survival.

Temperature has been shown to be the major factor in timberline formation9. Here we showed that soil temperature, among all of measured environmental parameters, showed the most significant and extensive linkages with microbial biomass, microbial diversity and composition at both taxonomic and functional gene levels. Correlation networking analysis also showed that temperature was an influential environmental parameter. Therefore, temperature was the best predictor for microbial community variations at the timberline. Our study showed that the temperature was significantly lower in coniferous forest, which was consistent with previous studies showing that the timberline forest typically had a colder temperature than its adjacent vegetation at higher altitudes2,45,46, since the closed forest canopy protects the soil from sunrays and results in a lower temperature. Temperature can affect metabolic rates via the kinetics of the biological process, famously coined as “the Red Queen runs faster when she is hot” and consequently contributes to organism differentiation and diversification47,48. However, it is also possible that temperature indirectly affects microbial community by controlling soil nutrient availability, which further affects soil microbial community49. A comparison among the shrubland, grassland, farmland and reforested land in an adjacent region to our study sites showed that the shrubland had the highest soil quality value50. Similar studies on alpine ecosystems showed that treeless soil at higher altitudes had more labile C, greater microbial activity than soil of closed forest canopy51,52. Consistently, higher soil nutrient contents of dissolved organic carbon, available nitrogen, total phosphorus and total sulfur in the shrubland provided a suitable environment for microbial communities, as indicated by higher microbial biomass and diversity. CCA showed that microbial community compositions were correlated with δ15N, total phosphorus, total sulfur; Al and Fe (Fig. 3), in addition to correlations between microbial biomass, functional diversity and NH4+-N (Supplementary Fig. S2). Among them, N appeared to be a key factor affecting microbial community composition, diversity and productivity in many N-limited terrestrial ecosystems53. The correlation between microbial biomass and NH4+-N was consistent with precious study showing that N limitation was an influential factor in affecting microbial community54. Meanwhile, our study showed that low microbial diversities were detected in sampling sites with high C/N ratios, which was consistent with the finding that microbial community structure and activity were negatively coupled to the soil C/N ratio43. This indicated that an upward expansion of forest resulting from global warming would lead to changes in soil organic quality and the activities of the underground communities.

A parallel 16s rRNA and shotgun sequencing study on tallgrass prairies showed a strong positive correlation between taxonomic and functional gene diversity55, suggesting a low degree of functional redundancy. By contrast, we did not detect strong correlation between overall taxonomic and functional gene diversity, which might be attributed to the differences in techniques or ecosystems. However, there was a positive correlation between bacterial amoA gene abundance and ammonium oxidizers (Fig. 5C), a functionally narrow group utilizing the conversion of ammonia into nitrite as the sole energy source22. It can thus be predicted that reduction in ammonium oxidizer species is associated with decreases in the functional potential of bacterial ammonium oxidization contained within these soil communities.

Correlation networking analyses show important details of community assembly rules reflecting ecological processes such as cooperation, competition, habitat filtering and historical effects and can represent mathematical interaction/coupling among different populations and/or functional genes that regulate system functions56. The shorter path length and higher average connectivity and transitivity of coniferous forest networks suggested that microbial interaction/coupling was high. The high interaction might be ascribed to certain deterministic processes such as habitat filtering, reducing the spread of trait values and reflecting shared ecological tolerances57, which was consistent with the observations of more positive interactions in the coniferous forest networks. Modularity helps control disturbances by compartmentalizing social-ecological systems58, the higher modularity of the shrubland network indicated that the microbial system would be more resistant to changes, both at the taxonomic and functional level, which might be ascribed to higher microbial diversity of shrubland (Supplementary Table S1).

Profiling microbial communities in the habitats bordering the timberline is crucial for predicting the dynamics of microbial community changes and ecosystem functioning, since the trend of timberline upshift resulting from climate warming is likely to continue. Our results showed that temperature was the best predictor for microbial community formation and there were clear linkages between microbial functional potentials and soil biogeochemistry cycles of the timberline. Based on these results, we predict that timberline upshift resulting from global warming would cause distinct changes in microbial communities and soil C and N pools. However, the general trend of timberline shift is modified by local, regional and temporal variations59. It would be interesting to expand the observations in this study to other timberlines or conduct time-series experiments in order to test the generality of the observations.


Site and sampling

The study site is located at Shennongjia National Nature Reserve (SNNR, 110°14'–110°18' E, 31°26'–31°28' N) of Shennongjia Mountain, Northwest Hubei province, Central China. This area is heavily influenced by subtropical monsoon, characterized by wet (monthly precipitation > 250 mm) and hot weather in summer (monthly mean temperature > 25°C) but dry (monthly precipitation < 10 mm) and cold (monthly mean temperature < −4°C) weather in winter19. The annual mean air temperature and precipitation of this area are 7.2°C and 1500 mm, respectively. The vegetation belts forming the timberline were the shrubland and the coniferous forest. As many other landscape lines, the timberline is a well-defined boundary from a distance but a gradual transition with patchy patterns upon close inspection. To manifest differences between the shrubland and the coniferous forest, we chose typical shrubland between the altitudes of 2720 and 2776 m and the coniferous forest between the altitudes of 2456 and 2632 m to collect soil samples.

Soil samples of the shrubland and the coniferous forest were collected in August, 2011. At each site, twelve plots (20 m × 20 m) comprising typical vegetation and soil attributes were selected for sampling. At each plot, ten to fifteen soil cores at a depth of 0–10 cm were taken, mixed thoroughly and sieved with 2 mm mesh to remove roots and stones. Soil samples were kept on ice when transporting to laboratory, divided into two subsamples and stored at either 4°C for soil biogeochemical measurements or −80°C for DNA extraction.

Soil and vegetation parameter measurements

The species, number and canopy of each tree or shrub, the diameter of breast height and height of trees were surveyed at each plot. The air annual average temperature (from 1950 to 2000) was calculated with long-term records of daily air temperature by WorldClim (Version 1.4)60. Soil temperature at each plot was measured by a Hobo Temperature instrument at the depths of 0–15 cm, respectively. The other soil geochemical parameters (soil moisture, soil pH, soil organic carbon, total nitrogen, available phosphorous, available potassium, water, dissolved organic carbon, labile organic carbon contents, δ13C, δ15N and soil enzyme activities of cellulose, glucanase, polyphenol oxidase, sucrase and amylase) were measured as previously described61. A rapid chloroform-fumigation extraction method was used to measure soil microbial biomass carbon and nitrogen as previously described62. Shannon-Weaver index (H′) was used to evaluate the plant diversity.

DNA extraction, purification, quantification

Soil DNA was extracted by freeze-grinding mechanical lysis as previously described63. Freshly extracted DNA was purified twice using 0.5% low melting point agarose gel followed by phenol-chloroform-butanol extraction. DNA quality and quantity were assessed by the ratios of 260 nm/280 nm and 260 nm/230 nm and final DNA contents were quantified with a PicoGreen method using a FLUO star Optima (BMG Labtech, Jena, Germany).

Illumina sequencing and data processing

16s rDNA genes was amplified using a common primer pair targeting v4 region (Forward primer, 515F, 5′-GTGCCAGCMGCCGCGGTAA-3′; reverse primer, 806R, 5′-GGACTACHVGGGTWTCTAAT-3′) combined with adapter sequences and barcode sequences64,65. PCR amplification was performed in a 25 μl reaction, containing 2.5 μl 10 × AccuPrime PCR buffer II (Invitrogen, Grand Island, NY), 1 μl of each primer (10 μM), 5 μl template DNA (2 ng/μl) and 0.1 μl AccuPrime High Fidelity Taq Polymerase. The reaction mixtures were subjected to a denaturation at 94°C for 1 min, followed by 30 cycles of 94°C for 20 s, 53°C for 25 s and 68°C for 45 s and a final extension at 68°C for 10 min. Reactions were performed in triplicates for each sample to minimize potential biases from amplification66. The triplicate products from the first step PCR were pooled together and purified through QIAquick Gel Extraction Kit (Qiagen, Valencia, CA), eluted in 50 μl water and aliquoted into three PCR tubes. A second round PCR was then performed in a 25 μl reaction [2.5 μl 10 × AccuPrime PCR buffer II, 1 μl of each primer (10 μM), 0.1 μl AccuPrime High Fidelity Taq Polymerase and 15 μl aliquot of the first-round purified PCR product] with triplicates for each sample, using phasing primers with Illumina adapters, target primers, spacers and barcodes on reverse primers. Twenty cycles of the amplifications were performed following the same program for the first round PCR. PCR products from triplicate reactions were pooled together and quantified with PicoGreen. Sample denaturation was carried out by mixing 10 μl of combined PCR products and 10 μl 0.2 N fresh NaOH. The chilled Illumina HT1 buffer was added to the denatured DNA to make a 20 pM library, which was further diluted to 15 pM by adding HT1 buffer and mixed with a PhiX DNA library. Finally, the 600 μl mixture was loaded into the MiSeq reagent cartridge and run on Miseq (Illumina, San Diego, CA) for 2 × 250 bp paired-end sequencing.

Raw sequences were separated to samples using barcodes and with permission of up to one mismatch. Quality trimming was done using Btrim67. Forward and reverse reads were merged into full length sequences by FLASH68. Sequences were removed if they were too short or contained ambiguous bases. Random re-sampling was performed with 20,000 sequences per sample. The operational taxonomic units (OTUs) were classified using UCLUST at the 97% similarity level and singletons were removed. The taxonomic assignment was conducted by RDP classifier69 with minimal 50% confidence estimates.

GeoChip hybridization and data processing

Purified DNA was labeled with Cy3 dye, dried, rehydrated and hybridized with GeoChip 4.0 as previously described18. After purification, GeoChip microarrays were scanned with a 100% laser power and 100% photomultiplier tube with a NimbleGen MS 200 Microarray Scanner (Roche, Basel, Switzerland) and poor-quality spots with a signal-to-noise ratio of less than 2.0 were discarded prior to statistical analyses.

Signal intensities were quantified and processed using the analysis pipeline as previously described70. Then processed GeoChip data were analyzed using the following steps: (i) removing genes detected in no more than 5 out of 12 samples from the same vegetation type; (ii) normalizing the signal intensity of each spot by dividing the signal intensity by the total intensity of the microarray followed by multiplying by a constant; and (iii) transforming data to the natural logarithmic form.

The Simpson's reciprocal index (1/D) was used to evaluate microbial taxonomic and functional diversity. Principal coordinates analysis (PCoA) was used to determine changes of overall microbial community structure, both the high-throughput sequencing data and GeoChip data. Bray-Curtis distances were used to calculate the dissimilarity matrices from the high-throughput sequencing data and GeoChip data. Canonical correspondence analysis was used to explore linkages between microbial community structure and environmental parameters. The variance inflation factor (VIF) was used for step-wise removal of redundant parameters in CCA modeling17. Partial Mantel test was performed to correlate the microbial community (for high-throughput sequencing data, OTUs detected in more than 10 out of 24 samples was used) with the temperature, soil parameters and plant community. All of the analyses were performed by functions in the Vegan package (v.1.15–1) in R (v.3.0.1).

Network construction via the RMT-based method

Using the random matrix theory (RMT)-based method to automatically define a threshold, Pearson correlation networks were constructed from OTUs or GeoChip data as previously described. OTUs or selected functional genes (carbon and nitrogen cycle genes) detected in less than nine out of twelve replicates from the coniferous forest and the shrubland were removed to ensure reliable correlations. To explore the ecological interactions between environmental parameters and microbial community, OTUs and environmental parameters detected in less than eighteen out of twenty-four samples were removed to ensure reliable correlations. To characterize the modularity property, each network was separated into modules by the fast greedy modularity optimization. Various indices, including the average degree, average clustering coefficient, average path distance, geodesic efficiency, harmonic geodesic distance, density and transitivity and connectedness, were used to describe overall topological properties of different networks. The topological roles of different nodes were divided into the following four subcategories by within-module connectivity (Zi) and among-module connectivity (Pi): (i) peripheral nodes are defined as nodes with Zi ≤ 2.5 and Pi ≤ 0.62; (ii) connectors are defined as nodes with Zi ≤ 2.5 and Pi > 0.62; (iii) module hubs are defined as nodes with Zi > 2.5 and Pi ≤ 0.62; and (iv) network hubs are defined as nodes with Zi > 2.5 and Pi > 0.62.

Additional information

Data Accession Microarray data have been deposited to GEO databases ( Sequencing data are currently being deposited to GEO databases.