Introduction

Microorganisms are the foundation of the earth's biosphere and have integral and unique roles in ecosystem functions, such as the biogeochemical cycling of carbon, nitrogen, sulfur, phosphorus and various metals. They inhabit almost all imaginable environments, forming communities that undergo continual changes in structure, composition and function over space and time. However, the identification, detection, characterization and quantification of microbial communities face several challenges. First, microbial communities are extremely diverse, with thousands to tens of thousands of microbial species in a single gram of soil (Torsvik et al., 2002; Gans et al., 2005). Characterizing such vast diversity and understanding the mechanisms shaping it present numerous obstacles. Second, the majority of these microorganisms (>99%) have not yet been cultured (Whitman et al., 1998). Third, it is even more difficult for microbiologists to establish mechanistic linkages between microbial diversity and ecosystem functioning (Zhou, 2009).

To tackle these challenges, microarray-based technology has been developed for the analysis of microbial communities. This technology has become a powerful, high-throughput tool for analyzing microbial communities and monitoring environmental processes and ecosystem functions (for example, Wu et al., 2001, 2006, 2008; Loy et al., 2002; Taroncher-Oldenburg et al., 2003; Bodrossy and Sessitsch, 2004; Rhee et al., 2004; Steward et al., 2004; Tiquia et al., 2004; Dix et al., 2006; Rodriguez-Martinez et al., 2006; He et al., 2007; Leigh et al., 2007; Yergeau et al., 2007; Zhou et al., 2008; Liang et al., 2009; Mason et al., 2009; Tas et al., 2009; Van Nostrand et al., 2009; Waldron et al., 2009; Wang et al., 2009). In particular, one of our previous functional gene arrays (FGAs), GeoChip 2.0, which contains more than 24 000 probes and covers more than 10 000 gene sequences from 150 gene categories involved in key microbially mediated biogeochemical processes (He et al., 2007), has been widely used to analyze microbial communities from different sources, such as soils (Yergeau et al., 2007; Zhou et al., 2008), waters (He et al., 2007; Leigh et al., 2007; Tas et al., 2009; Van Nostrand et al., 2009; Waldron et al., 2009), oil fields (Liang et al., 2009), marine sediments (Wu et al., 2008), extreme environments (Mason et al., 2009; Wang et al., 2009), bioreactor systems (Rodriguez-Martinez et al., 2006) and other habitats (Kimes et al., 2010), to address a variety of questions related to biogeochemical cycles, bioremediation, global climate change, microbial biogeography and ecological theories. All these results indicate that FGA-based microarrays such as GeoChip 2.0 can be used not only to analyze the structure, functional activity and dynamics of microbial communities, but also to link microbial communities with ecosystem processes and functions.

However, several major challenges remain in the full use of such technologies for studying environmental microbiology and microbial ecology. First, with the rapid development of high-throughput sequencing, the number of functional gene sequences of interest has increased exponentially. Thus, the GeoChip must be continuously updated and improved. Second, retrieving specific sequences for a particular functional gene is difficult because of the size of the databases and the inconsistency or inaccuracy of sequence annotations. New strategies and advanced software tools for automatic sequence retrieval, data pre-processing and further statistical analysis are needed. Third, because a universal standard, such as the genomic DNA used for a pure organism, is not applicable to community samples, a quantitative and accurate comparison of different microbial communities is difficult. To address these needs, in this study we report the design, construction, evaluation and application of a new generation of FGAs, GeoChip 3.0, which differs from GeoChip 2.0 in terms of design, gene coverage, data normalization and comparison, and future updates. GeoChip 3.0 was applied to show that plant diversity affected the structure, composition and potential functional activity of belowground microbial communities.

Materials and methods

The following is a summary of the methods used in this study; more detailed information is provided in Supplementary Data A.

Retrieval and verification of functional gene sequences

Sequence retrieval was performed by a GeoChip design pipeline (Figure 1). For each functional gene, a query of key words was first submitted to GenBank Protein Database to fetch all candidate amino acid sequences. Then, all candidate sequences were verified by HMMER 2.3.2 (Ashburn, VA, USA) (Eddy, 1998). In addition, all confirmed protein sequences were used to obtain their nucleic acid sequences from GenBank for probe design. For GeoChip 3.0, all sequences were downloaded from the GenBank databases before 5 May 2009.

Figure 1

The design pipeline for GeoChip 3.0 construction. The whole pipeline runs on a web-based Common Gateway Interface (CGI) server and the scripts were written in Perl. Sequences for each functional gene were retrieved by key words and confirmed by HMMER (Eddy, 1998) with seed sequences. A new version of CommOligo 2.0 with group-specific probe design features was used to design 50-mer oligonucleotide probes (both gene-specific and group-specific) using the same criteria as described for GeoChip 2.0 (He et al., 2007). After the specificity of all designed probes was computationally checked with currently available databases (GenBank), the best probe for each sequence or each group of sequences was selected to synthesize for GeoChip 3.0 construction. Because all seed sequences and key words are stored in databases, automatic updates can be performed in the future.

Oligonucleotide probe design, synthesis and microarray fabrication

A new version of CommOligo (Li et al., 2005) with group-specific probe design features, CommOligo 2.0, was used to design both gene- and group-specific probes. The best set of oligonucleotide probes was synthesized by Invitrogen (Carlsbad, CA, USA) and arrayed onto Corning UltraGAPS (Corning, New York, NY, USA) slides using a Microgrid II Arrayer (Genomic Solutions, Ann Arbor, MI, USA) as described previously (He et al., 2007).

BioCON experimental site, plant species and sampling

The BioCON (Biodiversity, CO2 and N) experimental site is located at the Cedar Creek Ecosystem Science Reserve, Minnesota, USA (lat. 45° N, long. 93° W), and its main field experiment has a total of 296 plots (2 × 2 m) evenly distributed in six 20-m diameter rings with three treatments: CO2 (ambient, 368 μmol mol–1 vs elevated, 560 μmol mol–1), nitrogen (N; ambient vs 4 g m–2 per year) and plant diversity. Four levels of plant diversity, 1, 4, 9, or 16 species, were chosen randomly for each plot from 16 perennial species native or naturalized to the Cedar Creek Ecosystem Science Reserve, including (1) four C3 grasses (Agropyron repens, Bromus inermis, Koeleria cristata, Poa pratensis), (2) four C4 grasses (Andropogon gerardii, Bouteloua gracilis, Schizachyrium scoparium, Sorghastrum nutans), (3) four N-fixing legumes (Amorpha canescens, Lespedeza capitata, Lupinus perennis, Petalostemum villosum) and (4) four non N-fixing herbaceous species (Achillea millefolium, Anemone cylindrica, Asclepias tuberosa, Solidago rigida) (Reich et al., 2001). Similar to our previous study with soil microbial communities under ambient and elevated CO2 at the same site (He et al., 2010), this study analyzed 31 soil samples from ring 2 (ambient CO2 and without N supply) with 11 plots each for 1 and 4 species, 5 plots for 9 species, and 4 plots for 16 species in July 2007.

DNA extraction, purification and quantification

Genomic DNA targets from two Shewanella species, MR-4 and W3–18–1, were extracted and purified as previously described (He et al., 2005), and soil DNA from 31 samples (11, 11, 5, and 4 plots from 1, 4, 9 and 16 species, respectively) taken at the BioCON experimental site (Reich et al., 2001) was extracted and purified as described previously (Zhou et al., 1996). DNA quality was assessed by the ratios of A260/A280 and A260/A230, and final DNA concentrations were quantified with a PicoGreen method (Ahn et al., 1996).

GeoChip 3.0 data and statistical analyses

Three types of targets were used for GeoChip 3.0 analysis: (1) 24 synthesized oligonucleotides, (2) genomic DNAs from two Shewanella species, MR-4 and W3–18–1 and (3) 31 soil samples from the BioCON study. Genomic DNAs or amplified soil DNAs were labeled, and all hybridizations and image processing were conducted as previously described (He et al., 2007). Scanned images were quantified using the software ImaGene 6.0 (Biodiscovery Inc., El Segundo, CA, USA), and the quantified microarray data were preprocessed as described previously (Wu et al., 2006; He et al., 2007). Pre-processed GeoChip data were further analyzed with different statistical methods, such as cluster analysis (Eisen et al., 1998) and detrended correspondence analysis, as described previously (Zhou et al., 2008).

Results

Selection of functional genes for GeoChip 3.0

For GeoChip 3.0, 292 key enzymes/genes have been selected to target a variety of microbially mediated processes (Table 1).

Table 1 The summary of probe and covered sequence information on GeoChip 3.0

Carbon cycling

Microorganisms have critical roles in C cycling of the biosphere. A total of 41 enzymes/genes are selected to detect different functional processes of the carbon cycle as described below. (1) CO2 fixation. Five pathways for autotrophic CO2 fixation have been identified so far (Berg et al., 2007). Four key enzymes were selected for the following pathways: ribulose-1,5-bisphosphate carboxylase/oxygenase (Rubisco) for the Calvin cycle (Calvin, 1961), carbon monoxide dehydrogenase (CODH) for the reductive acetyl-CoA pathway (Evans et al., 1966), propionyl-CoA/acetyl-CoA carboxylase (PCC/ACC) for the 3-hydroxypropionate/malyl-CoA cycle (Herter et al., 2002) and ATP citrate lyase (AclB) for the reductive tricarboxylic acid cycle (Ragsdale, 1991). (2) Starch. Eight enzymes/genes (alpha-amylase/amyA, amylopullulanase/amyX/apu, cyclomaltodextrin dextrin-hydrolase/cda, glucoamylase, isopullulanase, neopullulanase II/nplT and pullulanase/pulA) are selected to detect starch degradation activity. (3) Cellulose. Four enzymes/genes (cellobiose dehydrogenase, cellobiase/bgl, endoglucanase/egl and exoglucanase) are selected to detect cellulose degradation. (4) Hemicellulose. Five enzymes/genes (bacterial arabinofuranosidase, fungal arabinofuranosidase, mannanase, xylose isomerase/xylA and xylanase) are chosen to monitor hemicellulose degradation. (5) Chitin. Three enzymes (acetylglucosaminidase, endochitinase and exochitinase) are used to detect microbial chitin degradation. (6) Pectin. One enzyme (pectinase) is chosen to monitor pectin degradation. (7) Lignin. Four enzymes/genes (glyoxal oxidase/glx, lignin peroxidase or ligninase/lip, manganese peroxidase/mnp and phenol oxidase/lcc) are selected for lignin degradation. (8) Methanogenesis. The key gene mcrA encoding methyl coenzyme M reductase is selected for methanogenesis. (9) Methane oxidation. Two enzymes/genes (methane monooxygenase/mmoX and particulate methane monooxygenase/pmoA) are selected to detect methanotrophic activity. (10) Acetogenesis. Formyltetrahydrofolate synthetase is selected to detect both the Wood-Ljungdahl pathway of autotrophic CO2 fixation (acetogenesis) and the glycine synthase/reductase pathways of purinolysis (Ljungdahl, 1984, 1986; Mackenzie, 1984). (11) Others. Another eight enzymes/genes (camphor hydroxylase/camDCAB, limonene epoxide hydrolase/limEH, limonene monooxygenase/lmo, vanillate monooxygenase/vanA, vanillin dehydrogenase/vdhA, isocitrate lyase/aceA, malate synthase/aceB and alkylsuccinate synthase/assA) are also chosen for the metabolism of other carbon compounds (for example, aromatics).

Nitrogen (N) cycling

The N cycle is one of the major biogeochemical cycles, and consists of a variety of N pools (for example, N2, NH4+, NO3−, NO2− and NO) and functional processes (for example, N fixation, nitrification and denitrification). A total of 16 enzymes/genes are selected to target different N cycling processes. (1) N fixation. Microbial N2 fixation is performed by free-living or symbiotic N-fixing microorganisms through the dinitrogenase reductase gene (nifH). (2) Nitrification. Nitrifying microorganisms (bacteria and archaea) are able to oxidize NH4+ to NO3− in the presence of O2. Two key enzymes/genes (ammonia monooxygenase/amoA and hydroxylamine oxidoreductase/hao) are selected. (3) Ammonification. Organic N is converted into inorganic N (NH4+) through ammonification (N mineralization). Two key enzymes/genes (glutamate dehydrogenase/gdh and urease/ureC) are selected. (4) Dissimilatory N reduction to ammonium (DNRA). NO3− and NO2− can be reduced to NH4+ by a variety of microorganisms through a dissimilatory N reduction process, and two key enzymes/genes (nitrate reductase/napA and c-type cytochrome nitrite reductase/nrfA) are selected for this process. (5) Assimilatory N reduction to ammonium. NO3− and NO2− can also be reduced to NH4+ by an assimilatory N reduction process, and three key genes (nitrate reductase/nasA, nitrite reductase/nirA/nirB) are selected. (6) Denitrification. Denitrifying bacteria are able to convert NO3− and NO2− to NO, N2O, and finally to N2 anaerobically through denitrification, which is one of the key processes in the biogeochemical N cycle and is assumed to be one of the major sources of NO and N2O (greenhouse gas) emissions. Five key enzymes/genes (nitrate reductase/narG, nitrite reductase/nirS and nirK, nitric oxide reductase/norB and nitrous oxide reductase/nosZ) are selected for the detection of the denitrification process. (7) Anaerobic ammonium oxidation (anammox). Nitrite and ammonium are converted directly into N2 by anammox bacteria with a key enzyme/gene, hydrazine oxidoreductase/hzo.

Sulfur cycling

Four enzymes/genes are used to detect the sulfur cycling of microbial communities: two (sulfite reductase/dsrA/dsrB) for sulfur reduction, one (sulfite oxidase/sox) for sulfur oxidation and one (dissimilatory adenosine-5′-phosphosulfate reductase/aprA) for both the microbial sulfate reduction and sulfur oxidation processes (Meyer and Kuever, 2007).

Phosphorus cycling

Three enzymes are used to detect the phosphorus cycling of microbial communities: two (polyphosphate kinase/ppk and phytase) for polyphosphate biosynthesis and one (exopolyphosphatase/ppx) for polyphosphate degradation.

Key energy metabolism

We selected only two enzymes, hydrogenase and cytochrome, from some well-known genera (for example, Geobacter, Anaeromyxobacter, Desulfovibrio, Shewanella, Desulfurobacterium, Desulfobacterium, Rhodobacter and Pseudomonas) for detecting energy metabolism processes of microbial communities.

Metal resistance

Metals and radionuclides are common contaminants in the environment due to anthropogenic activities or natural emissions (Adriano, 2001). As a result of the presence of potentially toxic metals, microorganisms have developed resistance mechanisms to limit exposure to or minimize damage from these metals (Silver, 1996; Silver and Phung, 2005). A total of 41 genes/enzymes comprising resistance mechanisms for 13 metals (Ag, Al, As, Cd, Co, Cr, Cu, Hg, Ni, Pb, Se, Te and Zn) as well as bacterial metallothioneins and metallothionein-like proteins are selected to detect resistance and biotransformation of metals. These metals are common pollutants and microbial mechanisms of resistance to these metals have been well studied (Silver, 1996; Nies, 2003; Silver and Phung, 2005). The majority of these genes confer resistance through transporters, the most common mechanism of prokaryotic metal resistance (Nies, 2003). In addition to transporters, three genes involved in arsenic resistance, including arsenate reductase, and seven for mercury resistance, including mercuric reductase and organomercurial lyase, are present.

Degradation of organic contaminants

Contamination by organic chemicals is a worldwide concern and much research has been undertaken to understand the role of microorganisms in the degradation and remediation of organic contaminants. To develop a molecular tool to monitor various degradation pathways, a gene from each step of a contaminant degradation pathway is selected for probe design. As a result, a total of 173 genes/enzymes were selected to detect the potential for degradation of 86 organic contaminants commonly found in the environment. These include 38 enzymes involved in aromatic carboxylic acid (for example, benzoate, phenylpropionate, phthalate) degradation, 18 for BTEX (benzene, toluene, ethylbenzene and xylene), 10 for chlorinated aromatics (for example, 2-, 3-, and 4-chlorobenzoate, 2,4,5-trichlorophenoxyacetic acid), 9 for heterocyclic aromatics (for example, carbazole, dibenzothiophene), 9 for nitroaromatics (for example, nitrobenzene, nitrophenol), 18 for polycyclic aromatics (for example, biphenyl, fluorene, naphthalene), 22 for other aromatics (for example, aniline, catechol, phenol), 6 for chlorinated solvents (for example, chloromethane, dichloromethane), 15 for other hydrocarbons (for example, alkanes, cyclohexane, tetrahydrofuran), 12 for herbicides, 4 for pesticides (for example, atrazine, parathion, lindane), and the remainder (12) are for various chemicals, including acrylonitrile, methanesulfonic acid and tetrahydrofuran. These contaminants are selected from the Priority List of Hazardous Substances (HazDat, 2007) of the Comprehensive Environmental Response, Compensation, and Liability Act (CERCLA; the Superfund program), which ranks contaminants based on their frequency of occurrence and environmental toxicity, and from the Biocatalysis/Biodegradation Database (Ellis et al., 2006) of the University of Minnesota, which lists known degradation pathways for hundreds of chemicals.
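
As a quick consistency check on the breakdown above, the per-class counts can be tallied against the stated total of 173 genes/enzymes. The following is only an illustrative sketch; the dictionary keys are shorthand labels introduced here, not identifiers from the GeoChip pipeline.

```python
# Enzyme/gene counts per contaminant class, copied from the text above.
# The key names are shorthand labels chosen for this example.
contaminant_gene_counts = {
    "aromatic carboxylic acids": 38,
    "BTEX": 18,
    "chlorinated aromatics": 10,
    "heterocyclic aromatics": 9,
    "nitroaromatics": 9,
    "polycyclic aromatics": 18,
    "other aromatics": 22,
    "chlorinated solvents": 6,
    "other hydrocarbons": 15,
    "herbicides": 12,
    "pesticides": 4,
    "other chemicals": 12,
}

# Sum the classes and compare with the total reported in the text.
total_genes = sum(contaminant_gene_counts.values())
print(total_genes)  # 173
```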

Antibiotic resistance

Antibiotic resistance is a growing concern as more and more pathogens develop resistance to common antibiotics. Microorganisms are often exposed to antibiotics in the environment, and antibiotic resistance is often associated with metal resistance (Stepanauskas et al., 2005). In all, 11 genes for antibiotic resistance are included with five for transporters including ATP-binding cassette, multidrug toxic compound extrusion, major facilitator superfamily, Mex, and small multidrug resistance efflux pumps, four β-lactamase genes (Classes A–D) and genes for tetracycline and vancomycin resistance.

Phylogenetic marker

16S rRNA gene sequences are widely used as a phylogenetic marker for bacterial systematics and ecology, but it is difficult to use them to obtain fine-scale resolution at the species/strain level (Yamamoto and Harayama, 1995) because the rate of evolution of this molecule is low. An alternative phylogenetic marker, gyrB, which encodes the DNA gyrase β-subunit, has been widely used for differentiating closely related species/strains (Yamamoto and Harayama, 1998). A phylogenetic tree based on gyrB gives an order of magnitude higher resolution than a tree based on the 16S rRNA gene (Yamamoto and Harayama, 1995, 1998; Wang et al., 2007). Thus, the gyrB gene was selected as a phylogenetic marker on GeoChip 3.0.

Pipeline for GeoChip development

A web-based GeoChip design pipeline has been developed for implementing our GeoChip 3.0 design strategies in four major components: sequence retrieval and verification, oligonucleotide probe design, probe validation and output, and automatic update (Figure 1).

  1) Sequence retrieval and verification. Three key steps were implemented to ensure the accuracy of retrieved sequences. First, a query consisting of key words describing the selected functional gene/protein was submitted to GenBank, and all available protein sequences were retrieved. Generally, the query found all sequences from bacteria, archaea and fungi. However, for cytochromes and hydrogenases, we restricted retrieval to a few common bacterial genera, such as Geobacter, Anaeromyxobacter, Desulfovibrio, Shewanella, Desulfurobacterium, Desulfobacterium, Rhodobacter and Pseudomonas. Second, seed sequences (normally >5) were identified on the basis of experimental evidence and knowledge of a functional gene from the literature. Third, a consensus model was built with the selected seed sequences, and unrelated sequences were removed by HMMER (Eddy, 1998). Normally, only those sequences with global alignment e-values of <0.1 were considered highly confident targets, and others with local e-values of <1.0 were assessed manually by the user to decide whether they were targets. All verified protein sequences were used to fetch the corresponding coding sequences, which were stored in our local databases (Figure 1). The query, seed sequences and functional gene sequences can also serve as the basis for future automatic updates.

  2) Oligonucleotide probe design. Owing to the nature of functional gene sequences, a new software tool, CommOligo 2.0, based on our previous version of CommOligo (Li et al., 2005), was used to select 50-mer oligonucleotide probes with experimentally established oligonucleotide design criteria (He et al., 2005). To increase the coverage for each functional gene, two types of probes, sequence- and group-specific probes, were designed for GeoChip 3.0 (Figure 1). Sequence-specific probes, which target a single sequence, were designed preferentially, whereas group-specific probes exclusively target a group of highly homologous sequences.

  3) Probe validation. Multiple (for example, 20) probes for each sequence or each group of sequences were designed, and they were verified by MegaBLAST (Zhang et al., 2000) against all nucleic acid sequences of GenBank. Only those probes with ≤20-bp long stretches, ≤90% sequence identities and ≥−35 kcal mol–1 free energy with their non-targets were kept. The best probe for each single sequence or group of sequences was then stored in the probe database and commercially synthesized (Figure 1).

  4) Automatic update. To reflect the current status of functional genes, this pipeline has a feature to update gene sequences and their probe design information automatically.
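
The two screening steps in the pipeline (e-value triage of retrieved sequences and the non-target check on candidate probes) can be sketched as follows. This is a minimal illustration, not the pipeline's actual Perl code: the function and class names are hypothetical, and the comparison directions for the probe criteria (upper bounds on identity and stretch length, lower bound on free energy) are an assumption consistent with the specificity evaluation reported below.

```python
from dataclasses import dataclass

# Cutoffs quoted in the text; how they are combined here is an assumption.
GLOBAL_EVALUE_CUTOFF = 0.1   # global alignment e-value for confident targets
LOCAL_EVALUE_CUTOFF = 1.0    # local e-value for manual inspection
MAX_STRETCH_BP = 20          # longest continuous match to a non-target
MAX_IDENTITY_PCT = 90.0      # highest identity to a non-target
MIN_FREE_ENERGY = -35.0      # kcal/mol; less negative means weaker binding


def triage_sequence(global_evalue: float, local_evalue: float) -> str:
    """Classify a candidate sequence from its HMMER e-values (hypothetical helper)."""
    if global_evalue < GLOBAL_EVALUE_CUTOFF:
        return "accept"
    if local_evalue < LOCAL_EVALUE_CUTOFF:
        return "manual review"
    return "reject"


@dataclass
class NonTargetHit:
    """Closest non-target match for a candidate probe (hypothetical record)."""
    identity_pct: float
    stretch_bp: int
    free_energy: float


def probe_is_specific(hit: NonTargetHit) -> bool:
    """Keep a probe only if its closest non-target stays within all three limits."""
    return (
        hit.identity_pct <= MAX_IDENTITY_PCT
        and hit.stretch_bp <= MAX_STRETCH_BP
        and hit.free_energy >= MIN_FREE_ENERGY
    )


print(triage_sequence(0.01, 0.5))                        # accept
print(probe_is_specific(NonTargetHit(85.0, 15, -20.0)))  # True
```

A probe fails as soon as any one criterion is violated, which mirrors the idea that identity, stretch length and free energy are considered jointly rather than traded off against each other.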

Overall description of GeoChip 3.0

GeoChip 3.0 covers 56 990 gene sequences for 292 functional genes with 27 812 probes: 9251 (33.3%) sequence-specific and 18 561 (66.7%) group-specific probes (Table 1). On the basis of functional categories, most of the probes target genes involved in organic contaminant degradation (31.0%), carbon cycling (18.7%), metal resistance (17.5%) and N cycling (13.5%), with 30.6, 18.6, 19.2 and 13.8% of all covered sequences, respectively (Figure 2 and Table 1). Among the 27 812 probes, 24 939 (89.7%) target 2744 species of bacteria, 886 (3.2%) target 140 species of archaea, 1759 (6.3%) target 262 species of fungi and 228 (0.8%) target plasmids and uncultured/unidentified prokaryotic organisms (Table 2). GeoChip 3.0 also contains eight degenerate probes for the 16S rRNA gene as positive controls (spotted on each sub-grid at least two times), and 672 unique probes designed from hypothetical genes of seven sequenced genomes of hyperthermophiles as negative controls. In addition, a 50-mer common oligonucleotide reference standard (CORS) was mixed with all these probes, including gene probes and controls, and co-spotted on GeoChip 3.0 as a common reference standard for data normalization and comparison (Liang et al., 2010). With such diverse probes and the CORS, GeoChip 3.0 could provide a more comprehensive analysis of the structure, composition and functional activity of bacterial communities, and also provide useful information for the study of soil fungal communities, although the fungal probes mainly target genes involved in a limited number of processes such as carbon degradation.
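
The probe counts and percentages quoted above can be re-derived with a few lines of arithmetic. The sketch below uses only the figures given in the text; the variable and function names are illustrative.

```python
TOTAL_PROBES = 27812

# Probe counts by type and by target domain, as reported in the text.
probes_by_type = {"sequence-specific": 9251, "group-specific": 18561}
probes_by_target = {
    "bacteria": 24939,
    "archaea": 886,
    "fungi": 1759,
    "plasmids and uncultured/unidentified": 228,
}


def as_percentages(counts, total):
    """Convert raw counts to percentages of the total, rounded to one decimal."""
    return {name: round(100.0 * n / total, 1) for name, n in counts.items()}


type_pct = as_percentages(probes_by_type, TOTAL_PROBES)
target_pct = as_percentages(probes_by_target, TOTAL_PROBES)

print(type_pct)    # {'sequence-specific': 33.3, 'group-specific': 66.7}
print(target_pct)
```

Both breakdowns sum to the full set of 27 812 probes, so the percentages in each breakdown account for every gene probe on the array.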

Figure 2

Distribution of all designed probes and covered gene sequences among different categories.

Table 2 Summary of GeoChip 3.0 probes by covered microbial domain and phylum

Specificity evaluation

To examine the specificity of the designed probes, the distributions of maximum sequence identity, maximum stretch length and minimal free energy to their closest non-targets were computationally evaluated. On the one hand, most of the designed sequence-specific probes fell in ranges of sequence identity, stretch length or free energy far from the thresholds of the probe design criteria with their non-targets (Figure 3). Only 4.8% of probes had 86–90% identities with their closest non-targets (Figure 3a), 8.0% had continuous stretch lengths of 19–20 bases (Figure 3b) and 6% had free energy of −16 to −35 kcal mol–1 (Figure 3c). On the other hand, group-specific probes covered the most similar regions of all members in the same group (Figure 4). Approximately 96.1% of group-specific probes had 100% sequence identities, and only 3.2% and 0.7% of probes had 98% and 96% identities, respectively, with their targets (Figure 4a). In addition, 98.1% of group-specific probes perfectly matched their targets, and the rest (1.9%) had at least 35 identical bases (Figure 4b). Furthermore, 86.3% of group-specific probes had maximum free energies from −65 to −80 kcal mol–1, and 12.7% and 1.0% had free energy from −60 to −65 and <−80 kcal mol–1, respectively (Figure 4c). The results suggest that the designed probes generally have a high specificity to their corresponding targets.

Figure 3

Distribution of 9251 sequence-specific designed probes at their (a) maximal sequence identities, (b) maximal stretch lengths or (c) minimal free energy with their non-targets.

Figure 4

Distribution of 18 561 group-specific probes at their (a) minimal sequence identities, (b) minimal stretch lengths or (c) maximal free energy with their group targets.

The specificity of the designed GeoChip 3.0 was further evaluated experimentally. Both complementary synthesized oligonucleotides and genomic DNA targets were hybridized with GeoChip 3.0. First, the specificity was evaluated using an equal mixture (10 pg for each target) of 24 synthesized oligonucleotides (Supplementary Table S1). The hybridizations were performed at 42, 45 and 50 °C in the presence of 50% formamide (1% formamide=0.6 °C), which was equivalent to 72, 75 and 80 °C, respectively. The hybridization results showed that all 24 probes corresponding to their complementary oligonucleotides gave positive signals at 42 °C and 45 °C, but that one of them was not detected (false negative) at 50 °C. However, seven (0.025%) unexpected probes were detected (false positives) at 42 °C, whereas only one (0.0036%) false positive was detected at 45 or 50 °C (Figure 5). The results suggest that the optimal hybridization temperature for GeoChip 3.0 was 45 °C with 50% formamide because the fewest false-positive and false-negative spots were detected. This is also consistent with GeoChip 2.0 and our other previous FGAs with fewer probes (Rhee et al., 2004; Tiquia et al., 2004; He et al., 2007). Second, genomic DNAs from two Shewanella strains, MR-4 and W3–18–1, were mixed with 500 ng for each genome, and used as the target to evaluate GeoChip 3.0 specificity. It is expected that nine probes for Shewanella sp. MR-4 and 35 for Shewanella sp. W3–18–1 on GeoChip 3.0 would hybridize with the target. The hybridizations conducted at 45 °C and 50% formamide showed that 42 of 44 expected probes had positive signals with an average intensity of 11 428±7223, and an average signal-to-noise ratio=19.8±9.7 (Table 3). In addition, seven probes showed false-positive signals on GeoChip 3.0, but they had a much lower average signal (3687±2191) than the true positive spots of Shewanella sp. MR-4 and W3–18–1. Similar results were observed with synthesized oligonucleotide targets (Table 3). Considering the total number of probes on GeoChip 3.0, the percentage (0.0036–0.025%) of false positives was negligible. Therefore, all the above results indicate that the probes on GeoChip 3.0 are highly specific.
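
The equivalent hybridization temperatures and false-positive percentages reported above follow from simple arithmetic, sketched here for illustration. The conversion factor (0.6 °C per 1% formamide) and the probe total (27 812) are taken from the text; the function names are hypothetical.

```python
FORMAMIDE_C_PER_PERCENT = 0.6   # 1% formamide is treated as equivalent to 0.6 degrees C
TOTAL_PROBES = 27812            # gene probes on GeoChip 3.0


def equivalent_temperature(temp_c: float, formamide_pct: float) -> float:
    """Hybridization temperature corrected for the formamide concentration."""
    return temp_c + formamide_pct * FORMAMIDE_C_PER_PERCENT


def false_positive_pct(n_false_positive: int, n_probes: int = TOTAL_PROBES) -> float:
    """False positives expressed as a percentage of all probes on the array."""
    return 100.0 * n_false_positive / n_probes


for temp in (42, 45, 50):
    print(temp, round(equivalent_temperature(temp, 50), 1))  # 72.0, 75.0, 80.0

print(round(false_positive_pct(7), 3))   # 0.025
print(round(false_positive_pct(1), 4))   # 0.0036
```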

Figure 5

The GeoChip 3.0 was hybridized with a mixture of 24 synthesized oligonucleotide targets at 42 °C, 45 °C and 50 °C in the presence of 50% formamide. The numbers of detected spots, expected spots, false positives and false negatives are shown with five replicates for each condition.

Table 3 Summary of GeoChip 3.0 evaluation with different targets (synthesized oligonucleotides or genomic DNAs) at 45 °C and with 50% formamide

Applications of GeoChip 3.0 for profiling soil microbial communities

To show the power of the developed GeoChip 3.0, we examined the effects of plant species diversity on soil microbial communities at the BioCON site. Soil samples were taken in July 2007 from 31 plots (11 each for 1 and 4 species, 5 for 9 species and 4 for 16 species) grown at ambient CO2 without additional N supply (Reich et al., 2001; Reich, 2009). All 31 samples were analyzed by the developed GeoChip 3.0. The signal intensity for all hybridizations was normalized using the CORS strategy. Detrended correspondence analysis of all 4012 detected genes showed that the microbial community structure differed between mono-species (1 species) and multispecies plots, and that 16-species samples appeared to be well separated from 4- or 9-species samples (Figure 6). Consistently, the average number of detected genes was 909.4±103.30 for mono-cultures, which was significantly lower than those for multispecies plots, with 1465±201.18 for 4 species, 1549.6±278.88 for 9 species, and 1540.8±173.08 for 16 species. To understand how plant diversity affects the structure and functional activity of soil microbial communities, we further analyzed the nifH gene for N2 fixation. Two clusters, monoculture and multi-species samples, were observed at a high level, and in the multi-species cluster, four subclusters were further formed, generally with one for 16-species samples, one enriched for 9-species samples and two enriched for 4-species samples (Figure 7). These results suggest that the structure and functional activity of soil microbial communities were affected by plant species diversity, and that mono-cultures of plants may lead to a decrease in belowground microbial community diversity.

Figure 6

Detrended correspondence analysis (DCA) of soil microbial communities using GeoChip 3.0 data from 11, 11, 5 and 4 (a total of 31) plots planted with 1, 4, 9 and 16 species, respectively. These species were randomly selected from four functional groups, C3, C4, legume and forb, with four species for each group (Reich et al., 2001).

Figure 7

Cluster analysis of nifH genes detected by GeoChip 3.0. In total, 114 nifH genes were detected, and 71 of them detected in at least 5 out of 31 samples (11, 11, 5 and 4 from 1, 4, 9 and 16 species, respectively) were used for hierarchical clustering analysis. The figure was generated using CLUSTER 3.0 and visualized in TREEVIEW (Berkeley, CA, USA) (Eisen et al., 1998). Black indicates signal intensities below the threshold value and red indicates a positive hybridization signal. The color intensity indicates differences in signal intensity.
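
The prevalence filter mentioned in this caption (keeping genes detected in at least 5 of 31 samples before clustering) can be illustrated with a small sketch. The data below are made-up toy values with six samples and a cutoff of three, not GeoChip measurements, and the function names are hypothetical; a value of 0 stands for a signal below the detection threshold.

```python
def count_detections(intensities):
    """Number of samples in which the gene gave a positive signal."""
    return sum(1 for value in intensities if value > 0)


def filter_by_prevalence(signals, min_samples):
    """Keep genes detected in at least min_samples samples."""
    return {
        gene: values
        for gene, values in signals.items()
        if count_detections(values) >= min_samples
    }


# Toy data: gene id -> per-sample signal (0 = below threshold). Values are invented.
toy_signals = {
    "nifH_probe_A": [0.0, 2.1, 3.4, 0.0, 1.8, 2.2],
    "nifH_probe_B": [0.0, 0.0, 1.2, 0.0, 0.0, 0.0],
    "nifH_probe_C": [4.0, 3.1, 0.0, 2.7, 0.0, 0.0],
}

kept = filter_by_prevalence(toy_signals, min_samples=3)
print(sorted(kept))  # ['nifH_probe_A', 'nifH_probe_C']
```

With the real data set, the same call with min_samples=5 over 31 samples would correspond to the filter described in the caption.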

Discussion

Microarray-based genomic technology has been widely used for microbial community analysis, and it is expected that microarray-based genomic technologies will revolutionize the analysis of microbial community structure, function and dynamics. The developed GeoChip 3.0 has several distinct features when compared with GeoChip 2.0 (He et al., 2007), and similar to GeoChip 2.0, our computational and experimental evaluation indicates that GeoChip 3.0 is a specific, sensitive and quantitative tool. Using GeoChip 3.0, the BioCON study showed that plant diversity significantly affected the structure, composition and potential functional activity of soil microbial communities.

GeoChip 3.0 differs from GeoChip 2.0 (He et al., 2007) in several respects. First, GeoChip 3.0 is more comprehensive, covering approximately 57 000 gene sequences in 292 functional gene families, whereas GeoChip 2.0 covers approximately 10 000 gene sequences in 150 gene families. This allows us to obtain more information about the functional processes of microbial communities and to analyze more diverse environmental samples. Second, sequences retrieved automatically by keyword are verified by HMMER using seed sequences, rather than by the manual verification of sequence homology used for GeoChip 2.0, so the sequences of interest can be retrieved much more quickly. Third, a CORS has been implemented in GeoChip 3.0 so that data can be normalized and different microbial communities compared (Liang et al., 2010); this method can also minimize the effects of cross-hybridization and increase quantitative capability. Fourth, a software package and an associated GeoChip design pipeline have been developed for sequence retrieval and verification, probe design and validation, array construction, data analysis, information storage and automatic update, which greatly facilitates the management of such a complicated array, especially for future updates. Fifth, GeoChip 3.0 includes the phylogenetic marker gyrB, which allows us to analyze the functional activity and phylogenetic identity of microbial communities together and to link microorganisms to their functions. Finally, we have added functional genes involved in antibiotic resistance, which allows us to detect and monitor microbial communities related to public health. Together, these features make GeoChip 3.0 a powerful tool for analyzing the composition, structure, potential function and dynamics of microbial communities, and for linking microbial community structure with ecosystem functioning.

Specificity is a central issue in microarray technology. Several measures were taken to ensure the specificity of GeoChip 3.0. First, following experimentally established probe design criteria (He et al., 2005; Liebich et al., 2006), we used a novel software tool, CommOligo 2.0, for oligonucleotide probe design. CommOligo 2.0 applies multiple criteria (for example, sequence identity, continuous stretch length and free energy) and unique algorithms (for example, global alignment and ranking of probe candidates) for optimal probe selection. Second, each designed probe was validated against currently available databases. Third, a computational evaluation showed that the majority (90–95%) of probes on GeoChip 3.0 were far from the thresholds of the probe design criteria, indicating that they should be highly specific to their corresponding targets. Fourth, the specificity of GeoChip 3.0 was examined with synthesized oligonucleotides and Shewanella genomic DNAs, and only a very small proportion of false positives (0.0036–0.025%) was observed. As discussed for GeoChip 2.0, those false positives may be due to random error (He et al., 2007).
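Two of the design criteria named above, sequence identity and continuous stretch length, can be illustrated with a minimal sketch. This is not the CommOligo 2.0 algorithm: it assumes the probe and the non-target region are already aligned without gaps, it omits the free-energy criterion, and the default thresholds (at most 90% identity and at most a 20-base continuous stretch) are illustrative assumptions. All function names are hypothetical.

```python
def identity_and_stretch(probe, target):
    """Return (fraction of identical positions, longest run of consecutive matches).

    Assumes probe and target are pre-aligned, ungapped and of equal length.
    """
    if len(probe) != len(target):
        raise ValueError("sequences must be aligned to the same length")
    matches = 0
    longest = 0
    run = 0
    for a, b in zip(probe, target):
        if a == b:
            matches += 1
            run += 1
            longest = max(longest, run)
        else:
            run = 0
    return matches / len(probe), longest


def is_specific(probe, non_targets, max_identity=0.90, max_stretch=20):
    """True if no non-target exceeds the assumed identity or stretch threshold."""
    for seq in non_targets:
        identity, stretch = identity_and_stretch(probe, seq)
        if identity > max_identity or stretch > max_stretch:
            return False
    return True


# Toy 10-mers (hypothetical); real GeoChip probes are 50-mers.
probe = "ACGTACGTAC"
non_target = "ACGTTCGTAC"  # one mismatch at the fifth position
identity, stretch = identity_and_stretch(probe, non_target)
```

In this toy case the two sequences share 9 of 10 positions and the longest matching run is 5 bases, so the probe passes under the default thresholds but would fail if `max_stretch` were set to 4.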

Group-specific probes are very important for environmental studies because functional genes are generally highly homologous. GeoChip 3.0 has a higher proportion of group-specific probes (66.7%), covering more than 47 000 sequences, than GeoChip 2.0, in which 17.7% of probes are group-specific and cover 3000 sequences. This increase is mainly due to a significant increase in the number of sequences for each functional gene. Our computational analysis showed that the majority (95%) of the group-specific probes on GeoChip 3.0 had 100% sequence identity to their corresponding target sequences and no more than 90% identity to their non-target sequences, suggesting that these probes are able to detect groups of homologous sequences. Although we did not evaluate sensitivity and quantitative capability for GeoChip 3.0, we expect these properties to remain as previously described (Rhee et al., 2004; Tiquia et al., 2004; Wu et al., 2006) because we used the same probe format (50-mer oligonucleotides) and the same array fabrication technologies. Taken together, these results suggest that GeoChip 3.0 is a specific, sensitive and quantitative tool.

GeoChip is a powerful tool for analyzing the structure, function and dynamics of microbial communities, and for linking these properties to ecosystem processes and functions. High-throughput metagenomic technologies, such as GeoChip (He et al., 2007), PhyloChip (Brodie et al., 2006, 2007) and pyrosequencing (Margulies et al., 2005), now provide opportunities to better characterize microbial communities. In contrast to the 16S rRNA gene-based PhyloChip, GeoChip contains probes from all functionally known geochemical, ecological and environmental processes, and hence it can provide information about both the structure and the function of microbial communities. Because all GeoChip probes are selected from coding sequences of functional genes, GeoChip can be used to measure not only the abundance but also the expression of functional genes in a microbial community, provided that high-quality mRNA can be recovered from environmental samples.

Pyrosequencing of targeted functional genes has become possible, but it requires conserved PCR primers and PCR amplification. Designing PCR primers may be very difficult for most functional genes of interest. Even for genes with PCR primers available, some environmental samples may fail to amplify. In addition, PCR amplification may compromise quantitative accuracy. In contrast, GeoChip does not rely on conserved primers or PCR amplification, although it detects only functionally known genes. GeoChip is therefore well suited to analyzing the structure, function and dynamics of microbial communities and to linking microbial diversity with ecosystem functioning. To our knowledge, GeoChip 3.0 is the most comprehensive microarray currently available for studying microbial communities associated with biogeochemical cycling, global climate change, bioenergy, agriculture, land use, ecosystem management, environmental cleanup and restoration, bioreactor systems and human health.