Introduction

Owing to its extremely important role in economic sustainability and in global nutrient cycling, bacterial diversity in soil has been extensively investigated using culture-independent 16S rRNA gene-based analysis (for example, Borneman and Triplett, 1997; Dunbar et al., 2002; Tringe et al., 2005; Costello and Schmidt, 2006; Schloss and Handelsman, 2006; Roesch et al., 2007; Elshahed et al., 2008). Collectively, these studies have revealed that, at the species level, soil is an extremely diverse habitat, with species richness estimates ranging from 2000 to 52 000 (for example, Tringe et al., 2005; Roesch et al., 2007). In addition to the high diversity within individual soil samples, soil bacterial populations show immense global diversity at the putative species and strain levels. Among the thousands of soil studies conducted so far, no two clone libraries have exactly the same community composition, and exact (100%) sequence matches between the most abundant species and database-deposited sequences are very rarely encountered (Janssen, 2006; Elshahed et al., 2008).

However, at higher taxonomic levels, soil exhibits a remarkably stable community structure, in which the vast majority of clones (92% in a recent compilation of 32 different 16S rRNA gene soil clone libraries (Janssen, 2006)) belong to nine major bacterial phyla: the Proteobacteria (mainly the α-, β- and δ-subdivisions), Actinobacteria, Acidobacteria, Chloroflexi, Bacteroidetes, Firmicutes, Planctomycetes, Verrucomicrobia and Gemmatimonadetes. These nine phyla are drawn from an estimated 104 candidate phyla recognized at present under the Hugenholtz taxonomic framework in the Greengenes database (July 2008) (DeSantis et al., 2006a, 2006b). This phylum-level stability is remarkable, considering the wide variations in temperature, pH, soil properties, land-use regimes and vegetation among soils.

We hypothesized that these environmental and physicochemical variations among soils are reflected mainly in differences in community composition (that is, relative abundance and diversity) within these 11 major bacterial lineages (the nine phyla, with the Proteobacteria represented by the α-, β- and δ-subdivisions as three separate lineages). Relative abundance values (the percentage representation of a specific lineage in a clone library) and abundance-based rankings are easily obtained and have been reported in many previous 16S rRNA gene clone library studies (for example, Borneman and Triplett, 1997; Kuske et al., 1997; Nogales et al., 2001; Axelrood et al., 2002; Lipson and Schmidt, 2004; Roesch et al., 2007; Lesaulnier et al., 2008). In contrast, comparing and ranking the diversity of different lineages within a soil sample, and determining whether these relative diversity patterns are affected by soil physicochemical characteristics, soil treatments or historic land usage, has, to our knowledge, not been reported previously.

Ranking lineages according to diversity within a single soil sample has been hampered by the small size of most 16S rRNA gene libraries available at present. Most available data sets derived from a single soil sample are relatively small (for example, <1000 clones), and larger data sets are usually composites of multiple samples examining different treatments within the same ecosystem (Lesaulnier et al., 2008; Talera et al., 2008). Because some of these 11 lineages sometimes represent less than 1% of the total sequences in a soil clone library (Schloss and Handelsman, 2006; Roesch et al., 2007), a large data set is needed to obtain a subclone library of reasonable size for diversity estimates in every lineage. Recently, two studies reported on the bacterial communities of five different soils using either near full-length sequences (Elshahed et al., 2008) or short pyrosequencing-generated fragments (Roesch et al., 2007). These five clone libraries are substantially larger than all previously reported 16S rRNA gene soil clone libraries, and thus present an opportunity to examine the relative diversity and rankings of bacterial lineages in a diverse set of soils. Here, we used multiple approaches that are minimally affected by sample size (single diversity indices, rarefaction curve analysis and diversity-ordering approaches) to determine the richness, evenness and diversity of the 11 major bacterial lineages (including the three Proteobacterial lineages) in each of these five data sets, and we report the characteristic diversity ranks and diversity–abundance correlations observed, as well as their implications for ecosystem functions.

Materials and methods

Soil data sets

Five data sets, ranging in size from 13 001 to 53 533 sequences, were used in this study. The Kessler Farm Soil clone library (KFS, n=13 001) was generated from an undisturbed tallgrass prairie preserve in central Oklahoma using the 16S rRNA gene primers 27F and 1391R (Elshahed et al., 2008). The other four libraries consist of pyrosequencing-generated fragments (average length 106 bp) obtained using the primer pair 1391F and 1525R (Roesch et al., 2007). These libraries originated from a maize field in southern Brazil (Brazil clone library, n=26 140), a sugarcane field in Florida (Florida clone library, n=28 328), an agricultural plot at the University of Illinois (Illinois clone library, n=31 818) and a boreal forest in Canada (Canada clone library, n=53 533). Details on soil characteristics, sampling, DNA extraction, PCR amplification, cloning (where applicable) and sequencing are described in the original publications (Roesch et al., 2007; Elshahed et al., 2008).

Alignment, classification and operational taxonomic unit (OTU) assignment of clones in the KFS library were described in detail previously (Elshahed et al., 2008). In brief, after potential chimeras were detected and excluded using Mallard (Ashelford et al., 2006), the sequences were grouped into phyla, and each phylum was aligned using the Greengenes NAST aligner (DeSantis et al., 2006a, 2006b). Distance matrices were created for each group using the ‘create distance matrix’ function of the Greengenes web server (DeSantis et al., 2006a, 2006b), and each resulting matrix was used to assign clones to OTUs at a 97% sequence similarity cutoff using DOTUR (Schloss and Handelsman, 2005).

Sequences of each of the four pyrosequencing-generated libraries (Brazil, Florida, Illinois and Canada) were downloaded from the GenBank database (accession numbers are available in the original publication (Roesch et al., 2007)), and the sequences from each library were aligned and classified using the Greengenes aligner and Classifier programs, respectively (DeSantis et al., 2006a, 2006b). Greengenes-based alignments and classifications were conducted for all five clone libraries within a span of 2 weeks (April 2008) to exclude, or at least minimize, the effect of periodic database updates on the classification results. Aligned sequences belonging to each target lineage were extracted from the master alignment file using a Perl script (available upon request). Each lineage-specific file was used to create a distance matrix using the ‘create distance matrix’ function of the Greengenes web server (DeSantis et al., 2006a, 2006b). The output matrix was used for OTU assignment at a 97% sequence similarity cutoff using DOTUR (Schloss and Handelsman, 2005). DOTUR output files were used as the starting point for rarefaction curve construction and for computing the diversity indices used in this study. The number of clones, the number of OTUs at a 0.03 distance cutoff (OTU0.03) and the relative abundance of each of the 11 bacterial lineages examined in this study in the five clone libraries are provided as Supplementary material (Supplementary Table S1).
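The OTU assignment step can be illustrated with a minimal sketch. The text does not state which DOTUR clustering algorithm was used; the sketch assumes furthest-neighbor (complete-linkage) clustering, which we understand to be DOTUR's default, applied to a precomputed distance matrix at the 0.03 distance (97% similarity) cutoff. The function names `furthest_neighbor_otus` and `relative_abundance` are hypothetical, and the naive loop is only suitable for small examples, not for libraries of tens of thousands of sequences.

```python
from itertools import combinations


def furthest_neighbor_otus(dist, cutoff=0.03):
    """Group sequences into OTUs by furthest-neighbor (complete-linkage) clustering.

    dist: symmetric list of lists of pairwise distances (0 on the diagonal).
    Two clusters may merge only if every cross-cluster distance is <= cutoff.
    Returns a sorted list of OTUs, each a sorted list of sequence indices.
    """
    clusters = [[i] for i in range(len(dist))]
    while True:
        best = None  # (linkage distance, index a, index b)
        for a, b in combinations(range(len(clusters)), 2):
            # Complete linkage: cluster distance is the largest pairwise distance
            d = max(dist[i][j] for i in clusters[a] for j in clusters[b])
            if d <= cutoff and (best is None or d < best[0]):
                best = (d, a, b)
        if best is None:
            break  # no further merge is possible without exceeding the cutoff
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]  # combinations() yields a < b, so index a is unaffected
    return sorted(clusters)


def relative_abundance(lineage_counts):
    """Percentage representation of each lineage in a clone library."""
    total = sum(lineage_counts.values())
    return {name: 100.0 * n / total for name, n in lineage_counts.items()}


# Toy example: sequence 2 is within 0.03 of sequence 0 but not of sequence 1,
# so complete linkage leaves it in its own OTU.
EXAMPLE_DIST = [
    [0.00, 0.01, 0.02, 0.10],
    [0.01, 0.00, 0.04, 0.12],
    [0.02, 0.04, 0.00, 0.11],
    [0.10, 0.12, 0.11, 0.00],
]
print(furthest_neighbor_otus(EXAMPLE_DIST))  # [[0, 1], [2], [3]]
```

Single linkage would have merged sequence 2 with sequences 0 and 1 in this toy example; the choice of linkage therefore affects OTU counts and, downstream, every richness and diversity estimate.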

Diversity estimates

We sought to estimate and compare the richness, evenness and total diversity of the 11 most abundant bacterial soil lineages in five different clone libraries. Single diversity indices were used to estimate richness and evenness, whereas rarefaction curve analysis and diversity-ordering approaches were used to rank lineages by total diversity within each environment.

Single indices rankings

In choosing indices for the richness and evenness aspects of diversity, we favored indices that are minimally affected or unaffected by sample-size variation, because the sample sizes of the lineages compared within each soil, as well as across the five soils, differed (Supplementary Table S1). We chose the log series index (α) as the richness estimate (Kempton and Taylor, 1974, 1976; Taylor, 1978) (Table 1), because its value has been shown to be relatively independent of sample size and to be unaffected by sample size when the number of individuals in a community (N) exceeds 1000 (Magurran, 2004). Of the 55 lineages compared in this study (11 lineages per soil), 33 had N>1000, 10 had 500<N<1000 and 12 had N<500. As pointed out by Hill et al. (2003), even at smaller sample sizes, where α increases with sample size, it is still more stable to sample-size variation than other richness indices, for example, the Margalef (Clifford and Stephenson, 1975) and Menhinick (Whittaker, 1977) indices, and it is more sensitive to changes in OTU richness. For evenness estimation, the complement of Simpson's dominance index D (Simpson, 1949), 1−D, was used as the evenness statistic (Magurran, 2004; Jarvis et al., 2008) (Table 1). The Simpson index places more weight on the abundant members of the community and is therefore relatively insensitive to variation in sample size.

Table 1 Formulae of various diversity indices used in this study

For each lineage in the five clone libraries, both the log series (α) and the complement of Simpson's dominance index (1−D) were computed as suggested by Magurran (2004). Values for the log series index and the complement of Simpson index for all 55 lineages analyzed in this study are provided as Supplementary material (Supplementary Table S2).
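As an illustration of how the two single indices can be computed from the OTU abundances of one lineage, the sketch below solves Fisher's log-series relation S = α ln(1 + N/α) for α by bisection (S is the number of OTUs, N the number of clones) and computes 1−D using Simpson's finite-sample form D = Σ nᵢ(nᵢ − 1)/[N(N − 1)]. Because Table 1 is not reproduced here, the exact Simpson formula used in this study may differ (for example, D = Σ pᵢ²); the function names are hypothetical.

```python
import math


def log_series_alpha(counts, rel_tol=1e-12):
    """Fisher's log-series alpha for one lineage.

    counts: number of clones in each OTU. Solves S = alpha * ln(1 + N / alpha)
    by bisection, where S is the number of OTUs and N the number of clones;
    a finite solution exists only when S < N.
    """
    counts = [c for c in counts if c > 0]
    S, N = len(counts), sum(counts)
    if not 0 < S < N:
        raise ValueError("log-series alpha needs 0 < S < N")

    def excess(alpha):
        # alpha * ln(1 + N / alpha) rises monotonically from 0 towards N
        return alpha * math.log1p(N / alpha) - S

    lo, hi = 1e-12, 1.0
    while excess(hi) < 0:  # widen the bracket until it contains the root
        hi *= 2.0
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if excess(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def simpson_complement(counts):
    """1 - D, with D = sum n_i (n_i - 1) / (N (N - 1)) (finite-sample Simpson)."""
    N = sum(counts)
    if N < 2:
        raise ValueError("at least two clones are required")
    D = sum(n * (n - 1) for n in counts) / (N * (N - 1))
    return 1.0 - D
```

Bisection is used here only because the left-hand side increases monotonically in α, which guarantees a single root; any standard root finder would serve equally well.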

Rarefaction curve rankings

Rarefaction has been viewed as a solution to the sample-size dependence of single diversity indices (Hughes et al., 2001). Communities can be compared using their rarefaction curves: the community whose rarefaction curve lies above the other is considered more diverse. Rarefaction curves can be used to compare communities with different sample sizes (Hughes et al., 2001) and take into account both components of diversity (richness and evenness) (Olszewski, 2004). We used DOTUR (Schloss and Handelsman, 2005) to construct rarefaction curves for each of the 11 lineages in all five soils. For each soil, all rarefaction curves were plotted on the same graph, and lineages were ranked from the least diverse (lowermost rarefaction curve) to the most diverse (uppermost rarefaction curve). When the rarefaction curves of two lineages crossed, their diversities were considered incomparable by this method, and they were given a tied rank.
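A minimal sketch of analytical rarefaction and of the crossing rule described above follows. It assumes the expected number of OTUs in a random subsample of n clones drawn without replacement, E[Sₙ] = Σᵢ [1 − C(N − Nᵢ, n)/C(N, n)], which may differ in detail from DOTUR's own rarefaction output, and it assumes that two curves are compared only over the subsample sizes both lineages reach. The function names are hypothetical.

```python
import math


def rarefy(counts, n):
    """Expected number of OTUs in a random subsample of n clones (without replacement).

    E[S_n] = sum_i [1 - C(N - N_i, n) / C(N, n)], evaluated with log-gamma
    so that large libraries do not overflow.
    """
    N = sum(counts)
    if not 0 <= n <= N:
        raise ValueError("subsample size must lie between 0 and N")
    expected = 0.0
    for Ni in counts:
        if N - Ni < n:
            expected += 1.0  # too few other clones: this OTU is always sampled
            continue
        # log of C(N - Ni, n) / C(N, n); the n! terms cancel
        log_ratio = (math.lgamma(N - Ni + 1) - math.lgamma(N - Ni - n + 1)
                     - math.lgamma(N + 1) + math.lgamma(N - n + 1))
        expected += 1.0 - math.exp(log_ratio)
    return expected


def compare_rarefaction(counts_a, counts_b, step=1, tol=1e-9):
    """Return 1 if A's curve lies on or above B's over their shared range and is
    somewhere strictly above, -1 for the reverse, and 0 if the curves cross or
    coincide (incomparable, so the two lineages share a tied rank)."""
    n_max = min(sum(counts_a), sum(counts_b))
    a_above = b_above = False
    for n in range(1, n_max + 1, step):
        diff = rarefy(counts_a, n) - rarefy(counts_b, n)
        if diff > tol:
            a_above = True
        elif diff < -tol:
            b_above = True
    if a_above and not b_above:
        return 1
    if b_above and not a_above:
        return -1
    return 0
```

With `step=1` every subsample size is evaluated, which is slow for lineages with thousands of clones; a coarser step trades resolution for speed and could miss a crossing between sampled points.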

Diversity-ordering rankings

In addition to rarefaction curve-based rankings, we used diversity-ordering methods (12 different methods) to generate diversity profiles (plots of the values of a diversity index versus a scale parameter for each community or lineage), which were then compared graphically to rank lineages in each of the five data sets. It has been argued previously that expressing the diversity of complex communities as a single number using single diversity indices is not practical, as it reduces the multidimensional concept of community diversity to a single scalar quantity (Tóthmérész, 1995; Liu et al., 2007). The varying degrees of emphasis that single diversity indices place on rare or abundant species have been regarded as a disadvantage. Owing to the inadequacy of single indices, ecologists in various subfields have used diversity-ordering approaches to quantify diversity (see Liu et al., 2007 for a review). Diversity-ordering methods, being parametric families of indices, give a more complete summary of community diversity, with varying sensitivity to rare versus common species as the scale parameter changes (Tóthmérész, 1995; Liu et al., 2007). This makes them less dependent on sample size than many of the single diversity indices commonly used in microbial ecology. Moreover, the use and comparison of multiple ordering methods (see below) ensures a more accurate and complete assessment of diversity.
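To make the idea of a diversity profile concrete, the sketch below computes a Renyi generalized entropy profile, Hₐ = ln(Σ pᵢᵃ)/(1 − a), using the Shannon index as the a → 1 limit and −ln(max pᵢ) as the a → ∞ limit, and applies the same crossing rule used for rarefaction curves. The grid of scale values is an arbitrary assumption; a profile comparison on a finite grid cannot rule out crossings outside it. The function names are hypothetical.

```python
import math


def renyi_entropy(counts, a):
    """Renyi generalized entropy of order a >= 0 from OTU abundances."""
    N = sum(counts)
    p = [c / N for c in counts if c > 0]
    if a == 1:
        return -sum(pi * math.log(pi) for pi in p)  # Shannon index (limit a -> 1)
    if math.isinf(a):
        return -math.log(max(p))  # limit a -> infinity (Berger-Parker)
    return math.log(sum(pi ** a for pi in p)) / (1.0 - a)


# Assumed grid of scale parameters; crossings beyond it are not detected.
DEFAULT_SCALES = [0, 0.25, 0.5, 1, 2, 4, 8, math.inf]


def renyi_profile(counts, scales=DEFAULT_SCALES):
    """Renyi entropy evaluated at each scale parameter in `scales`."""
    return [renyi_entropy(counts, a) for a in scales]


def compare_profiles(profile_a, profile_b, tol=1e-9):
    """1 if profile A is never below B and somewhere above it, -1 for the
    reverse, 0 if the profiles cross or coincide (incomparable, tied rank)."""
    a_above = any(x - y > tol for x, y in zip(profile_a, profile_b))
    b_above = any(y - x > tol for x, y in zip(profile_a, profile_b))
    if a_above and not b_above:
        return 1
    if b_above and not a_above:
        return -1
    return 0
```

At a = 0 the profile equals ln S and emphasizes rare OTUs; as a grows, the profile increasingly reflects the dominant OTUs, which is how a single family of indices spans the rare-to-common spectrum.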

Diversity-ordering methods are placed into three groups depending on the properties they measure: information-related methods, expected number of species-related methods and intrinsic diversity-related methods. Table 1 shows the methods within each group, their scale parameters and the formulae used for calculation. The scale parameters of the information-related and the expected number of species-related methods range over an unbounded interval (for the allowed values of the scale parameters, see Liu et al., 2007). This means that, when using indices from these two groups, even if the profiles do not cross within the range of the scale parameter used for comparison, there is no guarantee that they would not cross at higher values of the scale parameter. In contrast, within the third group (intrinsic diversity-related methods), only a finite number of values need to be calculated and compared (equal to the observed number of species (S) or S−1). Hence, if these profiles do not cross, the communities are definitely comparable, which makes the intrinsic diversity-related methods the most stringent. As with rarefaction curves, if the diversity profiles of two communities or lineages crossed, they were considered incomparable by that method and given a tied rank. Figure 1 shows an example of the comparative diversity of two lineages, the Planctomycetes and Gemmatimonadetes in the Canada soil, using three diversity-ordering indices (one from each group): the Renyi generalized entropy index, the Hurlbert diversity index and the right tail-sum method. Although the profiles of the Planctomycetes and Gemmatimonadetes crossed with the right tail-sum method and hence were incomparable (Figure 1c), they were comparable using the other two methods (Figures 1a and b), and the Planctomycetes were more diverse than the Gemmatimonadetes in both cases.
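The intrinsic diversity ordering can be sketched with right tail-sums: with the proportions ranked from most to least abundant, T_k = Σ_{i>k} p_(i) for k = 1, …, S − 1, and one lineage is intrinsically at least as diverse as another when its T_k is at least as large at every k. Padding the lineage with fewer OTUs with zero proportions, so that both profiles have the same finite length, is our assumption for comparing lineages with different S; the function names are hypothetical.

```python
def right_tail_sums(counts, length=None):
    """Right tail-sums T_k = sum of p_(i) for i > k, k = 1..length-1.

    Proportions are ranked from most to least abundant and zero-padded to
    `length` so that lineages with different numbers of OTUs can be compared.
    """
    N = sum(counts)
    p = sorted((c / N for c in counts if c > 0), reverse=True)
    length = max(length or 0, len(p))
    p += [0.0] * (length - len(p))
    suffix = [0.0] * (length + 1)  # suffix[k] = p[k] + p[k+1] + ... (0-based)
    for i in range(length - 1, -1, -1):
        suffix[i] = suffix[i + 1] + p[i]
    return suffix[1:length]  # T_1 .. T_{length-1}


def compare_intrinsic(counts_a, counts_b, tol=1e-12):
    """1 if A's tail-sum profile is never below B's and somewhere above it,
    -1 for the reverse, 0 if the profiles cross or coincide (tied rank).
    Only finitely many points exist, so every point is checked."""
    length = max(sum(1 for c in counts_a if c > 0),
                 sum(1 for c in counts_b if c > 0))
    ta = right_tail_sums(counts_a, length)
    tb = right_tail_sums(counts_b, length)
    a_above = any(x - y > tol for x, y in zip(ta, tb))
    b_above = any(y - x > tol for x, y in zip(ta, tb))
    if a_above and not b_above:
        return 1
    if b_above and not a_above:
        return -1
    return 0
```

Unlike the Renyi sketch, no grid of scale values has to be chosen here, which mirrors the point made above about why this group of methods is the most stringent.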

Figure 1

Example of diversity-ordering profiles: a pairwise comparison of two lineages (Gemmatimonadetes, open diamonds; Planctomycetes, open squares) in one of the five data sets tested (the Canada clone library), using (a) the Renyi generalized entropy method, an example of the information-related group of methods; (b) the Hurlbert family of diversity indices, an example of the expected number of species-related group of methods; and (c) the right tail-sum method, an example of the intrinsic diversity-related group of methods. Similar pairwise comparisons were conducted between the 11 tested lineages in each of the five data sets used in this study to reach a consensus ranking within each soil data set.

For each of the 12 diversity-ordering indices, a stepwise approach was taken in which pairwise comparisons between lineages in the data set were conducted until all 11 lineages had been ranked. A consensus ranking for each group of indices (that is, an information-related ranking, an expected number of species ranking and an intrinsic diversity ranking) was obtained for each clone library. The final rankings (Table 2) represent the averages of the three group rankings. Whenever lineages were incomparable, they were given a tied rank.
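The text does not specify how the stepwise pairwise comparisons were converted into ranks, so the sketch below uses a hypothetical net-wins score (+1 for each lineage a lineage is more diverse than, −1 for each lineage more diverse than it, 0 for incomparable pairs), converts scores to ranks from 1 (least diverse) with tied scores sharing their mean rank, and averages the rankings from the three groups of methods. Only the final averaging step follows directly from the text; the scoring rule and all names are assumptions.

```python
from itertools import combinations


def net_wins(names, compare):
    """Hypothetical score from pairwise comparisons.

    compare(x, y) returns 1 if x is more diverse than y, -1 if y is more
    diverse than x, and 0 if the two are incomparable.
    """
    score = {name: 0 for name in names}
    for x, y in combinations(names, 2):
        result = compare(x, y)
        score[x] += result
        score[y] -= result
    return score


def average_ranks(score):
    """Rank from 1 (least diverse) upwards; tied scores share their mean rank."""
    ordered = sorted(score, key=score.get)
    ranks, i = {}, 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and score[ordered[j + 1]] == score[ordered[i]]:
            j += 1
        for name in ordered[i:j + 1]:
            ranks[name] = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        i = j + 1
    return ranks


def consensus(group_rankings):
    """Average the rankings from the three groups of diversity-ordering methods."""
    names = group_rankings[0]
    return {n: sum(r[n] for r in group_rankings) / len(group_rankings) for n in names}
```

Mean ranks for ties produce fractional values such as 2.5 or 10.5, the same form as some of the rankings reported in Table 2.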

Table 2 Diversity rankings of 11 soil bacterial lineages using rarefaction curve analysis and diversity-ordering approachesᵃ

To our knowledge, diversity ordering has not previously been used to compare microbial communities. We believe that this approach could prove extremely valuable in studies where the compared communities do not share any OTUs, such as ranking different phyla within a data set or comparing archaeal, bacterial and eukaryotic diversity within an ecosystem. It should also prove useful for comparing the diversity of ecosystems that share few OTUs because of wide variations in environmental conditions.

Diversity–abundance correlations

To investigate the relationship between relative abundance and each of richness, evenness and total diversity for the 11 lineages compared in this study, we calculated Pearson correlation coefficients between percentage abundance and the values of the richness index (α), and between percentage abundance and the complement of Simpson's dominance index (1−D), for each of the 11 lineages across the five data sets analyzed. Because total diversity is expressed as a rank (rarefaction-based and diversity ordering-based), the Spearman rank correlation coefficient was used to correlate relative abundance ranks with total diversity ranks (the average of the rarefaction curve and diversity-ordering ranks) for each of the 11 lineages across the five data sets analyzed.
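For reference, both coefficients can be computed without external libraries: Pearson's r from the abundance and index values, and Spearman's coefficient as Pearson's r applied to ranks, with tied values receiving the mean of the ranks they span. This is a minimal sketch with hypothetical function names; in practice a statistics library would normally be used.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks_with_ties(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson's r computed on the ranks."""
    return pearson(ranks_with_ties(x), ranks_with_ties(y))
```

Because Spearman's coefficient depends only on ordering, it is unaffected by the fact that diversity ranks are averages of several rankings rather than measured quantities.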

Results

Diversity ranking patterns in soil data sets

The 11 bacterial lineages examined in this study were ranked using rarefaction curve analysis, as well as diversity-ordering approaches based on information-related methods (six indices), expected number of species-related methods (two indices) and intrinsic diversity-related methods (four indices). Lineages were ranked from 1 (least diverse) to 11 (most diverse).

The rankings obtained using rarefaction and diversity ordering (Table 2) were broadly in agreement, and identified some lineages that were consistently ranked as highly diverse across the clone libraries examined: the Firmicutes (average rank of 9.4 and 8.7 using rarefaction curve and diversity-ordering approaches, respectively), the δ-Proteobacteria (average rank 9.3 and 9.5) and the Planctomycetes (average rank 7.1 and 8.5). Similarly, a second group was consistently ranked as the least diverse: the Gemmatimonadetes (average rank 3.3 and 2.9), the Verrucomicrobia (average rank 2.1 and 3.0) and the β-Proteobacteria (average rank 2.6 and 3.6). The five remaining lineages (Acidobacteria, Actinobacteria, α-Proteobacteria, Chloroflexi and Bacteroidetes) showed no consistent rankings across data sets, as evident from the wide range of their rarefaction curve and diversity-ordering rankings in different clone libraries (rarefaction curve rankings ranged from 2.5 to 9, 3 to 11, 1 to 7, 3 to 10.5 and 2 to 8, and diversity-ordering rankings from 3 to 9, 4 to 10, 2 to 10, 2 to 8 and 3 to 9, for the Acidobacteria, Actinobacteria, α-Proteobacteria, Bacteroidetes and Chloroflexi, respectively) (Table 2).

Interestingly, lineages that exhibited large variations in rankings between soils also exhibited large differences in relative abundance between soils in our data sets (Figure 2, black columns). The only two exceptions were the α-Proteobacteria, whose relative abundance was remarkably stable across soils despite changes in diversity rank, and the β-Proteobacteria, whose marked changes in abundance were not accompanied by similar changes in diversity rank. Furthermore, relative abundances in a recently compiled data set of 32 16S rRNA gene clone libraries (Janssen, 2006; Figure 2, gray columns) and in a quantitative PCR analysis of six bacterial lineages (Acidobacteria, Actinobacteria, α-Proteobacteria, β-Proteobacteria, Bacteroidetes and Firmicutes) in 69 different soil samples (Fierer et al., 2007; Figure 2, white columns) confirmed the same general trend: larger differences in relative abundance were observed in lineages with wide variations in diversity rankings than in lineages with consistently low or high diversity ranks.

Figure 2

Ranges of relative abundances observed for the 11 lineages in the five data sets examined in this study (black columns), in a compilation of thirty-two 16S rRNA gene clone libraries analyzed by Janssen (2006) (gray columns), and in a quantitative PCR analysis of the relative percentage of six bacterial lineages (Acidobacteria, Actinobacteria, α-Proteobacteria, β-Proteobacteria, Bacteroidetes and Firmicutes) in 69 different soil samples (Fierer et al., 2007) (white columns). *Not reported.

We further examined the relationship between relative abundance and richness, evenness and total diversity rank in all lineages, with special emphasis on those that exhibited substantial changes in both diversity ranking and abundance in the five libraries tested (Acidobacteria, Actinobacteria, Chloroflexi and Bacteroidetes). The correlation between abundance and richness, evenness and diversity estimates is especially important within these four lineages, as the magnitude of change in community structure (abundance and diversity) is highest in these four lineages. The nature of this correlation would therefore have a greater effect on overall community structure than in other lineages that show little change in diversity and/or abundance in the data sets examined. The results (Table 3) indicate that richness, evenness and diversity generally exhibited similar patterns of correlation with abundance. Within these four lineages, a strong abundance–diversity correlation was observed within the Actinobacteria (r=0.7), weaker positive correlations within the Chloroflexi (r=0.58) and the Acidobacteria (r=0.41), and a strong negative correlation within the Bacteroidetes (r=−0.99).

Table 3 Correlation coefficients between relative abundance and richness, evenness or total diversity within soil bacterial lineagesᵃ

Discussion

The most important finding of this study is the identification of lineages that are consistently ranked as having high or low diversity in the five data sets analyzed. This observation could provide useful insights into the contribution of various bacterial lineages to ecosystem functions. According to ecological and evolutionary theory, more diverse communities contribute more to ecosystem functions and services than less diverse counterparts (Giovannoni, 2004; Bell et al., 2005). It has been postulated previously that, because each species uses slightly different resources and occupies a highly specific niche in the community, more diverse communities provide more services to the ecosystem than less diverse ones (Loreau and Hector, 2001; Bell et al., 2005). From an evolutionary perspective, bacterial communities in soil are highly adapted and constrained by selection, with newly arising, more ecologically fit cells rapidly eliminating less fit competitors (Giovannoni, 2004). As such, the communities observed in 16S rRNA gene studies are the victors of an evolutionary process in which less competitive species are continuously purged from the community. It follows that lineages with higher diversity contribute more to ecosystem functioning than lineages with comparable abundance but lower diversity.

Whereas some lineages were consistently ranked as having high or low diversity in all data sets, others showed swings in their rankings across environments (Table 2). Interestingly, these lineages were also those that, in general, exhibited wide ranges of relative abundance in the five clone libraries examined in this study (Figure 2). Although diversity rankings cannot be reliably computed from the smaller data sets currently available, examination of those data sets has shown that, in general, members of these lineages exhibit the largest variations in relative abundance in soils previously examined using 16S rRNA gene analysis (Janssen, 2006) and quantitative PCR (Fierer et al., 2007) (Figure 2). This suggests that members of these lineages constitute a more dynamic core of microorganisms that are more readily affected by, and responsive to, variations in soil characteristics.

We further examined the correlation between relative abundance and diversity (richness, evenness and total diversity), with special emphasis on lineages exhibiting variations in both diversity ranking and relative abundance in our data sets (Table 3). In principle, a positive correlation between diversity and abundance within a specific lineage suggests that the observed increase in relative abundance is driven either by the recruitment of new bacterial species of this lineage into the ecosystem through immigration, or by the rise of rare, previously undetected members of this lineage to higher numbers. Under this scenario, richness, evenness and diversity rank will be positively correlated with relative abundance. On the other hand, a negative correlation between abundance and diversity suggests that the increase in relative abundance is driven by the proliferation of a single species, or relatively few species, which outcompete other members of the community or reduce their numbers to undetectable levels. Under this scenario, the observed richness, evenness and diversity rank will be negatively correlated with relative abundance. The results obtained in this study suggest that, within the four lineages that exhibited wide variations in both relative abundance and diversity rankings, members of the Acidobacteria, Actinobacteria and Chloroflexi follow the first scenario, whereas members of the Bacteroidetes follow the second (Table 3). It is important to note that these correlations are computed from data sets that represent snapshots of five different soil bacterial communities. As such, the reported patterns do not represent the dynamics of recruitment, promotion and demotion of bacterial species within a single ecosystem, but rather general trends of diversity–abundance correlations across different soil ecosystems.

This study represents a preliminary effort to understand the characteristic diversity ranks and diversity–abundance correlations of various bacterial lineages. Similar studies could benefit greatly from the anticipated availability of large (for example, >10 000 clones) capillary and pyrosequencing soil clone libraries (Tiedje, 2008). Data sets from multiple, diverse soil ecosystems could help determine whether the pattern obtained in this study represents a global pattern of diversity ranks in soils. In addition, multiple data sets documenting the response of a single ecosystem to natural and/or anthropogenic manipulations should prove extremely valuable in linking the diversity–abundance correlations observed at different time points to phylogenetic information on specific OTUs in various bacterial lineages. Finally, larger data sets could allow similar studies at finer taxonomic scales, for example, at the class and order levels. Members of some of the lineages examined in this study (for example, the α-, β- and δ-Proteobacteria and the Firmicutes) exhibit a bewildering array of metabolic and physiological capabilities. It is therefore conceivable that, because of these extremely diverse metabolic capabilities and multiple ecological roles within some of the 11 divisions and subdivisions examined, the patterns of diversity and diversity–abundance correlations could vary significantly when examined at a lower (for example, class or order) taxonomic level.