Introduction

A major goal and challenge in microbial ecology is to link ecologically important metabolic functions to prokaryotic identity. Progress toward this goal has been hindered not only by the resistance of many microbes to traditional culture methods (Rappé and Giovannoni, 2003), but also by the inherent difficulty in attributing a microbially mediated process, measured in a complex environmental sample, to a specific taxon. Identifying the potential of specific organisms to contribute to an ecological process is valuable for making functional predictions and informing ecological models. Trait-based models have already shown promise for predicting ecosystem dynamics such as community structure (Follows et al., 2007) and decomposition (Allison, 2012), but would benefit from improved characterization of taxon-trait relationships. Traits related to resource acquisition are of particular interest, as these may be involved in both responding to and affecting ecosystem processes (Lavorel and Garnier, 2002).

The metabolic machinery of microorganisms contributes to ecosystem processes, including nutrient cycling and decomposition of organic matter. Decomposition by prokaryotes is initiated by the activity of extracellular enzymes that are secreted outside the cell, attached to the cell wall or located in the periplasm. These enzymes function to acquire energy and resources from organic matter for cellular growth, while catalyzing important transformations in the carbon (C), nitrogen (N) and phosphorus (P) cycles (Chrost and Siuda, 2002). Specific enzymes are produced to target the principal reservoirs of C, N and P in the environment. For example, alkaline phosphatases (APs) are broad specificity enzymes that release inorganic P (in the form of a phosphate group) from nucleic acids, phospholipids and other phosphate esters (Torriani, 1960; Sebastian and Ammerman, 2009, 2011). AP serves as the primary means by which prokaryotes hydrolyze P from organic material for uptake and use (Karl, 2000). Likewise, chitinases (CHIs) are produced to access the C and N resources contained in chitin, the second most abundant biopolymer on Earth next to cellulose (Jolles and Muzzarelli, 1999). Complete hydrolysis of chitin to monomer units also requires degradation by β-N-acetyl-glucosaminidase (NAG) (Gooday, 1990). The basics of biosynthesis, structure, catalytic properties and genetic regulation for extracellular enzymes are well known for only a few prokaryotes (for example, Bradshaw et al., 1981; Bassler et al., 1991; Jolles and Muzzarelli, 1999). Additionally, the range of taxa that are capable of producing specific extracellular enzymes remains largely unknown (Arnosti, 2011).

Our understanding of the metabolic capabilities of prokaryotic taxa has advanced significantly due to the sequencing of microbial genomes. For example, genomic analysis has allowed unprecedented insight into the environmental resource adaptation and ecological function of several key lineages, including Prochlorococcus (Rocap et al., 2003; Martiny et al., 2006), Synechococcus (Palenik et al., 2006), Roseobacter (Newton et al., 2010) and Escherichia coli (Luo et al., 2011). Since genome sequences represent the complete genetic repertoire of potential functions available to an organism, including strategies available for resource acquisition, the increasing availability of sequenced prokaryotic genomes holds much promise for clarifying taxon-trait relationships (Fraser et al., 2000; Ward and Klota, 2011).

Several previous studies have reported that traits related to resource use are associated with specific taxa or ecotypes at very fine-scale phylogenetic resolution (97% 16S rRNA sequence identity). This has been demonstrated for resource traits such as particle colonization (Hunt et al., 2008), light adaptation (Moore et al., 1998; West and Scanlan, 1999; Johnson et al., 2006; Becraft et al., 2011) and nutrient use (Jaspers and Overmann, 2004; Martiny et al., 2006; Bhaya et al., 2007; Choudhary and Johri, 2011). This fine-scale association appears to hold true across the prokaryotic domains for genetically simple carbon use traits (Martiny et al., 2012); however, it is unknown whether extracellular enzyme traits also follow this pattern.

The objectives of the present study were two-fold: (1) to determine which prokaryotic taxa have the genetic potential to produce extracellular enzymes; and (2) to evaluate the linkages between that genetic potential and phylogeny, as defined by the 16S ribosomal RNA gene sequence. We used sequenced prokaryotic genomes to identify taxa with the genetic potential to produce AP, CHI and NAG enzymes. We then evaluated the linkages between that genetic potential and phylogeny by examining the size and average 16S rRNA sequence identity of clades capable of AP, CHI or NAG production. We hypothesized that vertical inheritance should result in a non-random distribution of enzyme-positive genotypes among prokaryotic taxa. If the genetic potential to produce extracellular enzymes is highly conserved, then enzyme-positive genotypes should be shared within large, deep clades (Figure 1a). In this case, broad changes in the taxonomic composition of a community may influence functioning associated with the trait. More likely, the genetic potential to produce extracellular enzymes is less conserved (more labile), and enzyme-positive genotypes should be found within smaller, closely related clades (Figure 1b). Changes in community composition would then be unlikely to have a substantial impact on the functioning of such a dispersed trait. If phylogenetic conservation is related to genetic complexity as predicted by Martiny et al. (2012), then enzyme-positive genotypes should be associated with small clades (Figure 1b), since extracellular enzyme production is commonly encoded by a few genes (see references in Supplementary Tables S1–S3). Alternatively, the traits under investigation may be randomly associated with prokaryotic taxa (Figure 1c), possibly because frequent gene gain/loss, rapid convergent evolution or horizontal gene transfer obscures vertical inheritance (Doolittle, 1999; Gogarten et al., 2002; Snel et al., 2002; Boucher et al., 2003).

Figure 1

Hypothetical illustration depicting possible scenarios for the phylogenetic distribution and conservation of enzyme traits. Circles denote genomes with the genetic potential to produce extracellular enzymes (‘enzyme-positive genotypes’). Shaded branches highlight the phylogenetic depth (τD) of clades of enzyme-positive genotypes. Case (a) represents a strong correlation between phylogeny and enzyme trait, resulting in a large, deeply branching clade of enzyme-positive genotypes with relatively low 16S rRNA sequence identity (strong phylogenetic conservation). In case (b) the association between phylogeny and enzyme trait is weaker (the trait is more labile), resulting in smaller clades of enzyme-positive genotypes with higher 16S rRNA sequence identity. Case (c) represents a random association between phylogeny and enzyme trait (no phylogenetic conservation).

We also hypothesized that the genetic potential to produce CHI and NAG enzymes should be positively correlated, thereby allowing the complete hydrolysis of chitin substrates by an individual organism. We used correlation analyses to test for associations between CHI- and NAG-positive genotypes within the data set.

Materials and methods

Identification of enzyme protein families

To target protein families with confirmed enzyme function, we conducted a literature search for empirically characterized amino acid sequences of prokaryotic AP (EC 3.1.3.1), CHI (EC 3.2.1.14) and NAG (EC 3.2.1.52). These enzymes were selected based on their ecological relevance and reasonably well-defined function/nomenclature. In all, 15–17 amino acid sequences from at least 11 different prokaryotic species were used as representatives for each enzyme (Supplementary Tables S1–S3). We then used the approach recently described by Martiny et al. (2012). The amino acid sequences were used to query the SEED database (Overbeek et al., 2005; http://seed-viewer.theseed.org) for the matching protein family identified by a unique ‘FIGfam’ number. SEED’s FIGfams are sets of isofunctional proteins that are homologous along the full length of the amino acid sequence (Meyer et al., 2009). Using FIGfams is a conservative method of identifying functional proteins, but provides an advantage over BLAST analysis since each FIGfam may encompass several genes coding for the same function. In total, 15 AP query sequences from 15 different species matched three FIGfams (FIG000766, FIG021520 and FIG024565). Seventeen CHI query sequences from 11 different species matched two FIGfams (FIG001347 and FIG004885). Sixteen NAG query sequences from 12 different species matched five FIGfams, but only three were used in our analysis (FIG001088, FIG003166 and FIG149633) because two corresponded to a different enzyme commission number (excluded FIG010408 and FIG048839). For each identified FIGfam, we compiled a list of prokaryotic genomes that had a protein sequence associated with the FIGfam function using the SEED API (Disz et al., 2010; http://www.theseed.org/servers/). This approach generated a matrix relating each genome to its corresponding genotype for AP, CHI and NAG. 
Genomes with the genetic potential to produce AP, CHI and/or NAG enzymes will be referred to as ‘enzyme-positive genotypes,’ whereas genomes in which the targeted protein families were not detected will be called ‘negative genotypes.’ We recognize that negative genotypes may still contain genes for (non-homologous) proteins of the same or similar function that do not correspond to a defined FIGfam.
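The step that turns FIGfam hits into a genome-by-enzyme presence/absence matrix can be sketched as below. This is a minimal illustration, not the authors' pipeline: it assumes the FIGfam IDs found in each genome have already been retrieved (no SEED API call is made), and the function names and genome IDs are hypothetical. Only the FIGfam IDs themselves come from the text.

```python
# FIGfam IDs as listed in the text; everything else here is illustrative.
ENZYME_FIGFAMS = {
    "AP": {"FIG000766", "FIG021520", "FIG024565"},
    "CHI": {"FIG001347", "FIG004885"},
    "NAG": {"FIG001088", "FIG003166", "FIG149633"},
}


def genotype_matrix(genome_figfams):
    """Return genome -> {enzyme: 1 if any of its FIGfams was detected, else 0}.

    `genome_figfams` maps a genome ID to the set of FIGfam IDs found in that
    genome (a hypothetical, pre-fetched input).
    """
    return {
        genome: {enzyme: int(bool(fams & ids)) for enzyme, ids in ENZYME_FIGFAMS.items()}
        for genome, fams in genome_figfams.items()
    }


def is_enzyme_positive(genotype):
    """A genome is 'enzyme-positive' if it carries at least one targeted family."""
    return any(genotype.values())


# Hypothetical genomes; "FIG999999" stands in for an unrelated family.
matrix = genotype_matrix({
    "genome_1": {"FIG000766", "FIG003166"},
    "genome_2": {"FIG999999"},
})
```

Here `genome_1` is AP- and NAG-positive, while `genome_2` is a negative genotype. As noted above, a zero only means that none of the targeted families was detected.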

16S rRNA phylogenetic tree construction

We then generated a 16S rRNA phylogeny of all the annotated prokaryotic genomes available from SEED as of 5 December 2011. The organism name (derived from the NCBI species taxonomy ID) was used to extract the corresponding aligned 16S rRNA sequence from the SILVA database (Release 108; Pruesse et al., 2007; http://beta.arb-silva.de/). Four eukaryotic large ribosomal subunit sequences were added to the alignment as outgroups (Arabidopsis thaliana, GenBank accession no. AX059457; Saccharomyces cerevisiae, AB278124; Aspergillus niger, EU884135; and Candida albicans, M60302). To account for uncertainty in the phylogeny, the sequence alignment was bootstrap sampled to generate 100 data sets using the SEQBOOT program from the PHYLIP software package (Felsenstein, 2005; http://evolution.genetics.washington.edu/phylip.html). The pairwise genetic distances between 16S rRNA sequences in each of the 100 data sets were measured with the DNADIST program using the F84 model of nucleotide substitution (Felsenstein and Churchill, 1996). Phylogenetic trees were inferred from the square distance matrix of each data set with the neighbor-joining algorithm implemented in the NEIGHBOR program, using a randomized input order for the sequences. An additional distance matrix and neighbor-joining tree were constructed from the non-bootstrapped sequence alignment using the same methods; this tree was used for visualization in iTOL (Letunic and Bork, 2007, 2011; http://itol.embl.de/) and to calculate phylogenetically independent contrasts (PICs).
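What the bootstrap step does to an alignment can be sketched as follows: each replicate keeps every sequence but resamples alignment columns with replacement, as SEQBOOT does for sequence data. This is a minimal sketch with hypothetical function names and toy data. The `p_distance` helper is a deliberately simpler stand-in (uncorrected proportion of differing sites) and is not the F84 model that DNADIST applies.

```python
import random


def bootstrap_alignment(alignment, n_replicates=100, seed=None):
    """Nonparametric bootstrap: resample alignment columns with replacement.

    `alignment` maps sequence name -> aligned sequence (all the same length).
    The characters of one original column always stay together in a replicate.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("aligned sequences must all have the same length")
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_replicates):
        cols = [rng.randrange(length) for _ in range(length)]
        replicates.append({n: "".join(alignment[n][c] for c in cols) for n in names})
    return replicates


def p_distance(a, b, gap="-"):
    """Uncorrected proportion of differing sites, skipping gapped positions.

    Illustration only; this is NOT the F84 distance used by DNADIST.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        raise ValueError("no comparable positions")
    return sum(x != y for x, y in pairs) / len(pairs)


# Toy three-sequence alignment (hypothetical data).
aln = {"s1": "ACGTAC", "s2": "ACGTTC", "s3": "AGGTAC"}
reps = bootstrap_alignment(aln, n_replicates=100, seed=42)
```

Each replicate would then be passed through the distance and tree-building steps, giving one tree per replicate.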

Phylogenetic conservation analysis

To evaluate the relationship between enzyme genotype and 16S rRNA phylogeny, we used the consenTRAIT algorithm (Martiny et al., 2012). The algorithm identified the root node of clades of enzyme-positive genotypes in which 90% of the terminal descendants shared the genotype for the enzyme of interest. Within each enzyme-positive clade, consenTRAIT calculated the average consensus sequence distance (d) between the root node (R) and the terminal node (S) for all clade members (m). If an enzyme-positive genotype did not have any neighbors that shared the trait (‘singleton’), then the average consensus sequence distance was calculated as half the branch length to the nearest neighbor. Trait depth, τD, was then calculated as the average of d values for n clades in the phylogeny:

$$\tau_D = \frac{1}{n}\sum_{i=1}^{n} d_i, \qquad d_i = \frac{1}{m_i}\sum_{j=1}^{m_i} \operatorname{dist}(R_i, S_{ij})$$

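The consenTRAIT calculation described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the published consenTRAIT code. The `Node` class, the function names and the toy tree are hypothetical. Clades are found top-down as the largest nodes with at least `threshold` of their tips positive. Clade size counts all tips in the clade. ‘Half the branch length to the nearest neighbor’ is interpreted here as half the tip-to-tip (patristic) distance to the closest other tip.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    name: str = ""
    length: float = 0.0                       # branch length to the parent
    children: list[Node] = field(default_factory=list)


def tips(node):
    if not node.children:
        return [node]
    return [t for c in node.children for t in tips(c)]


def root_to_tip(node, dist=0.0):
    """Yield (tip, distance from `node` down to that tip)."""
    if not node.children:
        yield node, dist
    for c in node.children:
        yield from root_to_tip(c, dist + c.length)


def tip_paths(node, path=(), depth=0.0):
    """Map tip name -> tuple of (ancestor id, depth below the root) from root to tip."""
    path = path + ((id(node), depth),)
    if not node.children:
        return {node.name: path}
    out = {}
    for c in node.children:
        out.update(tip_paths(c, path, depth + c.length))
    return out


def patristic(pa, pb):
    """Tip-to-tip distance via the depth of the last shared ancestor."""
    lca_depth = 0.0
    for (ia, da), (ib, _) in zip(pa, pb):
        if ia != ib:
            break
        lca_depth = da
    return pa[-1][1] + pb[-1][1] - 2 * lca_depth


def trait_depth(root, positive, threshold=0.9):
    """Return (tau_D, clade sizes) for the set of trait-positive tip names."""
    paths = tip_paths(root)
    depths, sizes = [], []

    def visit(node):
        members = tips(node)
        frac = sum(t.name in positive for t in members) / len(members)
        if len(members) > 1 and frac >= threshold:      # positive clade: stop here
            dists = [d for _, d in root_to_tip(node)]
            depths.append(sum(dists) / len(dists))
            sizes.append(len(members))
        elif not node.children:
            if node.name in positive:                   # singleton
                nearest = min(patristic(paths[node.name], p)
                              for name, p in paths.items() if name != node.name)
                depths.append(nearest / 2)
                sizes.append(1)
        else:
            for c in node.children:
                visit(c)

    visit(root)
    if not depths:
        raise ValueError("no trait-positive tips")
    return sum(depths) / len(depths), sizes


# Toy tree ((A,B),(C,(D,E))) with A, B and D positive (hypothetical data).
tree = Node(children=[
    Node(length=0.02, children=[Node("A", 0.01), Node("B", 0.01)]),
    Node(length=0.02, children=[Node("C", 0.01),
                                Node(length=0.01, children=[Node("D", 0.005), Node("E", 0.005)])]),
])
tau, sizes = trait_depth(tree, {"A", "B", "D"})
```

In this toy case (A, B) forms one clade with d = 0.01, D is a singleton whose nearest neighbor E lies 0.01 away (d = 0.005), and τD is their mean, 0.0075. In the study, this value was computed on each of the 100 bootstrapped trees and averaged.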
This calculation was repeated for each bootstrapped tree and then averaged across the set of 100 trees to obtain the number of singletons, the number and size of enzyme-positive clades, and τD for each enzyme studied. Because bootstrap resampling of alignment columns with replacement approximates the variation expected around a single sequence alignment, analyzing the set of 100 bootstrapped trees allowed us to account for uncertainty in the phylogeny. The calculated τD represents the average sequence distance from the root of an enzyme-positive clade to its members. Because the distance between two members of a clade is roughly twice this root-to-tip distance, we used 1 − 2τD as a measure of the sequence identity of organisms within an enzyme-positive clade; this value is comparable to a cutoff for defining operational taxonomic units.
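As a worked check of this conversion, using the mean trait depths reported in the Results for CHI and AP:

$$1 - 2(0.008004) = 0.983992 \approx 98.40\%\ (\text{CHI}), \qquad 1 - 2(0.009780) = 0.98044 \approx 98.04\%\ (\text{AP})$$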

To determine whether enzyme genotype and phylogeny were significantly non-randomly associated, we compared the observed size and trait depth (τD) of enzyme-positive clades to the same values calculated after randomizing the genome-genotype associations 1000 times (10 times for each of the 100 bootstrapped trees). The reported P-value is the fraction of randomizations that had a clade size or τD greater than or equal to that of the observed data. We considered enzyme genotype and phylogeny to be significantly associated (‘phylogenetically conserved’) for P-values <0.05.
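The randomization test can be sketched as a generic permutation test. In this minimal sketch, trait labels are shuffled across tips, which keeps the number of positives fixed. The P-value is the fraction of shuffles whose statistic is greater than or equal to the observed one, as described above. The statistic `adjacent_positive_pairs` counts neighboring positive tips in a leaf ordering. It is a hypothetical toy stand-in for consenTRAIT's clade size or τD. The sketch also does not reproduce the study's design of 10 randomizations on each of 100 bootstrapped trees.

```python
import random


def adjacent_positive_pairs(labels):
    """Toy clustering statistic: neighboring tips (in leaf order) that are both positive."""
    return sum(1 for a, b in zip(labels, labels[1:]) if a and b)


def permutation_p_value(labels, statistic, n_perm=1000, seed=0):
    """Return (observed statistic, fraction of shuffles with statistic >= observed)."""
    rng = random.Random(seed)
    observed = statistic(labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)                 # the number of positives is unchanged
        if statistic(shuffled) >= observed:
            hits += 1
    return observed, hits / n_perm


# Six positives packed together among 20 tips (hypothetical data).
observed, p = permutation_p_value([1] * 6 + [0] * 14, adjacent_positive_pairs)
```

A tightly clustered arrangement like this one is rarely matched by chance, so `p` is small. An arrangement with no adjacent positives gives a statistic of 0, which every shuffle equals or exceeds, so its P-value is 1.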

Trait correlations

For all correlational analyses, the genetic potential to produce a specific enzyme was represented as ‘1’ if present or ‘0’ if absent within each genome. We examined associations between enzyme-positive genotypes specific for AP, CHI or NAG production first without considering phylogeny by calculating the Pearson’s product-moment correlation and Spearman’s rank correlation for all pairwise trait combinations using the ‘Hmisc’ package in R (Harrell, 2012; R v2.15.1, R Core Team, 2012; http://www.R-project.org/). Additionally, we calculated co-occurrence frequencies for directional pairwise combinations of enzyme-positive genotypes to account for the differences in total abundance of each enzyme in the data set. For example, we calculated the fraction of NAG-positive genotypes that also contained CHI. We then used the analysis of traits (AOT) function in the Phylocom software package (v4.2; Webb et al., 2008, 2011; http://phylodiversity.net/phylocom/) to calculate PICs that test for associations between specific enzyme-positive genotypes while correcting for the non-independence of related taxa (Felsenstein, 1985; Garland et al., 1992). Independent contrasts are calculated by AOT as the difference between the mean trait values of the bifurcating descendants at each internal node. The AOT function can be used to contrast two binary traits because the proportion of taxa possessing each trait is a continuous value between 0 and 1 for each clade. Contrasts were calculated for all pairwise trait combinations using the phylogeny generated from the non-bootstrapped sequence alignment. Before analysis, we used the FigTree program (v1.3.1; http://tree.bio.ed.ac.uk/software/figtree/) to transform all branch lengths into proportions to account for any zero values. We tested for significance of the resulting Pearson correlation coefficient using a table of critical values (Sokal and Rohlf, 1995).
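The non-phylogenetic statistics in this paragraph can be sketched in plain Python. The study itself used the ‘Hmisc’ package in R; the functions below are hypothetical re-implementations for illustration. For 0/1 data, tied ranks are averaged, so the ranks are a linear function of the original values. Spearman's coefficient therefore equals Pearson's. The example vectors are rebuilt from counts back-calculated from the Results, assuming 340 genomes are both CHI- and NAG-positive (73.75% of the 461 CHI-positive genomes).

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def co_occurrence(a, b):
    """Fraction of genomes positive for trait `a` that are also positive for trait `b`."""
    both = sum(1 for u, v in zip(a, b) if u and v)
    return both / sum(a)


# 340 CHI+/NAG+, 121 CHI+ only, 787 NAG+ only, 1810 neither (n = 3058).
chi = [1] * 340 + [1] * 121 + [0] * 787 + [0] * 1810
nag = [1] * 340 + [0] * 121 + [1] * 787 + [0] * 1810

r = pearson(chi, nag)
rho = spearman(chi, nag)
chi_in_nag = co_occurrence(chi, nag)   # share of CHI-positive genomes that are NAG-positive
nag_in_chi = co_occurrence(nag, chi)   # share of NAG-positive genomes that are CHI-positive
```

With these counts, r is about 0.322, in line with the CHI-NAG value reported in the Results. The two directional frequencies (about 73.75% and 30.17%) show why co-occurrence is reported separately: a single symmetric coefficient hides the fact that one trait is largely a subset of the other.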

The full output of AOT also includes a test of phylogenetic signal, which represents the degree to which each trait is conserved across the phylogeny (Blomberg and Garland, 2002). We used this calculation to validate the results of the consenTRAIT analysis. Phylocom uses the variance of PICs to calculate the phylogenetic signal similar to Blomberg and Garland’s K statistic (Blomberg and Garland, 2002; Blomberg et al., 2003; Webb et al., 2011) and determines the significance by comparing the observed PIC mean to a distribution of PIC means generated from 1000 randomizations of trait values across the tips of the phylogeny.
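The contrast-based analyses can be sketched with a bifurcating tree written as nested 2-tuples of tip names. Following the description in the trait-correlation paragraph, each contrast is the difference between the mean trait values of the two descendant groups at an internal node. Branch-length standardization is ignored. Two traits' contrasts are correlated through the origin, as is conventional for independent contrasts. For phylogenetic signal, this sketch uses the mean absolute contrast as the test statistic: small contrasts mean that sister lineages resemble each other. It compares this value against tip shuffles. Whether Phylocom's AOT uses exactly this statistic, rather than a variance-based one, is an assumption, and all names here are hypothetical.

```python
import math
import random


def tip_values(tree, trait):
    """Trait values of all tips below `tree` (tips are strings, nodes are 2-tuples)."""
    if isinstance(tree, str):
        return [trait[tree]]
    left, right = tree
    return tip_values(left, trait) + tip_values(right, trait)


def contrasts(tree, trait):
    """One contrast per internal node: mean trait of left minus right descendants."""
    if isinstance(tree, str):
        return []
    left, right = tree
    lv, rv = tip_values(left, trait), tip_values(right, trait)
    here = sum(lv) / len(lv) - sum(rv) / len(rv)
    return [here] + contrasts(left, trait) + contrasts(right, trait)


def contrast_correlation(tree, trait_x, trait_y):
    """Correlation of two traits' contrasts, computed through the origin."""
    cx, cy = contrasts(tree, trait_x), contrasts(tree, trait_y)
    sxy = sum(a * b for a, b in zip(cx, cy))
    return sxy / math.sqrt(sum(a * a for a in cx) * sum(b * b for b in cy))


def mean_abs_contrast(tree, trait):
    cs = contrasts(tree, trait)
    return sum(abs(c) for c in cs) / len(cs)


def signal_test(tree, trait, n_perm=1000, seed=1):
    """Return (observed mean |contrast|, fraction of tip shuffles with a value <= observed)."""
    rng = random.Random(seed)
    names, values = list(trait), list(trait.values())
    observed = mean_abs_contrast(tree, trait)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(values)
        if mean_abs_contrast(tree, dict(zip(names, values))) <= observed:
            hits += 1
    return observed, hits / n_perm


def balanced(names):
    """Build a perfectly balanced bifurcating tree over the given tip names."""
    if len(names) == 1:
        return names[0]
    mid = len(names) // 2
    return (balanced(names[:mid]), balanced(names[mid:]))


# 16 hypothetical tips; exactly one half of the tree carries the trait.
tips16 = [f"t{i}" for i in range(16)]
tree = balanced(tips16)
trait = {name: int(i < 8) for i, name in enumerate(tips16)}
observed, p = signal_test(tree, trait)
```

Here only the root contrast is nonzero, so the observed mean is 1/15. Almost no shuffled arrangement is that tidy, which yields a small P-value, the signature of a conserved trait.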

Results

Taxonomic composition of sequenced prokaryotic genomes

We analyzed 3058 prokaryotic genomes, spanning 30 phyla of Bacteria and Archaea, with 1–1312 genomes per phylum (Figure 2a). Proteobacteria (42.9% of genomes analyzed) was the most represented phylum in the data set. Other heavily represented phyla included Firmicutes (28.9%), Actinobacteria (10.0%) and Bacteroidetes (4.8%). Together, these four phyla accounted for 86.6% of the data set. The remaining phyla each contained <100 genomes, and more than half of them contained <10 genomes each.

Figure 2

Occurrence of the genetic potential to produce AP, CHI and/or NAG enzymes (‘enzyme-positive genotypes’) among prokaryotic taxa. (a) Log abundance of genomes by phylum. (b) Proportion of enzyme-positive genotypes within each phylum. (c–e) Proportion of enzyme-positive genotypes specific for AP, CHI and NAG enzyme production.

Taxonomic distribution of enzyme-positive genotypes

Almost half of the 3058 genomes (1504, 49.2%) were identified as enzyme positive for AP, CHI, NAG or some combination (Figures 2b and 3). NAG-positive genotypes were most common (detected in 1127 genomes or 36.9% of all genomes analyzed), followed by AP- and CHI-positive genotypes (976, 31.9% and 461, 15.1%, respectively). Enzyme-positive genotypes were detected in all phyla except Korarchaeota, Thaumarchaeota, Aquificae, Chlamydiae, Chrysiogenetes and Fibrobacteres. Multi-enzyme genotypes capable of producing all three enzymes were found in the Acidobacteria, Actinobacteria, Bacteroidetes, Chloroflexi, Cyanobacteria, Dictyoglomi, Firmicutes, Proteobacteria, Spirochaetes, Thermotogae and Verrucomicrobia groups. AP- and NAG-positive genotypes were the most broadly distributed across prokaryotic taxa and were detected in 22 and 19 phyla, respectively. CHI-positive genotypes were more narrowly distributed and were found in only 12 of the 30 phyla.

Figure 3

Distribution of prokaryotic genomes with the genetic potential for AP, CHI or NAG enzyme production across a neighbor-joining phylogeny of 16S rRNA sequences. The red inner ring shows AP-positive genotypes, the blue middle ring shows CHI-positive genotypes and the green outer ring shows NAG-positive genotypes. Gray bars represent enzyme-negative genotypes. Highlighted clades depict (1) Enterococcus, (2) Burkholderia, (3) Vibrio and (4) Escherichia.

Although enzyme-positive genotypes were distributed across all the major groups of prokaryotes, the pattern of distribution within most groups was variable. For example, genomes within the Escherichia (n=153) and Enterococcus (n=75) genera showed notable variation in the genetic potential to produce all three enzymes (Figure 3). This pattern is striking given the high rRNA relatedness (>99% identity) of the genomes within Escherichia. By contrast, enzyme genotype was consistent within only a few taxonomic groups, such as Burkholderia (n=57) for all enzymes studied and Vibrio (n=60) for CHI and NAG only.

Phylogenetic conservation

Clades of enzyme-positive genotypes varied broadly in size and relatedness (Figure 4a; Supplementary Figures S1a and S2a). The mean clade size for enzyme-positive genotypes was 1.74–2.02 genomes per clade (Table 1). For all three enzymes studied, randomizing the assignment of enzyme-positive genotypes across the phylogeny resulted in a significantly higher occurrence of singletons and smaller clade size than in the observed data (Table 1; Figure 4b, Supplementary Figures S1b and S2b; P<0.001). The mean trait depth (τD) for all three enzymes was also significantly different from the null (randomized) distribution. Mean τD ranged from 0.008004 (CHI) to 0.009780 (AP) 16S rRNA distance (Table 1; Figure 4c; Supplementary Figures S1c and S2c; P<0.012). These values correspond to 16S rRNA sequence identities of 98.40% for CHI and 98.04% for AP. Supporting this, we also detected a significant phylogenetic signal for all three enzymes (P<0.001 for AP and P=0.001 for CHI and NAG). Thus, based on the larger-than-random clade size, trait depth and phylogenetic signal, our analysis suggests that the enzyme-positive genotypes were non-randomly associated with the phylogeny.

Figure 4

Phylogenetic conservation of AP-positive genotypes. (a) Clade size range and relationship to mean trait depth, τD (average branch length from the root node to terminal nodes within enzyme-positive clades). (b) Frequency distribution of mean clade size of randomized data. The red bar represents the observed mean clade size, P<0.001. (c) Frequency distribution of τD of randomized data. The observed τD (red bar) is significantly different from the null distribution (P<0.001). The color reproduction of this figure is available on the ISME Journal online.

Table 1 Phylogenetic conservation of enzyme-positive genotypes

Trait correlations

We next analyzed the correlations between enzyme traits. Both Pearson and Spearman correlation tests supported significant (P<0.001, n=3058) associations between all pairwise enzyme combinations. AP- and NAG-positive genotypes, which were each individually more abundant in the data set than CHI-positive genotypes, were also most highly correlated (r=0.419). CHI-NAG and AP-CHI genotype combinations were slightly less correlated (r=0.322 and 0.302, respectively). These association patterns remained similar and significant (P<0.001, n=3058) even after considering phylogeny using independent contrasts. AP- and NAG-positive genotypes were still most correlated (r=0.474), followed by CHI-NAG (r=0.354) and AP-CHI (r=0.207).

Evaluating the frequency of directional combinations of enzyme-positive genotypes revealed a different association pattern. As we hypothesized, a high proportion of CHI-positive genotypes (73.75%) were also positive for NAG, regardless of the genotype for AP. By contrast, only 30.17% of NAG-positive genotypes were also positive for CHI. These results indicated that genomes with the genetic potential to produce CHI enzymes were mostly a subset of those capable of producing NAG enzymes.

Discussion

Our objectives were to determine which prokaryotic taxa have the genetic potential to produce AP, CHI and/or NAG enzymes (‘enzyme-positive genotypes’), and to evaluate the linkages between that genetic potential and phylogeny for sequenced prokaryotic genomes. Consistent with our hypothesis, the genetic potential to produce AP, CHI and/or NAG enzymes showed a significantly non-random association with phylogeny when measured by both the size (Figure 4b; Supplementary Figures S1b and S2b; P<0.001) and average phylogenetic relatedness (Figure 4c; Supplementary Figures S1c and S2c; P<0.012) of clades containing enzyme-positive genotypes. Although this finding suggests that vertical inheritance was generally important for the distribution of these traits in prokaryotes, the phylogenetic relatedness within enzyme-positive clades was high, averaging >98% 16S rRNA sequence identity (Table 1).

The relatedness value of >98% is closer to the species demarcation threshold suggested for predicting phenotypic potential (Konstantinidis et al., 2006a, 2006b) than to the common threshold for distinguishing prokaryotic species (Stackebrandt and Goebel, 1994; Petti, 2007). Enzyme function could therefore be inferred for operational taxonomic units defined at a 98% sequence identity cutoff or higher. However, the sequence identity values presented here should be interpreted with some caution. The high percentage of singletons (enzyme-positive genotypes whose nearest neighbors did not share the trait) for each enzyme is likely a consequence of undersampling prokaryotic diversity. When these singletons were excluded as clades in the phylogenetic conservation analysis, the average relatedness within enzyme-positive clades decreased (AP=96.9%, CHI=98.0% and NAG=96.7%). The number of singletons may change as more of the prokaryotic diversity in nature is sampled, though it is unclear how trait associations may be affected. Sampling biases in prokaryotic genome sequencing likely also influenced the variation in individual clade size and phylogenetic relatedness (Figure 4a; Supplementary Figures S1a and S2a), such that large clades with high relatedness are the result of deep sampling of a few taxonomic groups in the data set (for example, E. coli). Despite these biases, the fine-scale association that we detected between enzyme genotype and phylogeny is evidence of ‘microdiversity,’ the occurrence of ecologically or physiologically distinct populations within phylogenetically related groups (Moore et al., 1998).

Microdiversity is a well-documented phenomenon among prokaryotes, particularly for traits related to resource use. Prochlorococcus isolates have been divided into ecotypes based on exploitation of light resources, despite being >97% similar in ribosomal identity (Moore et al., 1998; West and Scanlan, 1999). Martiny et al. (2006) also found that the gene content for phosphate acquisition was not congruent with rRNA phylogeny for members of Prochlorococcus. Likewise, extremely close relatives of hot spring Synechococcus have been shown to differ in their adaptations to light levels (Becraft et al., 2011) as well as phosphorus and nitrogen use pathways (Bhaya et al., 2007). Isolates of Pseudomonas, Acinetobacter (Choudhary and Johri, 2011) and Brevundimonas (Jaspers and Overmann, 2004), which had identical 16S rRNA gene sequences, were also found to occupy different ecological niches, with each using a unique combination of carbon substrates. Additionally, Hunt et al. (2008) demonstrated that Vibrio isolates could be resolved into ecologically distinct populations based on resource partitioning within the water column. These examples represent cases of local adaptation or niche specialization, which can be the first step in the process of ecological speciation. Likewise, the microdiversity detected for AP-, CHI- and NAG-positive genotypes may provide evidence for the importance of extracellular enzymes in the origins of ecological diversity.

Although not tested here, we speculate that several ecological processes could contribute to microdiversity in the ability to use phosphate esters and chitin among prokaryotic taxa. It is likely that gene content reflects differential adaptation to environmental conditions even among closely related organisms. Extracellular enzymes are commonly produced by prokaryotes in low-nutrient environments to access the resources trapped in high molecular weight compounds (Torriani, 1960; Münster and Chrost, 1990; Amon and Benner, 1996; Delpin and Goodman, 2009). As such, microdiversity in enzyme genotype may occur if individuals are adapted to contrasting nutrient regimes. For example, nitrogen transporter genes differed between coastal (high nutrient) and open ocean (low nutrient) isolates of Synechococcus (Palenik et al., 2006) and phosphate acquisition genes varied among strains of Prochlorococcus in relation to the specific nutrient availability where the strains were isolated (Martiny et al., 2006). These patterns of variation were reflected in the distribution of AP-positive genotypes among Cyanobacteria in our data set (Figure 3), and may be the result of frequent local resource adaptation. Ideally, environmental data collected from an organism’s isolation source could be used to test for correlations with particular traits. We suspect that the taxonomic distribution of AP-, CHI- and NAG-positive genotypes may be related to nutrient supply; however, these data were not available for the sequenced genomes used in this study. Fortunately, the field of microbial ecology is experiencing a shift toward more meticulous measurement and reporting of contextual environmental data (for example, Yilmaz et al., 2011).

Microdiversity in enzyme genotype may also be associated with lifestyle strategy. Commensal or pathogenic strains may have greater access to more labile nutrients than their free-living counterparts, alleviating the necessity for extracellular enzymes. We notably did not detect the ability to use phosphate esters or chitin within Fibrobacteres, a gut symbiont (Ransom-Jones et al., 2012), or Chlamydiae, an obligate intracellular pathogen (Horn, 2008). Recently, Luo et al. (2011) found that environmental isolates of E. coli shared a set of genes important for resource acquisition that were absent from enteric isolates. Likewise, the enteric strains shared a set of genes involved in the transport and use of several labile nutrients, which were absent from the environmental isolates. We found that members within Escherichia varied in their genetic potential to produce all three enzymes (Figure 3). This variation may be evidence that they are changing their ecology or even speciating more often than other lineages, such as Burkholderia, which was relatively coherent for enzyme genotype (Figure 3).

Finally, horizontal gene transfer may also result in important ecological differences between closely related organisms (Welch et al., 2002). The extent to which horizontal gene transfer may occur for resource acquisition machinery among prokaryotes is unclear. Previous work suggests that horizontal gene transfer may significantly impact the evolution of chitin-degrading enzymes (Garcia-Vallve et al., 1999). However, adaptive transfer of genes is limited to those that can be transferred as a functional unit containing a complement of genes that are involved in processing a single resource molecule (Wiedenbeck and Cohan, 2011). Structural or physiological incompatibilities may further inhibit the horizontal transfer of genes between individuals (Cohan and Koeppel, 2008).

The fine-scale association between enzyme genotype and phylogeny is consistent with the relatively simple genetic structure underlying the production of extracellular enzymes. These enzyme systems involve few genes/operons, which may allow enzyme function to be rapidly gained or lost through evolutionary processes including horizontal gene transfer. Martiny et al. (2012) found similar association values for carbon usage traits, which ranged from 96.6% to 97.8% 16S rRNA sequence identity within positive clades, depending on the data set analyzed. By contrast, more complex traits such as oxygenic photosynthesis or sulfate reduction involve many more genes and show phylogenetic conservation at much deeper taxonomic levels (80% and 92.2% mean 16S rRNA sequence identity, respectively; Martiny et al., 2012). Other traits not directly related to resource use, such as rRNA operon copy number (Rastogi et al., 2009) and host adaptation (Ettema and Andersson, 2009), may also be conserved at deeper taxonomic levels.

We predicted that the genetic potential to produce CHI and NAG enzymes should be positively correlated, thereby allowing the complete hydrolysis of chitin substrates by an individual organism. In support of our hypothesis, we found a significant association between the ability to hydrolyze chitin and the ability to use the end products of hydrolysis, regardless of whether we corrected for shared evolutionary history. Allocation to extracellular enzymes represents a significant investment of carbon and nitrogen, and cells should be under selection to regulate production and maximize substrate use (Allison et al., 2007, 2011). Most of the Vibrio genomes in our data set were capable of complete chitin hydrolysis, which is consistent with their known ecology. Vibrios can be found in a range of aquatic habitats and have one of the most well-studied chitinolytic systems of any prokaryote (Keyhani and Roseman, 1999; Li and Roseman, 2004; Pruzzo et al., 2008). While a high proportion of CHI-positive genotypes (73.75%) were also positive for NAG, we detected potential for CHI production without the potential for NAG production in a few groups within the Firmicutes phylum (Figure 3). For these strains, CHI production may be primarily involved in virulence (Larsen et al., 2010) or biofilm formation (Tirumalai and Prakash, 2011) instead of resource acquisition. Alternatively, we may not have identified all NAG-positive genotypes in our analysis. By contrast, a majority (69.83%) of NAG-positive genomes appeared to lack the genetic potential for CHI production. We suspect that this may be related to the numerous biological and environmental sources of NAG substrates aside from the products of CHI activity, resulting in a relatively low correlation between the two enzymes.
We also detected significant positive correlations between AP- and CHI-positive genotypes as well as between AP- and NAG-positive genotypes, suggesting that some prokaryotes may function as general ‘enzyme producers,’ with the ability to produce a suite of extracellular enzymes. Furthermore, the similarity between the raw correlations and the PIC results suggests that these traits are correlated independently of phylogeny, and that there may be selection for the traits to co-occur.

Identifying the appropriate level of phylogenetic relatedness for a trait of interest is critical for defining ecologically coherent units and predicting/interpreting community function. Our study shows that extracellular enzyme genes are not randomly distributed throughout the prokaryotic phylogeny, but that they are conserved at a finer phylogenetic scale (on average >98% 16S rRNA sequence identity) than the level typically used to define bacterial operational taxonomic units based on 16S markers. Consequently, the microdiversity associated with enzyme traits could mask correlations between enzyme activity and community composition (for example, Frossard et al., 2012) if operational taxonomic units are defined below 98% 16S rRNA sequence identity. Trait-based techniques such as GeoChip (He et al., 2007), which allow for directed assessment of functional genes and processes, may be more appropriate than inferring function from phylogenetic marker genes for investigating the ecological consequences of phylogenetically fine-scale microbial traits. Continued efforts by microbial ecologists to identify the levels of phylogenetic conservation of metabolic functions among prokaryotes will enhance our ability to predict microbial impacts on ecosystem processes.