Hydrogen (H2) is ubiquitous in the biosphere and has the highest energy content per unit mass of all the naturally occurring fuels. High concentrations of H2 have been reported in a variety of environments that include the deep subsurface (Stevens and McKinley, 1995), hydrothermal vents (Jannasch and Mottl, 1985), salt-evaporation ponds (Hoehler et al., 2001) and a range of terrestrial geothermal features (Conrad et al., 1985; Inskeep and McDermott, 2005; Spear et al., 2005). H2 is an important source of energy for the maintenance and growth of microbial populations (Wolin, 1982; Morita, 2000) and has recently been postulated to be the primary fuel supporting a number of microbial assemblages (Stevens and McKinley, 1995; Chapelle et al., 2002), including those inhabiting geothermal spring environments (Conrad et al., 1985; Spear et al., 2005).

H2 can be produced by both geological and biological processes. Geologically, the production of H2 occurs through the hydrolysis of reduced meteoric water when in contact with iron-bearing basalts (Apps and van de Kamp, 1993), through the reduction of water during serpentinization reactions in the presence of ultramafic minerals such as olivine (Oze and Sharma, 2007), and through the radiolysis of water (Lin et al., 2005). Basalt-catalyzed H2 production occurs readily in the presence of reduced acidic fluids (Stevens and McKinley, 2000), whereas H2 production from serpentinization reactions is more prevalent at alkaline pH (Oze and Sharma, 2005; Cardace and Hoehler, 2009). Thus, geological H2 production through serpentinization can be expected to predominate in alkaline environments rich in ultramafic minerals, whereas geological H2 production by hydrolysis can be expected to be predominant in acidic and basalt-rich environments. Radiolytic production of H2 is likely an important abiotic source of H2 in subsurface environments rich in radioactive minerals (Lin et al., 2005).

Biologically, [FeFe]- and [NiFe]-hydrogenase enzymes catalyze the transfer of excess electrons generated during fermentative or photosynthetic metabolism to protons to form H2 (Vignais et al., 2001). The H2 evolution activity of [FeFe]-hydrogenase is 10- to 100-fold greater than that of the [NiFe]-hydrogenase (Frey, 2002), an observation that is consistent with the physiological role of most [NiFe]-hydrogenase in H2 uptake (Vignais et al., 2001). Recently, PCR primers for the specific amplification of the large subunit of the [FeFe]-hydrogenase gene (hydA) were developed to examine the diversity of hydA as a proxy for fermentative bacteria in a H2-generating saline microbial-mat system (Boyd et al., 2009b) and in a H2-generating anaerobic bioreactor (Xing et al., 2008). The results of these studies, as well as those from a number of metagenomic investigations that include the deep subsurface (Chivian et al., 2008) and termite hindgut environments (Warnecke et al., 2007), collectively indicate a widespread distribution and diversity of [FeFe]-hydrogenase in nature. However, our understanding of the environmental constraints that control the distribution and diversity of [FeFe]-hydrogenase-mediated H2 production in natural systems is confounded by the tight coupling between H2 production and consumption and by the abiological reactions that contribute to H2 production and consumption in natural systems. In this study, we describe the distribution and phylogenetic diversity of [FeFe]-hydrogenase along physical and geochemical gradients in Yellowstone National Park (YNP), WY, USA. The results suggest that pH imposes strong phylogenetic niche conservatism (Harvey and Pagel, 1991; Wiens, 2004) and that the YNP geothermal complex imposes dispersal limitation (Hubbell, 2001) on fermentative bacterial communities, as assessed using hydA-deduced amino acid sequences as a functional marker.

Materials and methods

Sample collection

Approximately 500 mg samples of microbial mat or sediment were collected in July 2007 from 65 locations in YNP, including 16 geothermal spring locations in the Witch's Creek area of the Heart Lake Geyser Basin (HL), 12 locations in and around Imperial Geyser (IG) (Lower Geyser Basin), 21 locations in the 100 Springs Plain area of the Norris Geyser Basin (NG) and 15 locations in the thermal field at the north end of Nymph Lake (NL). All samples were collected aseptically, placed into sterile microcentrifuge tubes, and flash frozen on-site using an ethanol and dry ice slurry. Samples were kept on dry ice during transport to the laboratory, where they were stored at −80 °C until further processing. The field sites were initially chosen to sample the range of environmental pH and temperature combinations commonly encountered in YNP springs across a variety of geographical locations. pH and temperature were determined on-site at all 65 sites with a model 59002-00 Cole-Parmer temperature-compensated pH meter (Vernon Hills, IL, USA). Temperature was confirmed using an alcohol thermometer.

Chemical analyses

Nine sites were chosen for more detailed chemical and biological examination based on the detection of hydA (see below) together with the pH and temperature combination of the sites. Concentrations of ferrous iron (Fe2+) and total sulfide (S2−) were quantified on-site with a Hach DR/2000 spectrophotometer (Hach Company, Loveland, CO, USA), using Hach ferrozine pillows and sulfide reagents, respectively. For both Fe2+ and S2− determinations, water samples were filtered (0.22 μm) before addition of reagents. A separate 10 ml sample of water was filtered (0.22 μm) into sterile tubes and immediately frozen on dry ice for use in determining the concentrations of nitrate (NO3−), nitrite (NO2−), ammonium (NH4+), and phosphate (PO43−) using a SEAL QuAAtro analyzer (West Sussex, UK) calibrated daily with freshly prepared standards. Salinity was determined using a YSI model 33 S-C-T meter (Yellow Springs Instrument Company, Yellow Springs, OH, USA).

DNA extraction and PCR amplification of hydA

Approximately 100 mg of microbial mat or sediment was subjected to community genomic DNA extraction, purification, and quantification as described earlier (Boyd et al., 2007). To confirm the presence of PCR-amplifiable DNA, all DNA extracts were screened for 16S rRNA genes using 10 ng of DNA and primers bacterial 1070F (5′-ATGGCTGTCGTCAGCT-3′) and universal 1492R (5′-GGTTACCTTGTTACGACTT-3′) (Boyd et al., 2009a). Approximately 500 bp fragments of hydA were PCR-amplified from 10 ng of environmental genomic DNA as template with primer pair FeFe-272F and FeFe-427R using previously established reagent concentrations and reaction conditions (Boyd et al., 2009b). Twenty-nine of the 65 sites yielded amplicons of the correct size, and nine of those were selected for further sequence analysis. These nine sites were chosen to sample the range of pH (1.90–9.84) and temperature (32–65 °C) combinations in YNP where hydA was detected. For each of these nine sites, new samples of microbial mat were collected in July 2008 so that a more comprehensive set of geochemical data could be collected concurrently. The physical, chemical, and biological attributes reported for each of these sites reflect data derived from samples collected during July 2008. Community DNA extraction and quantification were performed as described above. hydA was amplified in triplicate and equal volumes of each replicate amplification were pooled. Pooled amplicons were purified using the Wizard PCR Preps DNA purification system (Promega, Madison, WI, USA), quantified using the Low DNA Mass Ladder (Invitrogen, Carlsbad, CA, USA), cloned using the pGem-T Easy Vector System (Promega), and sequenced using the M13F-M13R primer pair as described previously (Boyd et al., 2009a).

chlL/bchL primer design and PCR amplification of chlL/bchL

Protochlorophyllide oxidoreductase subunit L gene sequences (chlL/bchL) were compiled from the GenBank database using tBLASTn with the Anabaena variabilis str. ATCC 29413 ChlL sequence as the query. The corresponding putative chlL/bchL sequences were imported into MEGA4 (version 4.0.1), translated in frame, and the inferred amino-acid sequences were aligned using the ClustalW application (Gonnet substitution matrix, default parameters) within the MEGA4 program. Putative ChlL/BchL sequences were screened for the presence of conserved signature catalytic residues as defined previously (Burke et al., 1993). Sequences that lacked these residues were discarded without further consideration, resulting in 94 sequences for use in identifying consensus regions. Reverse translation of aligned ChlL/BchL sequences facilitated the design of primers containing minimal degeneracy.

Degenerate PCR primers corresponding to positions 54–60 (QIGCDPKHAD) and positions 167–174 (NGFDALFA) in the Anabaena variabilis str. ATCC 29413 ChlL sequence (YP_322845) were selected. Primer specificity testing and PCR condition optimization were performed in reactions containing genomic DNA from Synechococcus sp. JA-3-3Ab and Roseiflexus sp. Rs-1 as positive amplification controls, and genomic DNA from Thermotoga maritima MSB8, Azotobacter vinelandii and Methanobacterium thermoautotrophicum str. Delta H as negative amplification controls. Primers chlL/bchL-54F (5′-CARATYGGHTGYCAYCCNAARCAYGA-3′, where H=A, C, or T; Y=C or T; N=A, C, G, or T; R=A or G; 768-fold degeneracy) and chlL/bchL-167R (5′-AAYGRYTTYGAYDSBWTHTTYGC-3′, where M=A or C; S=C or G; B=C, G or T; 4608-fold degeneracy) were used to amplify a 340 bp fragment of chlL/bchL. Fifty microliter PCRs contained 10 ng of genomic DNA as template, 2 mM MgCl2 (Invitrogen), 200 μM of each deoxyribonucleotide triphosphate (Promega), 1 μM of both the forward and reverse primer, 200 nM molecular-grade bovine serum albumin (Roche, Indianapolis, IN, USA) and 0.25 U Taq DNA Polymerase (Invitrogen) in 1 × PCR buffer (Invitrogen). Step-down PCR conditions included an initial 4 min denaturation at 94 °C, followed by 4 cycles of denaturation at 94 °C (1 min), annealing at 64, 62, 60 and 58 °C (1 min), and primer extension at 72 °C (1.5 min). The final 30 cycles consisted of denaturation at 94 °C (1 min), annealing at 56.5 °C (1 min) and primer extension at 72 °C (1.5 min), with a final extension step at 72 °C (20 min).
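The fold degeneracy quoted for a degenerate primer is the product of the number of bases permitted at each position. The following minimal sketch illustrates that arithmetic for the forward primer chlL/bchL-54F as printed above, using the standard IUPAC ambiguity codes; the function and variable names are illustrative and are not taken from the original study.

```python
# Number of bases each IUPAC nucleotide code can stand for.
IUPAC_CHOICES = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def primer_degeneracy(primer: str) -> int:
    """Return how many distinct sequences a degenerate primer encodes."""
    total = 1
    for base in primer.upper():
        total *= IUPAC_CHOICES[base]  # KeyError on a non-IUPAC character
    return total


# Forward primer chlL/bchL-54F, copied from the text (5' to 3').
FORWARD_54F = "CARATYGGHTGYCAYCCNAARCAYGA"

print(primer_degeneracy(FORWARD_54F))  # 768
```

The eight ambiguous positions in this primer (R, Y, H, Y, Y, N, R, Y) multiply to 2 × 2 × 3 × 2 × 2 × 4 × 2 × 2 = 768, which matches the value reported for the forward primer.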

Phylogenetic analyses

hydA sequences were translated using the translate tool available on the ExPASy proteomics server. ClustalX (version 2.0.8) (Larkin et al., 2007) was used to align inferred amino-acid sequences using the Gonnet 250 protein weight matrix with a pairwise alignment gap opening penalty of 13 and gap extension penalty of 0.05. ClustalX was also used to create a pairwise sequence identity matrix. The pairwise sequence identity matrix was then used in the program DOTUR (Schloss and Handelsman, 2005) to identify and group operational taxonomic units and to perform rarefaction analyses.
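To make the notion of a pairwise sequence identity matrix concrete, the sketch below computes identities between already-aligned amino-acid sequences. It is an illustration only: the convention of skipping any column that contains a gap is an assumption and may differ from the way ClustalX scores identity, and the three short sequences are hypothetical placeholders rather than HydA data.

```python
from itertools import combinations


def pairwise_identity(seq_a: str, seq_b: str) -> float:
    """Fraction of identical residues over columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # assumption: gapped columns are ignored
        compared += 1
        matches += a == b
    return matches / compared if compared else 0.0


def identity_matrix(alignment: dict) -> dict:
    """Identity for every unordered pair of named sequences in an alignment."""
    return {
        (x, y): pairwise_identity(alignment[x], alignment[y])
        for x, y in combinations(sorted(alignment), 2)
    }


# Hypothetical aligned fragments, used only to exercise the functions.
ALIGNMENT = {"clone1": "ACDE", "clone2": "ACDF", "clone3": "AC-E"}

for pair, identity in identity_matrix(ALIGNMENT).items():
    print(pair, round(identity, 2))
```

A distance of the form 1 − identity is the kind of input that clustering programs typically use to group sequences into operational taxonomic units at a chosen cutoff.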

The phylogenetic position of putative HydA was assessed using MrBayes (version 3.1) (Huelsenbeck and Ronquist, 2001). Tree topologies were sampled every 500 generations for 2 × 10⁶ generations using the Whelan and Goldman (WAG) evolutionary model with fixed amino-acid frequencies and gamma-shaped rate variation with a proportion of invariable sites as recommended by ProtTest (Abascal et al., 2005). The Saccharomyces cerevisiae ‘Narf’ protein, a distantly related homolog of HydA, served as the outgroup in the phylogenetic analysis. A consensus phylogeny was generated from 2000 trees sampled at stationarity (standard deviation between split frequencies <0.03). The HydA phylogram was rate-smoothed using the non-parametric rate smoothing approach implemented in version 1.71 of the program r8s (Sanderson, 2002).

Community ecology

Phylocom (version 4.0.1) (Webb et al., 2008) was used to calculate various metrics of phylogenetic community structure, including Rao's quadratic entropy (Dp), the net relatedness index (NRI) and the nearest taxon index (NTI), using the non-parametric rate-smoothed Bayesian chronogram. Dp represents an abundance-weighted measure of the pairwise differences in branch length of sequences in a community and is thus a metric describing phylogenetic diversity. A higher Dp index for an assemblage is indicative of higher phylogenetic diversity relative to the total HydA pool. The NRI metric is a measure of tree-wide phylogenetic clustering of sequences from a given community, whereas the NTI metric is a standardized measure of the phylogenetic distance to the nearest taxon for each taxon within a community and is more sensitive to terminal (branch tip) clustering (Webb, 2000). Increasingly positive NRI and NTI scores indicate that co-occurring species are more phylogenetically related than expected by chance (phylogenetic clustering). Conversely, increasingly negative NRI and NTI scores indicate that co-occurring species are less phylogenetically related than expected by chance (phylogenetic overdispersion) (Webb, 2000). Statistical confidence in NRI and NTI values was assessed by comparison with randomly generated phylogenies (null communities) using the ‘phylogeny shuffle’ model in Phylocom.
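The metrics described above can be illustrated with a small sketch. Rao's quadratic entropy is written here as the sum over all taxon pairs of d_ij × p_i × p_j, and NRI as −(MPD_obs − mean MPD_null) / sd(MPD_null), where MPD is the mean pairwise phylogenetic distance. The code is not Phylocom: the null model draws random taxon sets of the same size from the pool as a simple stand-in for the phylogeny shuffle, MPD is unweighted, and the six-taxon distance matrix is hypothetical.

```python
import random
from itertools import combinations
from statistics import mean, stdev


def rao_quadratic_entropy(dist, abundances):
    """Sum of d_ij * p_i * p_j over all ordered taxon pairs in one assemblage."""
    total = sum(abundances.values())
    p = {taxon: n / total for taxon, n in abundances.items()}
    return sum(dist[a][b] * p[a] * p[b] for a in p for b in p)


def mean_pairwise_distance(dist, taxa):
    """Unweighted mean phylogenetic distance between all taxon pairs."""
    pairs = list(combinations(taxa, 2))
    return sum(dist[a][b] for a, b in pairs) / len(pairs)


def net_relatedness_index(dist, community, runs=999, seed=0):
    """NRI = -(MPD_obs - mean(MPD_null)) / sd(MPD_null)."""
    rng = random.Random(seed)
    pool = sorted(dist)
    observed = mean_pairwise_distance(dist, community)
    # Assumption: random draws from the pool stand in for label shuffling.
    null = [
        mean_pairwise_distance(dist, rng.sample(pool, len(community)))
        for _ in range(runs)
    ]
    return -(observed - mean(null)) / stdev(null)


# Hypothetical pool: two clades of three taxa each.
CLADES = {"A": 1, "B": 1, "C": 1, "D": 2, "E": 2, "F": 2}
DIST = {
    a: {b: 0.0 if a == b else (0.2 if CLADES[a] == CLADES[b] else 2.0)
        for b in CLADES}
    for a in CLADES
}

print(rao_quadratic_entropy(DIST, {"A": 1, "D": 1}))
print(net_relatedness_index(DIST, ["A", "B", "C"]))  # positive: clustered
print(net_relatedness_index(DIST, ["A", "D"]))       # negative: overdispersed
```

In this toy pool, a community drawn from a single clade yields a positive NRI, whereas a community spanning both clades yields a negative NRI, mirroring the interpretation of clustering and overdispersion given above.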

A Rao phylogenetic community distance matrix for all pairwise comparisons of sampled communities was constructed from site abundance data and a non-parametric rate-smoothed (NPRS) HydA cladogram using Phylocom. Euclidean distance matrices derived from the nine environmental variables listed in Table 1 as well as geographic distances (GDs) and elevation differences between sites were constructed using the base package within R (version 2.10.1) (R Development Core Team, 2010). Environmental parameters that were below the analytical detection limit were given a value of 0 for the purpose of this study. With Rao phylogenetic distance as the response variable, model selection through the Akaike Information Criterion adjusted for small sample size (AICc), Mantel regressions, and principal coordinate ordinations (PCO) were performed using the R packages ecodist (Goslee and Urban, 2007), pgirmess (version 1.4.3), vegan, and labdsv. Traditional Mantel regressions were performed on the data because of the inherent redundancy of the environmental and phylogenetic distance data, whereas model selection and regression were performed to evaluate more closely the relative performance of individual parameters and combinations of parameters in describing the phylogenetic data. We used PCO analysis to visually inspect HydA community phylogenetic relationships, as arrayed on two-dimensional plots, for patterns of clustering. We considered the model with the lowest AICc value to be the best and evaluated the relative plausibility of each model by examining differences between the AICc value for the best model and values for every other model (ΔAICc) (Johnson and Omland, 2004). Models with ΔAICc<2 were considered strongly supported by the data, and models with ΔAICc>10 and/or Mantel significance (P) values >0.05 were considered to have essentially no support from the data.
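The logic of a Mantel test and of ΔAICc-based model ranking can be sketched in a few lines. The example below is not the R workflow used in this study: it is a plain permutation Mantel test on the Pearson correlation between two distance matrices, paired with the common least-squares form AICc = n ln(RSS/n) + 2k + 2k(k + 1)/(n − k − 1). The pH values and response matrix are hypothetical, the convention for counting k is an assumption, and the "intermediate" label for 2 ≤ ΔAICc ≤ 10 is added here for completeness and is not defined in the text.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def euclidean_matrix(values):
    """Between-site distance matrix for a single environmental variable."""
    return [[abs(a - b) for b in values] for a in values]


def flatten(matrix, order):
    """Pairwise entries of a symmetric matrix, with sites taken in `order`."""
    return [matrix[i][j] for i, j in combinations(order, 2)]


def mantel(response, predictor, permutations=999, seed=0):
    """Return (r, one-sided permutation P) for two distance matrices."""
    rng = random.Random(seed)
    sites = list(range(len(response)))
    y = flatten(response, sites)
    observed = pearson(flatten(predictor, sites), y)
    hits = 0
    for _ in range(permutations):
        order = sites[:]
        rng.shuffle(order)  # permutes rows and columns together
        if pearson(flatten(predictor, order), y) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


def aicc(rss, n, k):
    """Small-sample AIC for a least-squares fit with k estimated parameters."""
    aic = n * math.log(rss / n) + 2 * k
    return aic + 2 * k * (k + 1) / (n - k - 1)


def support(delta):
    """Classify a model by its delta-AICc using the thresholds in the text."""
    if delta < 2:
        return "strong"
    if delta > 10:
        return "none"
    return "intermediate"  # assumption: not classified in the text


# Hypothetical pH values for six sites and a response that tracks them.
PH = [2.0, 3.1, 5.4, 6.8, 8.9, 9.8]
PH_DIST = euclidean_matrix(PH)
RAO_DIST = [[0.1 * d for d in row] for row in PH_DIST]

r, p = mantel(RAO_DIST, PH_DIST)
print(round(r, 3), p)
print(support(1.05), support(6.48), support(12.0))
```

Because pairwise distances are not independent observations, the permutation step shuffles site labels rather than individual distances, which is the reason a Mantel test is used in place of an ordinary regression P-value.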

Table 1 Geographical, physical and chemical data for nine selected environments in Yellowstone National Park

Results


The geothermal springs selected for this study represent the broad range of temperature (25–93 °C) and pH (1.90–9.84) combinations that are commonly encountered in YNP springs (Figure 1, Table 1). In addition, the range of measured chemical parameters was broad and generally reflects the ranges of concentrations that are encountered in YNP springs. Of the 65 mat or sediment samples collected in the present study, 29 yielded hydA PCR products of the predicted size. These 29 putative hydA amplicons clustered within a defined pH and temperature range (Figure 1a), indicating that the distribution of hydA in YNP is non-random. hydA was not identified in any spring with temperature greater than 65 °C. At this temperature, hydA was restricted to environments with alkaline pH (pH>8.9). In contrast, the upper temperature limit for the detection of hydA in acidic environments (pH<3.0) was 36 °C, indicating that the upper temperature limit for hydA in YNP springs is pH-dependent.

Figure 1

Distribution of hydA (a) and bchL (b) in sediment and mat sampled from 65 geothermal springs from four geographic locations in Yellowstone National Park as a function of spring water pH and temperature. Red squares denote environments where amplicons were detected and blue triangles denote environments where amplicons were not detected.

Bacteriochlorophyll biosynthesis genes bchL/chlL were detected in 41 of the 65 mat or sediment samples collected from YNP springs in the present study. Like hydA, the putative bchL/chlL amplicons clustered within a defined pH and temperature range (Figure 1b). In springs with alkaline pH (pH>9.0), the upper temperature limit for the detection of bchL/chlL genes was 70 °C. In contrast, the upper temperature limit for the detection of bchL/chlL genes at acidic pH (<3.0) was 50 °C, indicating that the upper temperature limit for bchL/chlL in YNP springs is also pH-dependent. bchL/chlL genes were detected in 27 of the 29 environments where hydA was detected, the exceptions being the samples collected from springs with low pH (<3.0) and low temperature (<30 °C) (Figure 1a). Conversely, bchL/chlL genes were detected in 13 environments in YNP where hydA was not detected.

Of the 29 environments that yielded putative hydA amplicons, nine were chosen for further analysis on the basis of their geochemistry and geographic location in order to span the broad geochemical and geographical spectrum available in YNP (Table 1). A total of 160 hydA clones were sequenced (Supplementary Table 1) from these nine environments such that the average depth of sequence coverage, as indicated by the predicted number of phylotypes in each environment by rarefaction analysis, was approximately the same (data not shown). Rao's phylogenetic diversity index (Dp), an α-diversity metric that incorporates abundance weights for branch lengths associated with each assemblage, varied significantly with environmental pH (R2=0.85, P<0.01, second-order polynomial (SOP)). The highest Dp indices were observed in HydA assemblages sampled from slightly acidic to circumneutral springs, whereas the lowest indices were observed in assemblages sampled from alkaline environments, indicating higher diversity in assemblages sampled from slightly acidic environments. No other measured environmental variable exhibited significant correlation with Dp.

Mantel regression and model selection approaches, the latter evaluated using the difference in Akaike Information Criterion values (ΔAICc), indicated that the HydA phylogeny shows evidence for both ecological and geographic structure. Both statistical approaches indicated that GD between sites was the best individual explanatory variable for predicting the phylogenetic relatedness of HydA assemblages (ΔAICc=1.05; Mantel R2=0.45, P<0.01) (Table 2). The pH difference between sites was also a strong individual explanatory variable for predicting the phylogenetic relatedness of HydA assemblages (ΔAICc=6.48; Mantel R2=0.36, P<0.01). Differences in site elevation and salinity also ranked as important explanatory variables of HydA phylogenetic relatedness by Mantel regression approaches, but not in model selection approaches. Site elevation was correlated with GD (adj R2=0.83, P<0.01), and is thus of negligible additional importance in predicting HydA phylogenetic relatedness. Importantly, other models comprising a single explanatory variable had no statistical support from the data (ΔAICc>10.0 and/or Mantel P>0.05) in predicting the relatedness of HydA assemblages.

Table 2 Model ranking using ΔAICc and Mantel correlation coefficients (R2) where HydA Rao among community phylogenetic distance is the response variable

Models incorporating various combinations of GD and pH distance with additional explanatory variables were often stronger predictors of the phylogenetic relatedness of HydA assemblages than single-variable models. The combined GD and pH distance model was the best explanatory model for predicting HydA phylogenetic relatedness (ΔAICc=0.00; Mantel R2=0.51, P<0.01) (Table 2), although models incorporating GD and PO43− differences between sites (ΔAICc=0.15; Mantel R2=0.50, P<0.01) and models incorporating pH and PO43− differences between sites (ΔAICc=0.44; Mantel R2=0.50, P<0.01) were also strong predictors of HydA phylogenetic relatedness. Furthermore, a model incorporating GD and salinity differences between sites (ΔAICc=1.42; Mantel R2=0.49, P<0.01) was nearly as strong a predictor of HydA phylogenetic relatedness as the models incorporating PO43− differences. In contrast to GD, pH, and salinity differences between sites, PO43− differences between sites alone were not a statistically significant predictor of HydA relatedness between sites (Table 2). Linear regressions of Rao among-community HydA relatedness plotted as a function of between-site GD, pH, and salinity differences all resulted in positive-trending and statistically significant relationships (Figure 2). In contrast, a linear regression of Rao among-community HydA relatedness plotted as a function of between-site PO43− differences yielded a slightly negative-trending and statistically insignificant relationship. Such a negative relationship would suggest phylogenetic over-dispersion or competitive exclusion with respect to PO43−. Given that this negative relationship is weak, invoking competitive exclusion would be tenuous at best, and the PO43− parameter is therefore discounted. Thus, models that incorporate GD, pH, and salinity differences between sites as explanatory variables of community phylogenetic relatedness are more strongly supported by the data.

Figure 2

Scatter plots of HydA Rao phylogenetic distance between sites as a function of between-site geographic distance, pH differences, salinity differences, and phosphate differences. Mantel R2 and associated P-values are reported for each plot. Environmental characteristics associated with each environment are described in Table 1.

The phylogenetic relatedness of HydA assemblages in relation to environmental and geographic characteristics was further examined using PCO of Rao among-community distances (Figure 3). PCO also indicated a strong relationship between the Rao among-community phylogenetic relatedness of HydA assemblages, GD and spring pH differences. The first axis (72.1% variance explained) was significantly correlated to geographic distance (adj R2=0.88, P<0.01) and to pH differences between sites (adj R2=0.53, P=0.02). The second axis (22.5% variance explained) was significantly correlated to salinity (adj R2=0.55, P=0.01). Significant correlations were not observed between PCO axes and other variables. Thus, the results of PCO analysis corroborate the model selection approaches and indicate that the phylogenetic relatedness of HydA assemblages is likely controlled through the interaction between site geographic and pH differences and, to a lesser extent, salinity differences.

Figure 3

PCO analysis of hydA deduced amino acid sequences from nine environments. Assemblages sampled from acidic environments (NL1 and NG) are denoted in orange, circumneutral environments (IG1, IG2, IG3, IG4, and NL2) in green, and alkaline environments (HL1 and HL2) in blue. IG, Imperial Geyser within the Lower Geyser Basin; HL, Heart Lake Geyser Basin; NG, Norris Geyser Basin; NL, Nymph Lake.

HydA assemblages sampled from YNP geothermal springs contain lower taxonomic diversity and are more likely to be phylogenetically clustered when compared with randomly generated HydA assemblages, as indicated by statistical confidence in NRI and NTI (Table 3). HydA from alkaline environments exhibited both terminal (as indicated by NTI) and tree-wide (as indicated by NRI) phylogenetic clustering, whereas the extent of phylogenetic clustering in assemblages sampled from acidic environments was variable. Phylogenetic clustering as measured by NTI was greater in HydA assemblages sampled from alkaline environments than in assemblages sampled from circumneutral to acidic environments. Statistically significant and positive NRI and/or NTI values for all YNP HydA assemblages suggest a role for ecological filtering, rather than competition, in structuring the phylogenetic diversity of HydA in the Yellowstone geothermal complex.

Table 3 Phylogenetic diversity metrics computed for each of the nine assemblages

The dominant HydA phyla in YNP springs reflected spring pH (Figure 4), a finding that is consistent with the results of Mantel tests, PCO analysis, and phylogenetic diversity metrics (see above) which taken together indicate an important role for pH in controlling the phylogenetic composition of HydA assemblages. HydA assemblages sampled from acidic springs (NL1, NG) were dominated by sequences affiliated with the Thermotogae whereas assemblages sampled from circumneutral springs (IG1, IG2, IG3, IG4, NL2) were dominated by sequences affiliated with the Firmicutes. The sole sequence obtained from both alkaline springs (HL1, HL2) was affiliated with the Elusimicrobia (candidate division Termite Group 1). The abundance of sequences affiliated with Thermotogae and Elusimicrobia in each of the nine assemblages was positively correlated with pH (r2=0.71 and 0.58, respectively, SOP) (data not shown). In contrast, the abundance of sequences affiliated with the Firmicutes was not correlated strongly to pH (r2=0.08, SOP) but was significantly correlated to both sulfide and salinity (r2=0.85 and 0.70, respectively, SOP).

Figure 4

Phylum-level phylogenetic composition of HydA sequences from three spring types as determined by BLASTp analysis to closest cultivated representative. Spring types were defined as acidic (NL1 and NG), circumneutral (IG1, IG2, IG3, IG4, and NL2), and alkaline (HL1 and HL2). The relative abundance of sequences within each phylum for each spring type represents the percent of total sequences for that spring type. The phylum Elusimicrobia is also known as candidate division Termite Group 1. IG, Imperial Geyser within the Lower Geyser Basin; HL, Heart Lake Geyser Basin; NG, Norris Geyser Basin; NL, Nymph Lake.

Discussion


The distribution and phylogenetic diversity of protein-encoding genes provide insight into the factors, both biotic and abiotic, that have constrained the evolution of those genes in a given environment. In this study, hydA served as a proxy for examining the factors that have influenced the ecology and evolution of fermentative bacteria putatively involved in H2 production in YNP. Our results indicate that the distribution of hydA in YNP is non-random and pH-dependent, suggesting a role for pH in the ecology and evolution of hydA in YNP. The pH-dependent distribution of hydA in YNP is similar to the distribution of phototrophs in YNP as evinced by the distribution of bchL/chlL genes, where the upper temperature limit for oxygenic phototrophs was observed to decrease from 70 °C in alkaline pH 9.0 systems to 48 °C in acidic pH 3.0 systems, a set of observations that is consistent with the qualitative trends noted in earlier studies (Cox and Shock, 2003; Spear et al., 2005; Lehr et al., 2007). The relationship between the presence of hydA and bchL/chlL could be explained, in part, by the temporal dynamics of phototrophic mat communities, which cycle from oxygen saturation during the day to anoxia at night (Revsbech and Ward, 1984). At night, oxygenic phototrophs ferment stored polyglucose and excrete metabolites such as acetate, butyrate, and propionate, which are rapidly consumed during periods of anoxia by fermentative organisms (Anderson et al., 1987). Thus, the pH-dependent distribution of hydA in YNP springs might reflect the distribution of oxygenic phototrophs as a result of their production and excretion of fermentable organic carbon.

hydA was not detected in any YNP spring with temperature >65 °C, suggesting that fermentative bacteria are unlikely to be an important source of H2 in these environments. Since the upper temperature limit for photosynthesis is 72 °C (Shock and Holland, 2007), photosynthetically derived H2 is also an unlikely source for H2 in these environments. However, elevated concentrations of H2 are routinely measured in springs in YNP where temperatures exceed 65 °C (Inskeep and McDermott, 2005; Spear et al., 2005). H2 in these systems is likely of abiotic origin, derived from subsurface basalt-catalyzed water hydrolysis (Spear et al., 2005) and could further support the notion of an H2 driven ‘deep hot biosphere’ (Gold, 1992). Basalt-catalyzed H2 production is sensitive to both solution pH and reaction temperature, with a near doubling of H2 production rates with a doubling of incubation temperature (Stevens and McKinley, 2000). Similarly, the rate of basalt-catalyzed H2 production increases with increasing water acidity (lower pH) (Stevens and McKinley, 2000). Coincidentally, hydA was not detected in YNP springs predicted to have elevated geological H2 production (for example, acidic pH, >36 °C; alkaline pH, >65 °C), an observation that might reflect the unfavorable thermodynamics of bacterial organic carbon fermentation in the presence of high H2 partial pressure (Schink, 1997). Thus, environments with elevated inputs of geological H2 might select against bacterial fermentative metabolisms, which in turn might constrain the distribution of hydA in YNP. Further examination of the distribution of hydA across a gradient of environmental H2 concentrations and across the physical and chemical limits for photosynthesis (‘photosynthetic fringe’) (Shock and Holland, 2007) in YNP will continue to provide insight into the role of biological and abiological factors in controlling the distribution of hydA and bacterial fermentation in YNP.

Of all of the measured environmental parameters, GD was the best predictor of HydA assemblage relatedness, suggesting dispersal limitation even across the small spatial scale sampled in the present study (maximum between-site distance is 53.3 km). These findings are consistent with a number of recent studies focused on microbial communities (Whitaker et al., 2003; Martiny et al., 2006 and references therein), which together challenge the long-held notion that ‘everything is everywhere and the environment selects’ and implicate the fundamental role of geographic constraints in the assembly of microbial lineages in natural communities. For example, an examination of endemic populations of the hyperthermophilic archaeon Sulfolobus using multi-locus sequence analysis revealed dispersal limitation in geothermal environments across large geographical scales spanning the Northern hemisphere (Whitaker et al., 2003). Similarly, Papke et al. (2003) showed geographic isolation in cyanobacterial populations sampled from geothermal springs in geographically distinct environments across the globe. More recently, Takacs-Vesbach et al. (2008) identified historical patterns in the phylogenetic diversity of the Aquificales genus Sulfurihydrogenibium, as revealed by the similarity of 16S rRNA gene sequences within a given volcanic caldera in YNP but not between calderas. This suggests that geographic provinces structure population diversity of Sulfurihydrogenibium at very small spatial scales, a finding that is consistent with the results presented herein. Thus, dispersal limitation should be considered in all investigations of the ecological factors that influence microbial community structure, even when sample sites span small spatial scales.

As expected, pH was also a strong predictor of HydA phylogenetic diversity, with the maximum diversity observed in assemblages sampled from environments with slightly acidic pH (4–6). Thus, environments with slightly acidic pH are likely to harbor higher than average HydA functional variation, if functional variation is positively correlated with genetic variation. Increased functional variation, whether driven by biotic or abiotic processes, enables members of a community to respond differentially to changes in their environment (McCann, 2000) and has been linked to increased ecosystem function (Tilman et al., 1997). In the YNP subsurface, the boiling of fluids under high pressure leads to the partitioning of various solutes and volatile gases into either the vapor phase or hot-water phase (Fournier, 2005). Vapor-dominated springs are generally acidic whereas hot-water phase dominated springs are generally neutral to alkaline. Springs with circumneutral to slightly acidic pH (for example, pH 4–6) are the result of dilution of vapor-dominated water with meteoric fluid, the extent of which can be affected by water-table drawdown as a function of seasonal changes in precipitation and hydrology (Nordstrom et al., 2005). Thus, the high species variability associated with slightly acidic springs might enable members of a community to better respond to change in both physical and chemical conditions in these environments due to seasonal hydrological and chemical changes. The end-result is a more resilient community, which is able to maintain a relatively steady evolutionary tempo in these geothermal environments.

The composition of the HydA assemblages varied substantially over the three spring types sampled in the present study, with each spring type dominated by a HydA affiliated with a different bacterial phylum. The abundance of the dominant sequence lineages in each of the spring types was strongly correlated with environmental pH or salinity, an observation that is consistent with the inferred role of ecological filtering in YNP HydA assemblages as assessed by the NRI and/or NTI metrics. Previous studies have identified pH as an important driver of the distribution and phylogenetic diversity of soil and freshwater lake bacterial and fungal communities in several geographic locales (Newton et al., 2007; Lauber et al., 2009; Dumbrell et al., 2010), whereas salinity has been suggested to represent the predominant factor controlling the global phylogenetic diversity of bacterial assemblages (Lozupone and Knight, 2007). The results of the present study indicate, for the first time, a role for pH and, to a lesser extent, salinity as filters of the phylogenetic diversity of a functional gene used as a proxy for a metabolic guild (bacterial fermentation) in natural environments. Importantly, springs in YNP are inherently low-salinity environments, and thus the role of salinity in controlling the phylogenetic diversity and relatedness of HydA assemblages on a larger scale may be underestimated. Further cross-system comparisons of HydA assemblages in environments that span gradients of salinity and pH will continue to provide insight into the relative roles of pH and salinity in structuring HydA communities and phylogeny.
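
For readers unfamiliar with the NRI and NTI metrics, the following minimal sketch shows how they are commonly computed from a pairwise phylogenetic distance matrix: as the negated standardized effect size of the mean pairwise distance (MPD) and of the mean nearest-taxon distance (MNTD), respectively. The null model used here (random draws of the same number of taxa from the full pool) is only one of several in use and is an assumption, as are all function names and the toy distance matrix; the sketch is not the pipeline used in the present study.

```python
import numpy as np

def mpd(dist, idx):
    """Mean pairwise phylogenetic distance among the taxa in idx."""
    sub = dist[np.ix_(idx, idx)]
    iu = np.triu_indices(len(idx), k=1)
    return sub[iu].mean()

def mntd(dist, idx):
    """Mean distance from each taxon in idx to its nearest relative in idx."""
    sub = dist[np.ix_(idx, idx)].astype(float)   # copy, safe to modify
    np.fill_diagonal(sub, np.inf)                # ignore self-distances
    return sub.min(axis=1).mean()

def relatedness_index(dist, idx, metric, n_null=999, seed=0):
    """Negated standardized effect size of `metric` for one assemblage.

    With metric=mpd this is NRI; with metric=mntd it is NTI.
    Positive values indicate phylogenetic clustering.
    """
    rng = np.random.default_rng(seed)
    pool = dist.shape[0]
    observed = metric(dist, idx)
    null = np.array([
        metric(dist, rng.choice(pool, size=len(idx), replace=False))
        for _ in range(n_null)
    ])
    return -(observed - null.mean()) / null.std(ddof=1)

# Toy pool of 10 taxa forming two clades (hypothetical distances).
toy = np.full((10, 10), 1.0)
toy[:5, :5] = 0.1
toy[5:, 5:] = 0.1
np.fill_diagonal(toy, 0.0)

assemblage = [0, 1, 2, 3]                        # all drawn from one clade
nri = relatedness_index(toy, assemblage, mpd)
nti = relatedness_index(toy, assemblage, mntd)
print(f"NRI = {nri:.2f}, NTI = {nti:.2f}")
```

Under this convention, an assemblage whose members are more closely related than expected by chance yields positive NRI and NTI values, the pattern usually interpreted as evidence of ecological filtering.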

In summary, the results presented here indicate that geographic factors, such as the fragmented nature of hot spring systems, and ecological factors, such as spring pH, act in concert to explain the phylogenetically structured HydA communities in YNP springs. Our observations indicate that spring pH has imposed strong phylogenetic niche conservatism on HydA assemblages in YNP springs. Such niche conservatism, that is, the tendency of lineages to maintain their ancestral ecological niche (Wiens, 2004), signifies that the constituents of a HydA bacterial community are likely to inherit their pH affinity from their ancestors and, furthermore, to pass on this same pH predilection to their descendants. This also suggests restricted gene flow across pH extremes. Within a narrow pH realm, the detection of dispersal limitation signifies not just that geographically proximal communities are more closely related phylogenetically but also that local adaptation could be limited to isolated and persistent local communities, following the arguments of Hubbell (2001). These results are consistent with earlier examinations of species endemism in geothermal environments (Papke et al., 2003; Whitaker et al., 2003; Takacs-Vesbach et al., 2008) and show endemism of bacterial lineages at small spatial scales. In addition to the role of pH and GD, the results presented here allude to the importance of interspecies or interguild relationships (that is, between phototrophs and fermentative bacteria) in structuring the genetic diversity of communities. Further examination of the HydA phylogenetic diversity of samples collected from locations that span even smaller geographical and geochemical scales, in combination with high-resolution genetic analysis of multiple loci, will continue to provide insight into the relative roles of dispersal limitation, niche conservatism, and interspecies interactions in structuring fermentative bacterial communities and microbial communities in general.
A further extension of the present study would be to examine the phylogenetic structure of phenotypic traits, such as inferred protein stability across gradients of salt and temperature, in order to determine which traits facilitate ecological filtering among geochemically and geographically disparate environments. Such studies will improve our understanding of the evolutionary forces that shape the phylogenetic structure of geothermal spring microbial communities and of how this structure relates to function at the enzyme, population, or ecosystem level.