Geochemical and Microbial Community Attributes in Relation to Hyporheic Zone Geological Facies

The hyporheic zone (HZ) is the active ecotone between the surface stream and groundwater, where exchanges of nutrients and organic carbon have been shown to stimulate microbial activity and transformations of carbon and nitrogen. To examine the relationship between sediment texture, biogeochemistry, and biological activity in the Columbia River HZ, the grain size distributions for sediment samples were characterized to define geological facies, and the relationships among physical properties of the facies, physicochemical attributes of the local environment, and the structure and activity of associated microbial communities were examined. Mud and sand content and the presence of microbial heterotrophic and nitrifying communities partially explained the variability in many biogeochemical attributes such as C:N ratio and %TOC. Microbial community analysis revealed a high relative abundance of putative ammonia-oxidizing Thaumarchaeota and nitrite-oxidizing Nitrospirae. Network analysis showed negative relationships between sets of co-varying organisms and sand and mud contents, and positive relationships with total organic carbon. Our results indicate grain size distribution is a good predictor of biogeochemical properties, and that subsets of the overall microbial community respond to different sediment texture. Relationships between facies and hydrobiogeochemical properties enable facies-based conditional simulation/mapping of these properties to inform multiscale modeling of hyporheic exchange and biogeochemical processes.

. The spatial distributions of freeze core samples and their attributes: (a) geologic properties (geometric mean grain size, and percentages of gravel/sand/mud/silt/clay); (b) biogeochemical properties, See Table 2 for a description of biogeochemical parameters measured. The blue to red color scale correspond to measured parameter values from the lower to upper bounds, which are listed below the parameter name. pH. Samples were dried at 35 °C for 72 hrs and sieved to <1 mm. 2 ml of Milli-Q ultrapure water (EMD Millipore, Darmstadt, GE) was added to 2 g sediment. The sediment/water mixture was vortexed for 5 seconds then allowed to settle for 10 minutes. The pH of the resulting slurry was determined using a Denver Instruments Model 215 pH meter with a pH/ATC probe (Denver Instrument Company, Denver CO).
Biological activity. Extracellular enzyme activity was determined using the method described by 44  For each sample, three, 0.25 g (wet) subsamples were weighed into 50 ml Corning Centrifuge tubes (Sigma Aldrich, St. Louis MO). 37.5 ml 50 mM MOPS buffer (pH 7.2) was added to each tube and vortexed for 1 minute. 200 µL aliquots of the suspension were transferred to a 96-well plate. In addition to the suspension, sample wells received 50 µL substrate, wells designated for quench standards received 50 µL substrate standards (10 µM, 5 µM, 2.5 µM, 1.25 µM, 0.625 µM, 0.313 µM, 0.156 µM, 0.078 µM, 0.0038 µM). Blank wells received 200 µL sample suspension and 50 µl buffer. Plates were incubated in the dark for 1.5 hr at ~21 °C. 10 µL aliquots of 1 M NaOH were added to each well to stop the reaction. Fluorescence was read with 365 nm excitation and 450 nm emission filters on a SpectrMax GeminiXS Microplate Spectrofluorometer (Molecular Devices, Sunnyvale CA). Activities were calculated on a per gram (dry weight) per hour basis. The measured properties include biomass and microbial activity parameters.
Quantitative analysis of microbial abundance. Quantitative PCR (qPCR) was used to quantitate Bacterial 16 s rRNA gene copies in the genomic DNA samples. A general domain-level Bacterial assay (Nadkarni et al. 45 ), was performed. Hydrolysis-probe assays and a double-stranded DNA standard was synthesized using  Table 1. Major geological properties from the core sample measurements. GRAVEL, SAND, MUD, SILT, and CLAY denote decimal fractions of each grain size. FC is the classification in Folk's system 111,112 .
gBlocks Gene Fragments by Integrated DNA Technologies (IDT; Coralville, Iowa (IDT). The standard was 924 bp in length, and consisted of a slightly modified portion of the 16 S rRNA gene of Escherichia coli 45 . Standard curves were generated across 6 orders of magnitude, and efficiency was 93%. Assays were performed in 384-well plates using a Life Technologies ViiA7 real-time PCR instrument at the DNA Services Facility at the University of Illinois at Chicago, using conditions described previously 45 . Final bacterial 16 S rRNA gene abundance was calculated from the linear regression of the standards, and copies per total extract was calculated by multiplying by the elution volume. Final normalization was to grams of dry sediment.
Genomic DNA isolation. Subsamples were removed from −80 °C storage and thawed in an ice bucket for ~1hr. Excess water was removed by centrifugation for 1 min at 10,000 x g. Samples were split, and dry mass and moisture content was determined on one subsample by drying for 72 hrs at 65 °C. The moisture content was then used to calculate the dry mass of the other subsample. Genomic DNA was extracted using a MoBio PowerSoil kit (MoBio Laboratories, Inc., Carlsbad, CA). Extractions were done following manufacturer's instructions with the addition of a 2 hr proteinase-K incubation to facilitate cell lysis. Briefly, 2 µl of 20 mg/ml proteinase-K solution (Applied Biosystems) and 60 µl of C1 solution was added to each bead tube, vortexed for 5 sec and then placed in a 60 °C water bath for 1hr.
Community structure analysis. In phylogeny, an operational taxonomic unit (OTU) is an operational definition of a species or group of species often used when only DNA sequence data is available. It is the most commonly used microbial diversity unit. In this study, OTUs were derived from amplicon analysis of the V4 region of the 16 S ribosomal RNA gene (rrnA). Amplification from genomic DNA isolated from sediment samples was performed using the Earth Microbiome Project barcoded primer set 515 F/806 R (http://press.igsb.anl. gov/earthmicrobiome/emp-standard-protocols/16s/) 46 with the exception that the twelve-base barcode sequence was included in the forward primer. Amplicons were sequenced on an Illumina MiSeq using the 300 cycle MiSeq Reagent Kit v2 (http://www.illumina.com/) according to manufacturer's instructions.
Resulting sequences were processed with QIIME 47 . The split_libraries_fastq.py was used to de-multiplex the fastq-formatted sequences, and sequences were screened for a Phred quality score of 20. Trimmed sequences were analyzed using mothur v 1.34.0 48 . Sequences were aligned against the SILVA v 123 seed alignment (Silva. seed_v123.tgz) provided on the mothur website (http://mothur.org/wiki/Silva_reference_files). The alignment was trimmed to columns 13,862-23,444 with a maximum allowed homopolymer run length of 8. Sequences were pre-clustered with an allowance of 2 mismatches and checked for chimeric sequences using uchime 49 . This process resulted in 367,620 total sequences and 102,351 unique sequences. Sequences were classified against the SILVA v123 seed reference set using the default Wang method 50 . A distance matrix was built using the default onegap option and end gap penalties (countends = T), and the sequences were clustered using the default average neighbor algorithm. OTUs were defined at 97% identity. Diversity and sampling calculations were performed on data sets down-sampled to the size of the smallest set (N = 11,934).

Network analysis.
OTUs comprising > 0.1% total abundance of any of the samples (620 OTUs) were used in network analysis with the biogeochemical (BGC) data. The absolute abundance of organisms was estimated by multiplying the relative abundance values calculated from the amplicon data by the qPCR-derived rrnA copies/g dry sediment values. Biogeochemical data was log e -transformed. qPCR is widely used for DNA or ribonucleic acid (RNA) quantification in molecular biology with amplification of nucleic acid using a polymerase chain reaction (PCR) 51 . Network analysis was done using the WGCNA package 52 . OTU abundances were correlated using the 'cor' function and 'spearman' method. Dissimilarity was calculated (1 -abs(x)), and data was clustered using the 'hclust' function and the 'average' method. In biogeochemistry, the clusters of community composition data by network analysis, based on Spearman's rank correlation, are named modules. Here, modules were defined using the 'cutreeDynamic' function with the options deepsplit = 2, pamRespectsDendro = F and minCluster-Size = 10. The module eigengene of a given module is defined as the first principal component of the standardized gene expression profiles. Here, module eigengenes were extracted and correlated with the transformed BGC data, as described above. Correlations between OTUs and BGC data were also calculated as described. Node pairs for which the Spearman rho value magnitude was ≥ 0.6 and having a corresponding P-value ≤ 0.05 were exported to Cytoscape 53 for visualization.

Statistical analysis.
Clustering of samples based on grain size distribution is critical as it provides the basis for defining facies and grouping components of the complex biogeochemical system into a manageable set of discrete classes. Geological classification was done following Folk's classification system with the descriptive terminology modified from and the Udden-Wentworth grade scale 39,40 .
Alternative clustering approaches based on statistical algorithms were also explored, including Principal Component Analysis (PCA), hierarchical and Expectation-Maximization (EM) clustering. PCA was used for identifying similarities in the multivariate set of variables including physical, geochemical, and microbial community attributes. PCA reveals the internal structure of the data in a way that best explains the variance in the data. By using only the first few principal components, the dimensionality of the transformed data can be reduced. The built-in R function prcomp was used for PCA and the package 'factoextra' was used for visualization. The Expectation-Maximization (EM) algorithm is a general approach for approximating maximum likelihood estimations in a multivariate data set 54 . It provides objective clustering solutions and also offers uncertainty measures of the resulting classification 55,56 .
Multivariate Regression and ANalysis Of VAriance (ANOVA) approaches have been widely used in various disciplines to evaluate relationships between dependent variables and multiple explanatory variables, discrete or continuous [57][58][59][60][61][62][63][64] . Multiple regression can be displayed by plotting the dependent variable against the raw values of the independent variable. In practice, the procedure and resulting multiple regression coefficients may be misleading because the value of the partial slope may change in magnitude and even in sign relative to the slope obtained in simple least-squares regression. The issue can be even worse when the explanatory variables are dependent on or constrained by each other. One possible solution is to evaluate partial relationships with partial regression. Partial regression is useful for visualizing the true scatter of points around the partial regression line and identifying influential observations and non-linear patterns more efficiently than using plots of residuals vs fitted values. Partial regression analysis can be done by evaluating residuals of the dependent variable on the remaining explanatory variables vs residuals of the target explanatory variable on the remaining explanatory variables 65 .

Results
Associations of geological facies and biogeochemical properties. The relationships between geological attributes (grain size distribution [gsd] and gsd-based facies) and biogeochemical properties were evaluated. The freeze core sediment samples were generally dominated by gravel, with geometric average grain size ranging from 0.11 to 23.7mm. According to Folk's classification system, the samples can be classified into four different classes/facies: gravel (G), muddy gravel (MG), muddy sandy gravel (MSG), and gravelly mud (GM), each containing 14, 1, 5, and 1 sample(s) respectively, as shown in Fig. 2. Folk's classification (FC) is based on the relative ratios of gravel, sand, and mud. The grain sizes also exhibit different sorting: by adding sand, silt, and clay content, such that the samples can be further classified as very coarse gravel (11), coarse gravel (3), very coarse silty sandy very coarse gravel (1), coarse silty sandy very coarse gravel (3), muddy very coarse gravel (1), very coarse gravelly medium silt (1), and medium silty sandy coarse gravel (1), with the number in each parenthesis indicating the fractional mass of material in the class, as summarized in Table 1.
An alternative clustering approach to Folk's classification is statistical EM clustering of the full range grain size distributions. Compared to the FC facies, the number of samples is more evenly dispersed among EM facies. There are different ways to quantitatively describe the grain size features, including the raw grain size percentages (e.g., %gravel, %sand, %mud), the classes based on the different attributes of gsd, and geometric and arithmetic averages of the grain sizes. These geological properties are all listed in Table 1, and any subset of these descriptors can be included for evaluating their control on biogeochemical variations. However, if the complete set of these physical attributes are used as explanatory variables, their relative contributions cannot be reliably resolved because of the closure problem (i.e., the constant sum of the grain size fractions). This occurs because the fractions of the different grain size classes sum to one, inducing spurious negative correlations between the grain size classes. Therefore, a parameter screening was done by performing paired individual regression between the physical attributes and all environmental variables, and the two most influential parameters were identified as %sand and %mud, which are nearly orthogonal to one another (Fig. 3).
The complete environmental variables and measurements are listed in Supplemental Table S1. Dependent variables selected for further analysis, as shown in Table 2, include those with strong associations with either C/N/P concentration, or C/N/P-mining enzyme/activities. Multivariate relationships among measured variables are visualized across the first two PCA axes (the first two PC's explained 66.5% of overall variability) in Fig. 3, which shows that samples don't cluster and most variables don't co-vary. Furthermore, it shows that C:N ratio has a negative correlation with %mud, and that R N:P (the ratio of N-acetyl-glucosaminidase and aminopeptidase to phosphatase activity) has a negative relationship with %sand; while Ru and %TOC are positively correlated.
PCA shows that %mud and %sand are essentially orthogonal, but still they are slightly correlated with each other due to the closure problem. Therefore, partial regression analysis was performed to study the 'true' relationships between the predictors (i.e., %sand, %mud) and dependent variables to avoid misleading regression coefficients and relationships.
In order to understand whether the %sand and %mud have nonlinear relationships with biogeochemical attributes, the partial regression starts with a quadratic model and then model selection techniques are used. Specifically, the initial partial regression models include intercept, linear, and second order terms. Then the Akaike Information Criterion (AIC) was used for backward removal of unnecessary terms 66 . The final model represents the most parsimonious model of the true relationship between the dependent variable and the predictors (see Fig. 4). In Fig. 4, the initial models are in red and the final models are in blue. Only one line is shown if the initial quadratic model cannot be reduced. The results show that most biogeochemical attributes have good linear correlations with %sand and several of them also correlate well with %mud. Exceptions to this include microbial activity parameters such as R N:P and R C:N (the ratio of beta-glucosidase to N-acetyl-glucosaminidase and aminopeptidase activity), which have strong nonlinear relationships with %sand (the R 2 for fitting the variations of the two parameters are 0.479 and 0.403 respectively). As R 2 represents the amount of uncertainty that can be explained by a relationship, such a percentage reduction in uncertainty is very useful when constructing reactive transport model inputs compared to mapping without grain size information and assuming a homogeneous spatial distribution of BGC properties.
Relation of microbial taxa (or sets of taxa) to facies. Microbial community composition. Amplicon analysis of the 16 S rRNA gene was performed on genomic DNA isolated from the 21 freeze core samples to examine how the native microbial community structure varies across the sampled facies. Good's coverage estimates suggest the sequencing depth sampled between 84 and 92% of the total community (Table 3). Species richness and diversity were similar across all samples, with Chao1 richness estimates ranging from 5143 to 12679, and the inverse Simpson index ranging from 44 to 306. Membership was similar across samples at the Phylum level with 11-19% Acidobacteria, 9-22% Proteobacteria, 7-18% Thaumarchaea, 5-19% Nitrospirae, 3-17% Actinobacteria,  Table 2. Major biogeochemical properties from the core sample measurements. Variable names are defined below. a Resazurin-resorufin assay per dry gram of sediment, a proxy for aerobic respiration rate. b Total organic carbon content. c Carbon:nitrogen ratio. d Copies of rrnA detected per gram (dry weight) of sediment as determined by qPCR; a measure of bacterial and archaeal total abundance. e Ratio of β-glucosidase activity to N-acetyl-glucosaminidase + aminopeptidase activities; a measure of whether an ecological system is controlled by energy flow or N limitation. f Ratio of β-glucosidase activity to phosphatase activity; as above, but for P limitation. g Ratio of N-acetyl-glucosaminidase + aminopeptidase activities to phosphatase activity; as above, but comparing the relative strength of control by N versus P.
Scientific REPORTS | 7: 12006 | DOI:10.1038/s41598-017-12275-w 5-13% Chloroflexi, 4-12% Planctomycetes, 1-8% Verrucomicrobia, 1-5% Latescibacteria and 5-8% unclassified Bacteria ( Figure S2A). A distance decay analysis of β diversity (Bray-Curtis) indicated a negligible effect of spatial separation on community composition ( Figure S3, R 2 = 0.055). The Thaumarchaeota and Nitrospirae appear to be less diverse than other clades ( Figure S2B), with 5 OTUs comprising 85% of summed Thaumarchaeal abundance and 5 OTUs comprising 70% of the Nitrospira. For comparison, the top 5 Acidobacterial OTUs comprise only 27% of the Acidobacteria; in contrast, the top 5 Proteobacterial OTUs comprise only 16% of the Proteobacteria. A Class-level examination of the distribution of Thaumaracheal OTUs showed variation in composition across sites, with the most upstream sites (FC01 and FC02) containing more OTUs of the Marine Group I (MG-I), while most other samples contain more Soil Crenarchaeal Group (SCG) OTUs (Fig. 5). Described Thaumarchaeota are ammonia oxidizers, and Nitrospira are nitrite oxidizers, thus together they complete the nitrification pathway from ammonia to nitrate. Despite this potential for cooperative metabolism, there was not a strong relationship between overall Thaumarchaeal abundance and Nitrospiral abundance ( Figure S4, R 2 = 0.065). Analysis at the Class level, though, revealed a positive correlation between SCG abundance and Nitrospira (R 2 = 0.309), and a negative correlation between MG-I abundance and Nitrospira (R 2 = 0.207) ( Figure S4).
To examine the relationship between biogeochemical parameters and community composition, network analysis based on Spearman rank abundance was performed on the most abundant OTUs in the data set (comprising at least 0.1% of the community in at least one sample). First, the OTUs which were observed to co-vary in abundance across the sample set, and thus are assumed to either respond to similar stimuli or exhibit cooperative growth, were grouped, yielding sixteen 'modules' which were randomly assigned color designations. Co-variance between two organisms can manifest either positively, with both increasing in abundance together and decreasing together, or negatively, with opposing increases and decreases. Relationships between OTUs within modules were almost all positive (data not shown). Because the organisms in the modules respond in concert, they can be considered as a single ecological unit to simplify analysis. Modules were correlated with measured environmental parameters (described above) classified into three types: the physical parameters %sand and %mud, as indicators of hydraulic conductivity [67][68][69] (Fig. 6A), the chemical parameters %TOC and C:N, as measures of carbon and nitrogen availability (Fig. 6B), and the biological parameters Ru, R C:N , R C:P and R N:P . No significant correlation between the modules and the biological parameters was observed.
We observed strong correlations between the turquoise, tan, purple, pink, brown, and blue modules and %sand and between the red, green and salmon modules and %mud. No modules correlated with both, and, surprisingly, all significant correlations were negative. The same modules that correlated with %sand also had   Table 2).  Table 3. Microbial diversity of freeze core communities. a after QA/QC. significant positive relationships with %TOC, in agreement with relationships observed in Figs 3 and 4, while the pink module also positively correlated with C:N ratio. The brown module contained 29 OTUs, including the most overall abundant OTU within the data set, a SCG OTU, and three abundant Nitrospira OTUs (Table S2). These dominant taxa, save one of the Nitrospira OTUs, have strong individual correlations with %sand, and %TOC, likely driving the overall relationship between the brown module and these properties (Fig. 7). Over half (23) of the member OTUs had a significant positive correlation with %TOC, including presumed heterotrophs, such as the Acidobacteria and Actinobacteria, and autotrophs (the dominant SCG, two of the abundant Nitrospirae). A subset of these organisms was also associated with resazurin reduction, including not only two Acidobacteria, a Deltaproteobacterium (Desulfurellaceae), and two Chloroflexi (one of which was an Anaerolineae), but also the Thaumarchaeota and Nitrospirae, suggesting nitrification is a significant component of the aerobic respiration measured by resazurin reduction.

Discussion
Geophysical properties of sediments modulate hydrologic flow that in turn influences the distribution and dynamics of microbial populations and biogeochemical attributes in the HZ 1,9 . The relationships between sediment size and permeability have been explored extensively in literature 28,70,71 , which generally indicate higher permeability for coarser materials. This relationship is routinely used to estimate permeability from grain size distributions for modeling exchange flows in the HZ 29,68,72 . In the Columbia River HZ, we propose a model in which areas of higher hyporheic exchange associated with larger average particle sizes (e.g., at least 10 s of mm in diameter) have higher sustained O 2 and organic carbon concentrations due to higher connectivity with surface waters (see Fig. 8). This nutrient-rich environment promotes higher microbial biomass and increased heterotrophic respiration. Heterotrophic metabolism can release NH 4 , depending upon the C:N ratio of the organic matter, subsequently stimulating nitrification 73,74 . Therefore, in our model, aerobic metabolic rates are considered to be higher in facies with lower amounts of fine-grained materials. Increased fine particle content (i.e., %mud) in the HZ increases surface area available for colonization and can promote higher microbial abundances 75 , but, fine-grained sediments also have very low pore connectivity, limiting microbial activity due to slow recharge rates for O 2 and nutrients; the interplay of these mechanisms make the relationship between particle size and microbial activity site-specific 76 .
The correspondence between sand and mud contents and various biogeochemical attributes (e.g., %TOC, R C:N , R N:P ) suggest that grain size-based FC facies also have good correspondence to BGC characteristics, and therefore an efficient way to infer BGC properties is to utilize the statistical distributions of BGC properties with respect to grain size-based facies. The information is particularly needed when modeling is extended to larger spatial scales where both hydrogeological and biogeochemical properties are difficult to discern with adequate spatial coverage and resolution. Sediment grain size, however, is more easily obtained or inferred (e.g., via sieving or imaging). The fact that %sand can explain more of the variability of biogeochemical attributes than %mud could be site-specific to the study area and related to the structure (e.g., compactness and connectivity) of the sediment sampled in the current study. Comparison of classifications showed that Folk's classification was more consistent than hierarchical or EM clustering, as the Folk's groupings do not change when more samples were added for facies definition. This is a desirable feature for characterizing the HZ at various locations and larger scales. Moreover, ANOVA were performed by considering the Folk classes (G, MG, MSG, GM) as factors, and similarly for the other two classification schemes (EM-clustering, and hierarchical clustering). The analysis showed that FC geological facies explained more of the variability in the environmental variables (Table S3). Therefore, Folk's classification should be more applicable for delineating HZ facies at other locations. The mechanisms of geological control on biogeochemical attributes might vary at locations with different sediment texture/structure and geomorphology. The samples analyzed in this study were collected over a relatively limited area close with coarser sediment texture (gravel dominated), and the variability in BGC properties among the local FC facies (G, MSG, MG, GM) is expected to be smaller in the study area than over larger stretches of the river that include sediments with a wider range of grain size distributions. Meanwhile, managed and seasonal river stage changes in the Columbia River also affect the water flow in the HZ and their related biogeochemical processes. The potential differences in biogeochemical activity will be examined in future studies that expand sampling at different seasons to locations with distinctly different hydrodynamic attributes (flow velocity, depth, shear stress) and geomorphological features (e.g., river bed form, bathymetric slope and aspect), where different Our results demonstrate strong relationships between microbial community assemblage structure, respiration, and grain size distribution. Previous work suggests that increased fraction of fine material such as mud reduces bed permeability and hence water residence time, which results in resource depletion -in particular oxygen -thus creating a distinct biogeochemical environment 13 . We observed negative correlations between the sediment characteristics %sand or %mud and organism modules, driven by near-universally negative relationships between individual OTUs and these characteristics. A high percentage of fine sediments can result in low pore connectivity and decreased hydraulic conductivity, and thus low recharge rates for O 2 and organic carbon. Organisms we observed to be negatively correlated with %sand included highly abundant putative aerobic autotrophs (Thaumarchaea and Nitrospira). We noted only negative correlations with %mud which runs counter to the common assumption that sediments with higher concentrations of fines contain more organic carbon 77 . Relationships of soil respiration and organic C turnover to the size of microbial biomass are unclear 78,79 . In this study, we found that %TOC and biomass had inverse correlations with %sand and %mud, indicating that increased permeability, and thus decreased residence time, is critical to stimulating biological activity, likely by recharging electron donor and acceptor pools.
The high abundances of Thaumarchaea and Nitrospira suggest nitrification is an important process in the Columbia River hyporheic environment. Thaumarchaea, and Nitrospira, together comprised >20% of the community assemblage in most samples analyzed (Fig. 5). Interestingly, few ammonia-oxidizing bacterial taxa (AOB) were identified in these communities: only three OTUs were classified as Betaproteobacteria of family Nitrosomonadaceae with an average abundance of 0.6%. Similar results have been observed in marine environments. Waters in and around cold seeps in the ocean have been shown to have high abundance (20-60% of Figure 7. Network plot showing correlations of OTU members (rounded squares) of the brown module to the geophysical (circles), chemical (diamonds) and biological (triangles) parameters measured. OTU node size indicates estimated absolute abundance averaged across all samples, and color indicates Phylum association: magenta -Thaumarchaeota, cyan -Nitrospirae, light green -Acidobacteria, red -Actinobacteria, dark green -Chloroflexi, yellow -Euryarchaeota, cyan -Planctomycetes, blue -Proteobacteria, orange -Thermotogae, brown -Gemmatimonadetes, pink -JL-ETNP-Z39, black -Latescibacteria, and gray -unclassified. OTU nodes are labeled with their Class designation. Dashed edges denote negative relationships; solid edges denote positive relationships. Nodes with significant relationships to carbon (%TOC), texture (%SAND), and oxygen consumption (Ru/g) and their associated edges are highlighted. Edge colors denote connections to geophysical (blue), chemical (red) and biological (green) properties. the microbial community), low diversity populations of Thaumarchaea 80 . In many environments, including soil, estuarine sediments, marine waters and sediments, freshwater lakes, and hot springs, AOA have been observed to outnumber AOB by orders of magnitude, although exceptions to this have also been reported (reviewed in ref. 81 ). Investigations into what drives nitrifier community composition have implicated salinity, pH, and temperature 82 . In the Columbia River HZ, such factors are largely controlled by hydraulic regime, i.e. whether local conditions are groundwater-or river water-dominated. Spatial factors affecting exchange, such as sediment composition, should therefore influence the establishment and dynamics of these populations.
Other abundant (>0.5% average relative abundance) OTUs identified suggested a variety of other metabolisms. Several were related to aerobic heterotrophs such as the Actinobacterium Gaiella, the Rhizobiales Variibacter, and the Chthoniobacterales, members of which are capable of metabolizing plant-derived polysaccharides 83 . Others suggested anaerobic conditions exist within this generally oxidizing system, which might occur within restricted regions, e.g., in areas of the sediment dominated by fine-grained material; physical constraints on water movement influence residence time and hence O 2 depletion in the HZ 84,85 . We found OTUs classified as a Chloroflexi of class Ardenticatenia, the described member of which is a facultative aerobic chemoheterotroph capable of growth via dissimilatory iron-and/or nitrate-reduction under anaerobic conditions 86 , Latescibacteria, which have previously been identified in anoxic environments 87 and have many genes for the breakdown of algal-derived carbohydrates, and Hyphomicrobium, an Alphaproteobacterium for which the described species are facultative methylotrophic denitrifiers, capable of utilizing methanol, monomethylamine, or chloromethane 88,89 . The species richness across samples was 5-10x lower than reported for sediments sampled from the Tongue River 21 , which could be due to the lower sequencing effort in this study yielding a less comprehensive survey or differences in local environmental conditions selecting for a less diverse community.
Nitrification is thought to be an important function within the HZ, but the relationship between environmental conditions and ammonia-oxidizing organisms is complex. Studies targeting the ammonia oxidase (amoA) gene have examined the effect of salinity [90][91][92][93][94] and nutrients 95-107 on the relative abundances and activity of AOA and AOB but have yielded conflicting results. Other studies have indicated a seasonal cycling of AOA and AOB, suggesting C speciation or temperature may be drivers of population abundances. It is also possible that interactions with other organisms are critical, and a broader knowledge of the microbial community structure and function will be necessary to understand terrestrial and aquatic nitrification processes. The dearth of AOB in the Columbia River HZ may be related to the available NH 4 pool that possibly vary with respect to gsd-based facies. AOB can tolerate higher concentrations of NH 4 and NO 2 , but may also require higher concentrations, it also appears the AOA have an unusually high affinity for ammonia that gives them an advantage when concentrations are low 108,109 . The solid phase exchange capability of Hanford hyporheic sediments may account for a large pool of exchangeable NH 4 that results in low aqueous concentrations (results not shown). Under such conditions, AOA would be predicted to outcompete AOB, resulting in the observed distribution.
River basins and reaches are geologically heterogeneous where some regions experience a high degree of hyporheic exchange over short time scales while others experience limited exchange with low fluxes of electron donors and acceptors, even over long periods of time 2,7 . Sediment permeability affects resource availability and drives changes in microbial abundance and assemblage composition. In turn, microbial function alters the chemistry of the system through metabolic activity and the hydrology through growth and extracellular matrix Conceptual model for relationships among textural, hydrological, and biogeochemical properties (e.g., flow rate, hydraulic conductivity, biomass, oxygen, organic compound). Higher flow rates through HZ facies with coarser particle size (and thus higher permeability) result in greater input of O 2 and organic carbon in those facies, which stimulates higher microbial biomass and increased heterotrophic activity. formation 110 . Associations between microbial/BGC features and facies properties provide qualitative constraints for hydro-biogeochemical simulation models of hyporheic zone function. For example, in our study area, the influence of hyporheic exchange rate seems to dominate the impact of surface area; therefore, regions with coarser grain material have greater biomass, more microbial activity, and potentially a shift towards autotrophic nitrification. These qualitative shifts need to be captured by simulation models; if the models do not predict these shifts, it suggests a need for model exploration and refinement. The associations can also provide some quantitative constraints, such as the organic carbon content of sediments and shifts in organic carbon across facies. Our study suggests a complex, yet predictable, dynamic between sediment properties, hydrogeochemistry, microbial community and microbially-mediated nutrient transformation along a short reach. To better understand the role the HZ plays in global biogeochemical cycles will require both broader sampling regimes to capture a wider diversity of sediment classes and investigations at increased spatial resolution to characterize microbial communities and function within and across geologic facies. This will help reduce uncertainty in larger scale characterization and modeling, whereby those facies-relevant microbial and biogeochemical properties can be mapped to larger scale features, or facies, defined by a combination of sediment texture and surface water flow.