Introduction

The competitive exclusion principle, originally developed to conceptualize the organization of macroorganism communities (Hardin, 1960), posits that species richness is maintained by niche differentiation. While competitive exclusion theory has been applied to planktonic microbial communities (Fuhrman et al., 2006; Mou et al., 2008), the validity of this framework for microbes has been challenged from its beginning because of apparent overlap in major niche dimensions among co-occurring species (Hutchinson, 1961). This puzzle has recently been reinforced by genomic and metagenomic data showing that many genes of known biogeochemical or ecological relevance are broadly shared across major marine bacterioplankton groups (Moran, 2008), making it difficult to establish whether there are clear and unique ecological roles for individual bacterial taxa. One potentially important component of ecological function that has not typically been considered is diversity in sensing and responding to environmental cues. Thus, information on dynamic gene expression of individual microbial taxa inhabiting the same environment (Poretsky et al., 2005; Frias-Lopez et al., 2008; Poretsky et al., 2010) can address a key aspect of niche differentiation.

Here we examine taxon-specific gene expression by the microbial members of a well-mixed coastal ocean system. Sequencing with the Illumina GAIIx (San Diego, CA, USA) platform provided deep coverage of the community transcriptome, allowing comparisons of transcripts assigned to hundreds of different reference genomes. This analysis reveals novel details on the functional niches occupied by members of a marine bacterioplankton community and provides insights into the diversity of strategies that support this complex microbial assemblage.

Materials and methods

Sample collection

Quarterly samples were collected at Marsh Landing (31°25′4.08 N, 81°17′43.26 W), Sapelo Island, Georgia, USA as part of the Sapelo Island Microbial Observatory (SIMO, http://www.simo.marsci.uga.edu) and named as follows: FN96 (7 November 2008), FN116 (17 February 2009), FN125 (14 May 2009) and FN158 (14 August 2009). All samples were collected at night, 4–6 h after sunset and 1 h before high tide (see Supplementary Table S1 for times and environmental data). Cell collection for RNA extraction was conducted as described previously (Poretsky et al., 2009; Gifford et al., 2011). Briefly, 6–8 l of water was drawn from a depth of 1 m and passed through a 3-μm prefilter (Capsule Pleated Versapor Membrane; Pall Life Sciences, Ann Arbor, MI, USA) and a 0.22-μm collection filter (Supor polyethersulfone; Pall Life Sciences). The 0.22-μm filter was placed in a WhirlPak bag and flash frozen in liquid nitrogen. Total time from start of filtration to flash freezing was 11–14 min.

RNA processing and sequencing

RNA processing was carried out as described by Poretsky et al. (2009) and Gifford et al. (2011). The frozen 0.22-μm collection filters were shattered, placed into 50 ml Falcon tubes with 8 ml of RLT buffer (Qiagen, Valencia, CA, USA) and 2 ml of PowerSoil beads (MO BIO, Carlsbad, CA, USA) and vortexed for 10 min on a MO BIO vortex adapter. RNA was extracted from the 50 ml tubes using an RNeasy kit (Qiagen), and any contaminating DNA was digested using TurboDNAse (Applied Biosystems, Austin, TX, USA). Ribosomal RNA (rRNA) was reduced using a two-step approach in which the samples were first treated enzymatically with the mRNA-only isolation kit (Epicentre, Madison, WI, USA) and then used in subtractive hybridizations with MicrobeExpress and MicrobeEnrich kits (Applied Biosystems). The enriched mRNAs were linearly amplified using the Message Amp II-Bacteria kit (Applied Biosystems), reverse transcribed to complementary DNA (cDNA) with the Universal Riboclone cDNA synthesis system (Promega, Madison, WI, USA) and purified with the QIAQuick PCR purification kit (Qiagen). The four cDNA samples were sheared to ∼300 bp, barcoded and sequenced in one lane of an Illumina GAIIx run (Supplementary Table S2). Sequences are deposited in the CAMERA database (http://camera.calit2.net) under accession name ‘CAM_P_0000917’.

Bioinformatics pipeline

An initial BLASTn (bit score cutoff ⩾50) comparison of a 25 000 read subsample from the combined metatranscriptome library against the SILVA database was used to assemble a database of rRNA sequences. The full metatranscriptome libraries were then searched against this custom database to identify rRNA sequences for removal. All remaining, non-rRNA sequences were compared with National Center for Biotechnology Information’s (NCBI; http://www.ncbi.nlm.nih.gov) RefSeq database (version 43, September 2010) using BLASTx with a bit score cutoff ⩾40 to identify protein-encoding sequences. Taxonomic affiliation was assigned based on the top RefSeq hit and included all three domains of life as well as viruses. Assignments to Clusters of Orthologous Groups (COGs) for select reference genomes were obtained from the Integrated Microbial Genomes system (http://img.jgi.doe.gov).
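The best-hit binning step can be sketched as follows. This is a minimal illustration, not the pipeline actually used: it assumes BLAST results in the standard 12-column tabular layout (query ID first, subject ID second, bit score last), and the function name `best_hits` and the toy records are hypothetical.

```python
import csv
import io

BIT_SCORE_CUTOFF = 40.0  # BLASTx cutoff stated in the text


def best_hits(tabular_blast, cutoff=BIT_SCORE_CUTOFF):
    """Return {read_id: (subject_id, bit_score)} with the top-scoring hit per read.

    Assumes 12-column tabular BLAST output: query id in column 1,
    subject id in column 2, bit score in column 12.
    """
    best = {}
    for row in csv.reader(tabular_blast, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        read_id, subject_id, bit_score = row[0], row[1], float(row[11])
        if bit_score < cutoff:
            continue  # hit is below the significance cutoff
        # Strict ">" keeps the first-seen hit when two hits tie.
        if read_id not in best or bit_score > best[read_id][1]:
            best[read_id] = (subject_id, bit_score)
    return best


# Toy records (hypothetical): read2 has no hit above the cutoff and is dropped.
example = io.StringIO(
    "read1\tgeneA\t90.0\t33\t3\t0\t1\t99\t10\t42\t1e-10\t55.1\n"
    "read1\tgeneB\t95.0\t33\t2\t0\t1\t99\t10\t42\t1e-12\t61.3\n"
    "read2\tgeneC\t60.0\t33\t9\t0\t1\t99\t10\t42\t1e-03\t38.0\n"
    "read3\tgeneA\t80.0\t33\t5\t0\t1\t99\t10\t42\t1e-05\t40.0\n"
)
hits = best_hits(example)
```

The subject ID of each retained hit would then be mapped to its source genome to give the taxonomic bin for that read.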

Ortholog identification

For a set of 16 genomes selected based on their representation in the transcriptome and taxonomic breadth, orthologous genes were identified in a two-step process. First, each gene in a subject genome was reciprocally compared against the other 15 genomes using BLASTp, and reciprocal hits with an E-value < 10⁻⁴ were considered orthologs. Second, the complete list of reciprocal best-hit ortholog pairs among the 16 genomes was iteratively collapsed by combining rows with common orthologs.
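The two steps can be illustrated with a short sketch. It assumes the BLASTp results have already been filtered by E-value and reduced to a set of (query gene, subject gene) pairs. The collapsing step is implemented here with a union-find structure, which is one way to combine rows that share an ortholog and is not necessarily the procedure the authors used. All names are hypothetical.

```python
from collections import defaultdict


def reciprocal_pairs(hits):
    """Keep only gene pairs that hit each other in both directions.

    hits: set of (query_gene, subject_gene) tuples that passed the E-value cutoff.
    """
    return {tuple(sorted(p)) for p in hits if p[0] != p[1] and (p[1], p[0]) in hits}


def collapse(pairs):
    """Merge ortholog pairs that share a gene into ortholog groups (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(set)
    for gene in parent:
        groups[find(gene)].add(gene)
    return sorted(sorted(g) for g in groups.values())


# Toy input: g1_b -> g2_b is one-directional, so it is not a reciprocal hit.
toy_hits = {
    ("g1_a", "g2_a"), ("g2_a", "g1_a"),
    ("g2_a", "g3_a"), ("g3_a", "g2_a"),
    ("g1_b", "g2_b"),
}
ortholog_groups = collapse(reciprocal_pairs(toy_hits))
```

Each resulting group corresponds to one row of an ortholog table, and the number of genomes represented in a group gives the ortholog count for its member genes.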

Gene searches

Ribosomal protein (RP) transcripts were identified by a text-based query of the read annotations. Transcripts binning to broad ecological categories for motility, alternative energy generation, transporters and inorganic nutrient uptake and metabolism were identified by keyword searches of the 16 selected reference genomes (Supplementary Table S3), retrieving any gene with a matching annotation along with any orthologs of that gene. The selected genes’ annotations were then visually inspected to confirm a match with the ecological category of interest. For organic compound transporters, the process was the same except genes were initially identified by their COG transporter classification as described in Poretsky et al. (2010).
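A text-based query of this kind, and the per-bin percentage of RP reads derived from it, might look like the sketch below. The search pattern is an assumption, since the exact query terms are not given, and the function name and toy annotations are hypothetical.

```python
import re

# Assumed search term; the exact query used in the study is not specified.
RP_PATTERN = re.compile(r"\bribosomal protein\b", re.IGNORECASE)


def percent_rp(annotated_reads):
    """Return {genome_bin: percent of reads annotated as ribosomal proteins}.

    annotated_reads: iterable of (genome_bin, annotation_text) tuples,
    one tuple per read.
    """
    totals, rp = {}, {}
    for genome_bin, annotation in annotated_reads:
        totals[genome_bin] = totals.get(genome_bin, 0) + 1
        if RP_PATTERN.search(annotation):
            rp[genome_bin] = rp.get(genome_bin, 0) + 1
    return {b: 100.0 * rp.get(b, 0) / n for b, n in totals.items()}


toy_reads = [
    ("binA", "30S ribosomal protein S1"),
    ("binA", "ABC transporter permease"),
    ("binA", "50S Ribosomal Protein L2"),
    ("binA", "proteorhodopsin"),
    ("binB", "ammonia transporter"),
]
rp_percent = percent_rp(toy_reads)
```

A keyword match of this sort can also retrieve non-target annotations that merely mention the search term, which is why a manual inspection step, as described above for the ecological categories, is useful.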

Genome islands

Putative genome islands were defined as those regions containing a gene with the word ‘phage’ in the annotation and with flanking genes (10 up- or downstream of the phage gene) having a significantly lower mean ortholog count and mean hit count (P<0.05, bootstrapping 95% confidence interval) than the average for the entire genome bin.
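The flanking-region test can be sketched as below. The resampling scheme is an assumption, because the text specifies only a bootstrapped 95% confidence interval: here the mean of the flanking genes is compared with the lower 2.5% bound of means from same-sized random resamples of the whole genome bin. The same function could be applied to ortholog counts and to hit counts. All names and the toy data are hypothetical.

```python
import random


def flank_mean(values, idx, width=10):
    """Mean value for up to `width` genes on each side of gene `idx`.

    Returns (mean, number_of_flanking_genes); the gene at `idx` is excluded.
    """
    flank = values[max(0, idx - width):idx] + values[idx + 1:idx + 1 + width]
    return sum(flank) / len(flank), len(flank)


def is_low_region(values, idx, width=10, replicates=1000, seed=0):
    """True if the flank mean lies below the bootstrap lower 2.5% bound."""
    observed, n = flank_mean(values, idx, width)
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    boot = sorted(sum(rng.choices(values, k=n)) / n for _ in range(replicates))
    lower = boot[int(0.025 * replicates)]
    return observed < lower


# Toy genome bin: 100 genes with 10 orthologs each, except a 21-gene
# stretch (indices 30-50) with none, centered on a 'phage' gene at index 40.
ortholog_counts = [10] * 100
for i in range(30, 51):
    ortholog_counts[i] = 0

island_call = is_low_region(ortholog_counts, 40)   # inside the low stretch
control_call = is_low_region(ortholog_counts, 80)  # typical region
```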

Statistical analysis

Statistically significant differences in relative gene expression within an ortholog group were determined using the non-parametric Wilcoxon rank-sum test (P<0.05). Treating each genome bin separately, genes that had read counts >1.5 times the interquartile range of their ortholog count group were labeled as outliers. Rows in the ortholog master table (described above) that contained IDs of outlier genes were then retrieved to make a table of outlier orthologous relationships. The indicator species analysis approach of Dufrene and Legendre (1997) was applied to the ortholog master table to identify ‘indicator genes’: those genes that were both expression outliers and whose expression was biased toward certain genomes. The latter was determined by the indicator value (IV), calculated as the proportion of gene expression contributed by one genome to the total expression of that gene summed over all the genomes.
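The outlier step can be sketched as follows. The threshold is interpreted here as the conventional upper fence (third quartile plus 1.5 times the interquartile range), which is an assumption about the exact rule applied. The function name, the minimum group size and the toy data are hypothetical.

```python
from collections import defaultdict
from statistics import quantiles


def expression_outliers(genes, k=1.5, min_group_size=4):
    """Return gene IDs whose read counts exceed Q3 + k * IQR within their group.

    genes: {gene_id: (ortholog_count, read_count)} for ONE genome bin.
    Genes are grouped by ortholog count; small groups are skipped because
    quartiles are not meaningful for them (an assumption of this sketch).
    """
    groups = defaultdict(list)
    for gene_id, (n_orthologs, reads) in genes.items():
        groups[n_orthologs].append((gene_id, reads))

    outliers = set()
    for members in groups.values():
        if len(members) < min_group_size:
            continue
        q1, _, q3 = quantiles([r for _, r in members], n=4, method="inclusive")
        fence = q3 + k * (q3 - q1)
        outliers.update(g for g, r in members if r > fence)
    return outliers


# Toy bin: nine genes with 15 orthologs each (one highly expressed),
# plus three genes with no orthologs (group too small to evaluate).
toy_bin = {f"gene{i}": (15, i) for i in range(1, 9)}
toy_bin["hot"] = (15, 100)
toy_bin.update({"u1": (0, 5), "u2": (0, 6), "u3": (0, 500)})
outlier_genes = expression_outliers(toy_bin)
```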

The mean percent transcriptome within a genome was calculated as:

$$A_{ji} = 100 \times \frac{a_{ji}}{n_i}$$

where $a_{ji}$ is the number of reads of gene $j$ in genome $i$, and $n_i$ is the total number of reads in genome $i$. The IV of gene $j$ in genome $i$ was then calculated as:

$$IV_{ji} = 100 \times \frac{A_{ji}}{\sum_{k=1}^{g} A_{jk}}$$

where $g$ is the number of genomes. Genes with an IV>50 (majority of expression) were considered indicator genes for a particular genome. Because this approach is based on binning of transcripts to reference genomes, genes present in the wild population but not in the best-matched reference strain are missed in the analysis.
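As a worked illustration of the indicator value, the sketch below expresses each gene as a percentage of its genome's reads, then divides that percentage by its sum over all genomes and scales to 100, so that an IV above 50 means one genome contributes the majority of the normalized expression. The data layout (read counts keyed by genome and ortholog group) and all names are assumptions made for the example.

```python
def percent_transcriptome(read_counts):
    """Convert read counts to percent of each genome's total reads.

    read_counts: {genome: {ortholog_group: reads}}
    """
    pct = {}
    for genome, genes in read_counts.items():
        total = sum(genes.values())
        pct[genome] = {g: 100.0 * n / total for g, n in genes.items()} if total else {}
    return pct


def indicator_values(pct):
    """IV = 100 * (percent in one genome) / (sum of percents over all genomes)."""
    totals = {}
    for genes in pct.values():
        for g, p in genes.items():
            totals[g] = totals.get(g, 0.0) + p
    return {
        genome: {g: 100.0 * p / totals[g] for g, p in genes.items() if totals[g] > 0}
        for genome, genes in pct.items()
    }


# Toy counts for two genomes and two ortholog groups.
counts = {"A": {"x": 30, "y": 70}, "B": {"x": 10, "y": 90}}
iv = indicator_values(percent_transcriptome(counts))

# Genes with IV > 50 are candidate indicator genes for that genome.
indicators = {(gen, g) for gen, genes in iv.items() for g, v in genes.items() if v > 50}
```

In this toy case gene x has an IV of 75 in genome A and gene y has an IV of 56.25 in genome B, so each genome has one candidate indicator gene. In the analysis described above, a gene also had to be an expression outlier to qualify.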

Results and discussion

Samples and sequencing

The Sapelo Island Microbial Observatory site is characteristic of nearshore habitats of the southeastern US, with ecological influences from marsh, freshwater and coastal environments (Pomeroy and Wiegert, 1981; Cai, 2011). Four high-tide, nighttime samples representing the fall, winter, spring and summer seasons yielded 31 million 100 bp cDNA reads. Contaminating rRNAs were identified by a BLAST search against a subset of the SILVA database (see methods) and accounted for 62% of the reads (Supplementary Table S2). Of the remaining 11 million potential protein-encoding reads, 4.1 million had a significant hit (bit score>40) in a BLASTx analysis against NCBI’s RefSeq database. The percentage of potential protein-encoding reads with a significant RefSeq hit (mean 34%) is lower than in previous analyses (52% in Poretsky et al., 2010; 53% in Gifford et al., 2011), likely due to the shorter Illumina read length.

Active community members

Based on the highest-scoring hit from the RefSeq BLAST, ∼4000 taxa were represented in the community transcriptome (Table 1). The distribution of hits among the taxonomic bins was log normal, with the top 200 bins accounting for 75% of all hits, followed by a long tail of bins with very few hits. Sequences recruiting to the Alphaproteobacteria, Gammaproteobacteria, Betaproteobacteria and Bacteroidetes dominated the transcriptomes (Table 1). The recently sequenced genome of Puniceispirillum marinum, the only SAR116 representative, recruited the most reads of any genome. Small, streamlined genomes, such as those related to Pelagibacter ubique, Betaproteobacterium KB13 and Flavobacteria MS024-2A and MS024-3C, were well represented, as were some medium-to-large genomes, particularly those related to the Roseobacter clade and the ‘oligotrophic marine gamma’ group. Hits to the Betaproteobacteria were predominantly to genomes of methylotrophic taxa. Archaeal genomes generally recruited few transcripts except for Nitrosopumilus maritimus, which was the fifth highest transcript-recruiting genome. Transcripts assigned to the Verrucomicrobia, a relatively recently described phylum identified in both terrestrial and aquatic habitats, were also well represented. Overall, the reference bins were indicative of a highly diverse active coastal bacterioplankton community.

Table 1 Taxonomic binning of coastal metatranscriptomic reads based on the highest-scoring pair from the BLASTx search against RefSeq (after rRNA removal; see Materials and methods)

RP expression as an in situ growth indicator

Ribosomal proteins (RPs) are essential for protein synthesis, and bacterial genomes typically contain 50–60 RP genes (http://ribosome.med.miyazaki-u.ac.jp). Over 218 000 reads were annotated as RPs in the metatranscriptomes (Table 1), recruiting to 1903 different taxonomic bins. As up to 40% of a bacterial cell’s energy is allocated to protein synthesis, cells strictly regulate and coordinate synthesis of RPs to balance translation machinery with available resources (Wilson and Nierhaus, 2007; Maguire, 2009). Indeed, levels of RP transcripts have been shown previously to be well correlated with growth rates in yeast (Eisen et al., 1998), Bacteria (Wei et al., 2001) and Archaea (Hendrickson et al., 2008). These characteristics of RPs in model organisms suggested to us that their abundance could be leveraged as an index of in situ growth rates.

We used three strategies to evaluate whether the %RP values for bacterioplankton transcriptomes tracked with in situ growth rates. First, %RP data were checked for temperature-related seasonal shifts, as temperature has previously been shown to correlate positively with bulk community growth rates (del Giorgio and Cole, 1998 and references therein); these shifts were observed (Figure 1a), with %RP maxima occurring in the summer for the majority of taxa (56%), followed by spring, winter and fall (20, 18 and 7% of taxa, respectively) (Figure 1b). The higher frequency of maxima in winter than in fall is potentially driven by the enrichment in %RP values in the winter over all other seasons for several Bacteroidetes and Verrucomicrobia taxa. Second, %RP values were evaluated relative to bacterial community secondary production rates (based on ³H-leucine incorporation) that were measured concurrently with RNA sample collection; the relative abundance of RP transcripts followed the observed differences in bacterial production (summer: 2.8 × 10⁻⁶ g C l⁻¹ h⁻¹; spring: 1.8 × 10⁻⁶ g C l⁻¹ h⁻¹; fall: 0.4 × 10⁻⁶ g C l⁻¹ h⁻¹; and winter: 0.3 × 10⁻⁶ g C l⁻¹ h⁻¹). Finally, %RP patterns were compared against available literature values for taxon-specific in situ growth rates (Yokokawa et al., 2004; Malmstrom et al., 2005; Allers et al., 2007; Teira et al., 2009; Ferrera et al., 2011), and rankings of taxa based on %RP tracked well with rankings based on measures of in situ growth rates (Figure 1d).

Figure 1

Relative abundance of ribosomal protein (RP) reads in the top 200 reference genomes (eukaryotic hits excluded). (a) Rank order for the 200 reference genomes separated by season. (b) Temporal trends in %RP for the four seasonal samples with individual taxa separated. Gamma=Gammaproteobacteria, Bacter=Bacteroidetes, Beta=Betaproteobacteria, Roseo=Roseobacter, Verru=Verrucomicrobia, Misc=Miscellaneous. (c) Same as (a), except the seasonal samples are combined for each genome bin. See Table 2 for full names and taxonomic affiliations. (d) Group-specific growth rate data from Yokokawa et al., 2004; Malmstrom et al., 2005; Allers et al., 2007; Teira et al., 2009; Ferrera et al., 2011.

Among the 200 top-recruiting reference bins, the percent of sequences identified as ribosomal proteins (%RP hits) within a genome ranged from 0.05% (Candidatus Pelagibacter ubique HTCC1002) to 20.5% (Chryseobacterium gleum ATCC 35910), and showed distinct phylogenetic patterns (Figure 1c). Gammaproteobacteria transcriptomes were clearly enriched in RP genes (mean 8.8%), and represented many of the highest %RP bins, including the polysaccharide degraders Teredinibacter turnerae T7901 (16.2%; the reference bacterium was isolated from a wood-boring bivalve), Saccharophagus degradans 2-40 (14.3%), Marinomonas sp. MWYL1 (13.9%), and Cellvibrio japonicus Ueda107 (12.9%). Bacteroidetes transcriptomes ranged from 2.6 to 20.5%RP, and included the two bins with the highest %RP, C. gleum (the top symbol in Figure 1c) and Capnocytophaga gingivalis ATCC 33624 (18.8%). SAR116 Puniceispirillum marinum IMCC1322 had a mid-range %RP (6.9%). Reference genomes for the Roseobacters were dispersed throughout the %RP distribution, from 1.8% (Roseobacter litoralis Och 149) to 9.2% (Citreicella sp. SE45). Finally, despite their dominance in the total transcript pool (Table 1), members of the SAR11 clade had the lowest %RP of all the groups examined, ranging from 0.05 to 1.7%RP.

Transcriptome characteristics

We conducted detailed analyses on 16 reference genome bins with high coverage in the metatranscriptomes and spanning the range of %RP (Table 2). The 16 genomes accounted for 29% of combined library reads, with the number of reads recruited per genome ranging from 11 000 (Synechococcus sp. WH8109) to >259 000 (P. marinum IMCC1322; Table 1). While most genomes approached transcriptome saturation (Supplementary Figure S1), the percentage of genes hit in the reference genomes varied widely, from 28% in Pedosphaera parvula Ellin514 to over 91% in SAR11 isolate HTCC7211 (Table 2). The average amino-acid sequence identity between the transcripts and the genes they were assigned to in the reference genomes ranged from 74% for P. parvula Ellin514 to 91% for Synechococcus sp. WH8109 (Table 1).

Table 2 Summary statistics for reads binning to the select 16 genomes

Functionally redundant genes among the 16 genomes were identified using reciprocal BLAST analysis (Figure 2 and Supplementary Figure S2). The overall commonness of genes present in a genome (that is, how often they were also found in other genomes) was inversely correlated to genome size (R²=0.63, P<0.001), with the highest overlap in gene content occurring in the small, streamlined SAR11 and Betaproteobacteria genomes, and the lowest in the large genomes of Pedosphaera parvula Ellin514 and alpha proteobacterium BAL199 (Table 2). Genome regions with low or non-detectable expression often had few orthologs in other genomes (Figure 2 and Supplementary Figure S2) and many were flanked on one side by phage elements, suggesting these are genomic islands (Coleman et al., 2006) not present in the sampled coastal populations. These low-recruiting regions were enriched in the SAR116 and Roseobacter reference genomes and depleted in the streamlined SAR11, KB13 and Flavobacteria genomes (Table 2). In comparisons limited to reference genomes, these islands could be mistakenly identified as regions of unique, niche-defining genes even if they are neither present nor expressed in the natural populations. However, the combination of expression data with ortholog analysis identifies actively expressed functional content representative of the sampled population.

Figure 2

Transcriptome of SAR116 clade member Puniceispirillum marinum IMCC1322. The outer ring shows the 2543 genes in the IMCC1322 genome colored according to the number of RefSeq hits in the combined metatranscriptome. The inner rings denote the presence of orthologs to an IMCC1322 gene in the other 15 genomes. Ring axis labels correspond to gene order as designated by NCBI; for SAR116 this corresponds with the order of locus tag numbers.

Relationship between expression level and gene prevalence

We found a statistically significant positive relationship between the expression level of a gene (the number of reads recruited to a gene in a reference bin) and how commonly it was harbored in the other taxa (the number of genomes with an ortholog for that gene) (Wilcoxon rank-sum test, P<0.05; Figure 3, Supplementary Figure S3). Thus, most highly expressed genes were shared by multiple taxa, although the SAR11 members HTCC1002, HTCC1062 and HIMB114 as well as MS024-2A and N. maritimus SCM1 did not have as strong a pattern as the others (Supplementary Figure S3). Previous observations from marine metatranscriptomic data (Hewson et al., 2009; Stewart et al., 2011) support this conclusion, and suggest that an analysis restricted to only the more highly expressed genes is likely to miss unique functional capabilities that distinguish taxa.

Figure 3

Expression level (hit count) as a function of ortholog number (representation in the other 15 genomes). Error bars denote 1.5 times the interquartile range. The magnitude of the y-axis varies for each genome and is not shown for clarity.

Bacterioplankton niches

We therefore took a broader approach to identifying niche-defining features for each taxon that involved characterization of three classes of transcripts. (1) Highly expressed genes: the 10 most highly expressed genes in each genome bin, representing the processes garnering the most transcriptional effort by that taxon (Table 3). (2) Ecological benchmark genes: selected biogeochemically relevant genes representing traits such as nutrient acquisition, substrate transport, energy acquisition and motility (Table 4). (3) Indicator genes: the genes whose expression was higher than expected based on commonness in the other genomes (>1.5 times the interquartile range of their ortholog group; Figure 3) and for which the majority of expression (>50%) was found in that taxon (Table 5; see Materials and methods).

Table 3 Top 10 highest expressed genes for the select 16 reference genomes
Table 4 Transcriptional effort devoted to key ecological and niche-defining functions
Table 5 Transcriptional effort devoted to select indicator genes of the 16 reference genomes, expressed as percent of a bin’s transcriptome

SAR116

The P. marinum highly expressed genes encoded a Na+/solute symporter, and ABC sugar and TRAP dicarboxylate transporters, along with energy transduction and transcription/translation machinery (Table 3). The highly expressed V-type H+-translocating pyrophosphatase establishes a proton gradient across the membrane and has been hypothesized to have a role in scavenging metabolic energy in substrate-limited cells (García-Contreras et al., 2004; Rinta-Kanto et al., 2012).

Expression of ecological benchmark genes showed that the SAR116 populations in this coastal ocean are likely motile and supplement heterotrophic growth with light-driven phototrophy (Table 4) via proteorhodopsin (Béjà et al., 2000). Transcripts indicate cells were actively transporting and metabolizing most inorganic nutrients, but the transcriptome was particularly enriched in sequences for processing polyphosphate and nitrate (Table 4). SAR116 populations also had sequences for amino acid and five-carbon carbohydrate uptake overrepresented compared with other bacterioplankton taxa (Table 4).

SAR116 indicator genes included those for the uptake and oxidation of methanesulfonate (Table 5), a compound generated from the oxidation of dimethyl sulfoxide (Kelly and Murrell, 1999). These included methanesulfonate monooxygenase (MSO) subunits, and a nitrate/sulfonate/bicarbonate permease that neighbors the SAR116 MSO genes and is homologous to a methanesulfonate transporter from the soil alphaproteobacterium Methylosulfonomonas methylovora (Jamshed et al., 2006). Indeed, this entire SAR116 genome region (SAR116_2098-2109) has high homology and synteny with the M. methylovora methanesulfonate utilization operon. Two other SAR116 indicator genes fell in the degradation pathway for the aromatic compound protocatechuate (Table 5).

Roseobacters

The two selected roseobacter genome bins showed clear differences in their highly expressed genes (Table 3). For Roseobacter sp. AzwK-3b, these genes encoded aerobic anoxygenic photosynthesis (AAnP). For non-phototrophic Citreicella sp. SE45, which had the highest Roseobacter %RP, these genes encoded growth-related processes (transcription, translation and energy transduction). Formate dehydrogenase was expressed in both genomes, and made up >11% of all Citreicella sp. SE45 hits (Table 3).

The ecological gene analysis revealed that both roseobacter taxa were assimilating a variety of inorganic nutrients and obtaining energy via the sox-based sulfur oxidation pathway (Table 4). Cells were expressing genes for the uptake of organic nitrogen compounds, such as amino acids and polyamines (Table 4), and there was an indicator gene for urea metabolism in SE45 (Table 5). Interestingly, while both reference genomes harbor genes for flagellar synthesis, expression of these genes was hardly detectable in the sampled populations (Table 4).

SAR11

For the four SAR11 genome bins, expression of a Na+/solute symporter and proteorhodopsin accounted for 16–41% of transcripts (Table 3). The high expression of these two genes is likely linked, as establishing a sodium gradient across the cell membrane is important in proteorhodopsin-mediated growth stimulation in the flavobacterium Dokdonia MED134 (Kimura et al., 2011), and this Na+ gradient could subsequently drive substrate uptake via the highly expressed symporter. While the substrate of the SAR11 Na+/solute symporter has not been experimentally identified, we hypothesize that it transports acetate based on sequence similarity to an acetate permease in Escherichia coli (HMPREF9346_04543) and proximity in all four SAR11 reference genomes to an expressed acetyl-CoA synthetase (see Supplementary Information). Other highly expressed SAR11 genes included V-type H+-translocating pyrophosphatases and a taurine transporter protein.

The ecological benchmark gene set showed that ammonia transporters were among the most highly expressed genes (Table 3). Indeed, SAR11s were second only to the ammonia-oxidizing N. maritimus populations in the percent transcriptional effort devoted to ammonia processing (Table 4). Phosphate transporters were also enriched in several SAR11 bins, and HIMB114 devoted more transcriptional effort to phosphonate acquisition than any other taxon (Table 5). The SAR11 transcriptomes were enriched in transporters for amino acids, carboxylic acids and nucleotides (Table 4).

Indicator genes in HTCC7211 included two subunits of adenylyl-sulfate reductase (APS reductase; Table 5). These genes have close homologs in HTCC1002 and HTCC1062, which were also well expressed, and cluster in the Apr lineage I group of sulfur-oxidizing bacteria that are hypothesized to use these genes in the reverse direction to oxidize sulfur (Meyer and Kuever, 2007). The HIMB114 transcriptome was significantly enriched in carbon monoxide oxidation sequences (Table 4), though these genes are of the type II form whose function in CO oxidation has been recently questioned (Cunliffe, 2011). HIMB114’s indicator genes included a dehydrogenase for tartrate (HIMB114_0953; Malik and Viola, 2010), a compound shown to be secreted by marine algae (Marsh et al., 1992).

Three of the four SAR11 genomes were significantly enriched in proline/glycine betaine transporters (Table 4), and HTCC7211 had indicator genes for both transporting and degrading glycine betaine (Table 5). Tripp et al. (2008, 2009) have shown that the growth of SAR11 HTCC1062 in culture is significantly improved by glycine betaine. The HTCC7211 bin also had two indicator genes for uptake of ectoine/hydroxyectoine, which serve as compatible solutes during osmotic stress (Mulligan et al., 2011) as well as a carbon and nitrogen source (Lecher et al., 2009). Overall, the SAR11 bins indicate substantial transcriptional investment in the uptake and metabolism of compatible solutes.

Gammaproteobacteria

The three diverse Gammaproteobacteria lineages all showed high expression of TonB-dependent transporters (Tables 3, 4 and 5). These transporters are classically associated with iron and vitamin uptake but have more recently been linked to a range of substrates that includes metals, sugars and oligosaccharides (Schauer et al., 2008), and they are potentially involved in motility (Cursino et al., 2009). All three bins also had abundant transcripts for light-driven energy generation via AAnP genes (HTCC2080 and NOR51-B) or proteorhodopsin (HTCC2207; Table 4). Members of the HTCC2080 and HTCC2207 populations were motile at the time of sampling, and HTCC2207 devoted more transcriptional effort to motility than any other taxonomic bin (Table 4).

Gammaproteobacteria transcript pools were variously enriched in ecological benchmark genes for acquisition of phosphorus (Table 4; inorganic phosphate transporters, alkaline phosphatases and polyphosphate metabolism), but depleted in ammonia transporter expression relative to the Alphaproteobacteria groups. The Gammaproteobacteria bins were enriched in Na+/H+ antiporter transcripts (Table 3), which work to maintain the membrane Na+ gradient and, as in the SAR11s (see above) and Bacteroidetes (see below), may be coupled with light-driven proton pumping.

The indicator gene analysis for Gammaproteobacteria was biased toward genes for fatty acid metabolism. The HTCC2080 bin contained nine indicator genes for fatty acid metabolism, including four acyl-CoA dehydrogenases and a 3-ketoacyl-CoA thiolase involved in fatty acid β-oxidation (Table 5). Furthermore, HTCC2080 populations were expressing 153 genes involved in lipid transport and metabolism, three times the average for these functional gene categories in the other 15 genomes (Supplementary Table S5).

Twenty-four of the Gammaproteobacterium HTCC2207 indicator genes were for motility, including those for flagellar assembly and chemotaxis (Table 5). Four HTCC2207 indicator genes contained cadherin domains involved in complex carbohydrate degradation via cell aggregation and direct binding to cellulose, xylan and related compounds (Fraiberg et al., 2010, 2011). There were six indicator genes for breaking glycosidic bonds: four annotated as general glycosyl hydrolases, possibly targeting the β1–4 linkages found in cellulose, and two annotated as β1–3 glucanase and laminarinase, possibly targeting the β1–3 linkages of laminarin (a storage glucan found in brown algae) or chrysolaminarin (a storage glucan of diatoms). Together, these genes suggest that the HTCC2207 populations were binding to and degrading carbohydrate-rich particulate material.

Flavobacterium MS024-2A

Highly expressed genes in the MS024-2A bin included TonB-dependent transporters, proteorhodopsin, V-type H+-translocating pyrophosphatase, translation elongation factor Tu and a histone family DNA-binding protein (Table 3). The ecological benchmark gene set for the MS024-2A populations was unique in containing genes for the oxidation of hydrogen (Table 4) which, along with the expressed proteorhodopsin, can provide supplemental energy to bacteria (Woyke et al., 2009). The MS024-2A populations were expressing genes for the transport of phosphorus and sulfur compounds, and were particularly enriched in transcripts for alkaline phosphatase, polyphosphate kinase (ppk) and polysulfide reductase (nrfD) (Table 4). Interestingly, there were relatively few N-related transcripts. MS024-2A populations were using a gliding motility system (gldJMO) likely involved in translocation across a solid surface and currently thought to be unique to the Flavobacteria (Table 4).

Similar to the Gammaproteobacteria, MS024-2A populations were also enriched in antiporters (two Na+/H+ exchangers, Table 4). Indicator genes included two Na+/solute symporters and a subunit of NADH:ubiquinone oxidoreductase (Table 5). The latter gene was experimentally shown to be a component of flavobacterium MED134’s light-driven growth (Kimura et al., 2011), and further supports a transcriptional link between proteorhodopsin-based proton pumping and sodium-driven transport. Flavobacteria MS024-2A was similar to Gammaproteobacteria HTCC2207 in that the indicator genes included those for the attachment (cadherins) and breakdown (two glycosyl hydrolases) of complex carbohydrates, and also included a carbohydrate synthesis gene (a glycogen synthase) (Table 5).

Betaproteobacteria

The genome bin for Betaproteobacterium strain KB13 portrayed a highly specialized methylotroph population. The most highly expressed gene was methanol dehydrogenase, accounting for 39% of all KB13 transcripts (Tables 3 and 5), in good agreement with a proteomic study of coastal bacterioplankton (Sowell et al., 2011). Methylotrophy was likely supplemented by light-driven proton pumping, as suggested by the high expression of a xanthorhodopsin (which has not been experimentally differentiated from proteorhodopsin in KB13). Other highly expressed genes included V-type H+-translocating pyrophosphatase (Rinta-Kanto et al., 2012), a glucose/sorbosone dehydrogenase and a cytochrome c oxidase. An indicator gene for bacterioferritin suggests KB13 was storing iron (Table 5).

Archaea

Expression patterns in the thaumarchaeon Nitrosopumilus maritimus bin were also indicative of an autotrophic specialist, with the most highly expressed genes including two subunits of ammonia monooxygenase and two ammonia transporters (Table 3). This genome bin contained the second highest proportion of transcripts devoted to phosphonate transport (Table 4; Urakawa et al., 2011), as well as two nirK-like genes (though the exact function of the latter is uncertain, see Hollibaugh et al., 2011; Table 5).

Synechococcus

The only photosynthetic organism in the 16 genomes was Synechococcus sp. WH8109 and, correspondingly, transcripts for light harvesting (photosystem core proteins; psaAB) and carbon fixation (RuBisCO) were found in both the highly expressed and indicator genes for this bin (Tables 3 and 5). Other highly expressed genes included three subunits of RNA polymerase (β, β′, γ), glutamate synthase and NADH dehydrogenase (ubiquinone). The ecological benchmark gene set revealed that WH8109 populations were expressing genes for nitrite assimilation (Table 4), and had the second highest relative expression of urease (Table 5).

Several Synechococcus indicator genes are involved in sulfur assimilation (Table 5), including the diagnostic gene for sulfolipid synthesis (UDP-sulfoquinovose synthase, sqdB). Sulfolipid-enriched membranes in open ocean Cyanobacteria have been hypothesized to decrease their phosphorus cell quota in P-limited environments (Van Mooy et al., 2006, 2009), although the phosphorus concentrations in this coastal site were high for the four sampling dates (mean 0.72±0.19 μM), possibly suggesting sulfolipid synthesis is a widespread strategy.

Verrucomicrobia

Unlike all the other heterotrophs, the highly expressed genes in Pedosphaera parvula Ellin514 did not include any for alternative energy acquisition through phototrophy or chemolithotrophy. The most highly expressed genes included DNA-directed RNA polymerase (α, β and β′ subunits), translation elongation factors G and Tu, and chaperonin GroEL, which was similar to other high %RP taxa (SE45, IMCC1322, NOR51; Figure 1) for which many of the most highly expressed genes were related to protein synthesis (Table 3).

Few N or P transporter transcripts were detected in our ecological benchmark gene analysis, but interestingly, Ellin514 was enriched in genes for both polyphosphate (polyP kinase) and polysulphide (polyS reductase NrfD) metabolism.

P. parvula Ellin514 populations were highly enriched in transcripts for secretion systems (Table 4), with indicator genes including type II and III secretion system proteins and the type IV pilus assembly proteins used for gliding motility (Table 5). Three indicator genes were ABC-2 transporters, which transport polysaccharides to the outside of the cell. An indicator gene for capsular exopolysaccharide synthesis corroborates exopolymer formation activity. While there have been few phenotypic studies of Verrucomicrobia, particularly in aquatic environments, characterization of Lentisphaera araneosa from the sister phylum Lentisphaerae showed it too was an abundant producer of exopolysaccharides (Cho et al., 2004; Thrash et al., 2010). Indicator genes also included those for cell wall polymer degradation as well as homologs of myrosinases (Table 5), enzymes that cleave glucose from glucosinolate plant secondary metabolites. Together, the P. parvula indicator genes are suggestive of surface attachment and/or biofilm formation on cells or particles, possibly for a pathogenic or mutualistic lifestyle.

Synthesis

New aspects of niche differentiation based on preferential expression of genes for the transport and assimilation of organic compounds emerged from this analysis, including acetate in the SAR11s, fatty acids in Gammaproteobacteria HTCC2080 and NOR51, aromatic compounds in SAR116 and complex carbohydrates in Gammaproteobacteria HTCC2207 and Bacteroidetes MS024-2A. Compounds not previously considered important substrates for heterotrophic bacteria but predicted here to be transported or metabolized in coastal seawater included tartrate (SAR11 HIMB114), taurine (several Alphaproteobacteria taxa), methanesulfonate (SAR116) and ectoine (SAR11 HTCC7211).

Although all selected taxa except Synechococcus WH8109 are heterotrophs, there was significant transcriptional effort devoted to obtaining energy by non-heterotrophic means. Expression of genes for phototrophy via AAnP and proteorhodopsin accounted for up to a quarter of some transcript bins (mean 6.5%; Table 3). A strong transcriptional pattern linked Na+-driven transport with proteorhodopsin for multiple taxa. Chemolithotrophic energy acquisition through the sox system, APS reductase system and hydrogen oxidation accounted for up to 7% of bins (mean 0.5%; Table 3).

Variations in motility, adhesion and secretion systems suggested differentiation with regard to the extent of interactions with living cells or detrital particles, with the Verrucomicrobia transcriptomes, in particular, showing expression of genes associated with cell–cell interactions. Synechococcus WH8109 expressed genes to incorporate sulfur into membrane lipids despite the non-limiting concentrations of phosphorus. Betaproteobacterium KB13 and its relatives specialized in methanol-based C1 metabolism. Thus, the diverse assemblage of bacterioplankton is maintained to some degree by the presence of unrecognized niche dimensions involving differential gene content, differential gene regulation and transcriptional linkages between genes.

Marine bacterioplankton taxa are frequently partitioned into two super-niches believed to represent divergent adaptive strategies (Moran et al., 2004; Polz et al., 2006; Lauro et al., 2009; Yooseph et al., 2010); the first consists of cells that live singly in nutrient-poor seawater, while the second consists of cells that inhabit nutrient-rich particles, patches and eukaryotic cells suspended in the seawater matrix. Expression data from this study support the super-niche paradigms, and suggest a link between activity levels, substrate utilization and transcriptome diversity (as measured by Pielou’s (1966) evenness index). Slowly growing (low %RP) taxa, such as SAR11 populations, N. maritimus, and betaproteobacterium KB13, exhibited low diversity transcriptomes, with the majority of transcriptional effort placed in a few key processes, such as light-mediated transport, ammonia oxidation and transport, or C1 metabolism, and relatively little investment in sensing and responding to the environment (Figure 4). These organisms may have evolved to focus on only a few metabolisms independent of environmental conditions, a strategy that potentially misses ephemeral substrates but maintains consistent growth. More rapidly growing (high %RP) taxa, such as Verrucomicrobium P. parvula Ellin514, SAR116 P. marinum, Roseobacter Citreicella sp. SE45 and Gammaproteobacteria NOR51 and HTCC2080 exhibited high diversity transcriptomes, with transcriptional effort spread across a variety of processes such as motility, chemotaxis, defense systems and signal transduction (Figure 4). Relevant to the paradox of the plankton, some taxa showed intermediate characteristics or possessed traits from both canonical super-niches (for example, Roseobacter Azwk-3B, alphaproteobacterium BAL199 and flavobacterium MS024-2A), suggesting a continuum of ecological strategies between these two extremes (Figures 1 and 4).

Figure 4

Potential growth rate and transcriptome composition for the 16 selected genome bins. (a) %RP. (b) Assignment to COG metabolic categories (excluding ‘C’ (energy production and conservation) and ‘Q’ (secondary metabolites)), representing metabolic diversity. (c) Assignment to COG cellular processes and signaling categories, ‘V’ (defense mechanisms), ‘T’ (signal transduction mechanisms) and ‘N’ (cell motility), representing ability to sense and respond to the environment. (d) Rank abundance plots for expressed genes (RPs excluded). (e) Transcriptome evenness (Pielou, 1966) (RPs excluded). (f) Total reads binning to the 16 selected genomes.
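As an illustration of the two summary statistics shown in panels (a) and (e), the following minimal Python sketch computes a %RP value and Pielou's evenness, J' = H'/ln(S), from per-gene transcript counts for a single genome bin. It is not the pipeline used in this study. The function names, the gene identifiers and the treatment of %RP as the percentage of a bin's transcripts assigned to ribosomal protein genes are assumptions made for the example, as is the convention of returning 0.0 when fewer than two genes are expressed.

```python
import math


def pielou_evenness(counts):
    """Pielou's evenness J' = H' / ln(S) for per-gene transcript counts.

    H' is the Shannon index and S is the number of genes with at least
    one transcript. Returning 0.0 when S < 2 is a convention chosen for
    this sketch, because J' is undefined in that case.
    """
    observed = [c for c in counts if c > 0]
    richness = len(observed)
    if richness < 2:
        return 0.0
    total = sum(observed)
    shannon = -sum((c / total) * math.log(c / total) for c in observed)
    return shannon / math.log(richness)


def percent_rp(counts_by_gene, rp_genes):
    """Percentage of a bin's transcripts assigned to ribosomal protein genes.

    Assumed definition of %RP for illustration only.
    """
    total = sum(counts_by_gene.values())
    if total == 0:
        return 0.0
    rp_total = sum(n for gene, n in counts_by_gene.items() if gene in rp_genes)
    return 100.0 * rp_total / total


def evenness_without_rps(counts_by_gene, rp_genes):
    """Evenness computed after excluding ribosomal protein genes."""
    kept = [n for gene, n in counts_by_gene.items() if gene not in rp_genes]
    return pielou_evenness(kept)


# Hypothetical bin dominated by a single transcript (gene names are made up).
example_bin = {"rp_gene_1": 20, "gene_a": 390, "gene_b": 5, "gene_c": 5}
example_rps = {"rp_gene_1"}
rp_share = percent_rp(example_bin, example_rps)
evenness = evenness_without_rps(example_bin, example_rps)
```

A bin in which one gene accounts for most transcripts, as in the hypothetical example, yields an evenness close to 0, whereas a bin with transcripts spread uniformly across genes yields a value of 1.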

Several rapidly growing taxa based on the %RP index had consistently low representation in the transcriptome relative to the other selected genome bins (Figure 4f), indicating that the fastest growing bacterial groups were not necessarily the most abundant (for example, NOR51-B, Citreicella sp. SE45, P. parvula Ellin514) and that the most abundant groups were not the most active (for example, SAR11 clade members). This is consistent with recent data on estuarine bacterioplankton 16S rRNA:DNA ratios (Campbell et al., 2011) and dilution experiments (Ferrera et al., 2011). The assumption that bacteria capable of rapid growth show this ability only intermittently under conditions that are conducive for blooming (Yooseph et al., 2010) may therefore underestimate the trophic and biogeochemical influence of these fast-growing taxa. Instead, numbers may be kept in check by top-down control within the microbial food web (Worden et al., 2006; Ferrera et al., 2011). Selective bacterial mortality mediated through protist grazing or viral lysis (Suttle, 2007) would lessen competitive exclusion between co-occurring heterotrophs.

These examinations of niche-defining gene expression patterns among major bacterioplankton lineages are based on four samples from southeastern US coastal waters, providing a time-averaged view of microbial activities that is not biased by one particular set of environmental conditions. However, seasonal variations in expression may offer a different, more dynamic view of a taxon’s niche. Interestingly, though we detected taxon-specific seasonal shifts in %RP (Figure 1b), the most highly expressed genes were generally consistent across seasons for an individual taxon (Supplementary Table S6), suggesting that many of these genes are unlikely to resolve seasonal niche characteristics. Future work combining the indicator gene analysis with more highly resolved temporal information will provide better information on seasonal dynamics of niche differentiation in this coastal ocean.

While Hutchinson’s paradox of the plankton (1961) was originally based on observations of functional overlap among tens of coexisting phytoplankton species, contemporary 16S rRNA gene surveys and metagenomic analyses suggest bacterioplankton assemblages contain orders of magnitude more coexisting taxa with many overlapping abilities (Moran, 2008). One resolution to the apparent paradox is that there is less functional overlap and more niche diversity in bacterioplankton communities than we have had the technical ability to observe. The increasing availability of reference genomes has enabled bacterial niche characterization based on unique genes, but such analyses define niches based on potential metabolic activities of reference organisms rather than on actual metabolic activities of natural populations. The 11 million transcripts sequenced in this study provide the most highly resolved catalog of realized niche dimensions yet available for hundreds of bacterioplankton taxa.