Introduction

Metaproteomics is a new approach that enables the direct observation of proteins expressed by mixed microbial assemblages. Mass spectrometry coupled with an ever increasing bank of genomic and metagenomic sequences can be used effectively for protein identification, thus providing insight into microbial community functions. To date, proteomic analyses have been used to identify proteorhodopsin (Giovannoni et al., 2005a), an oxidoreductase, a predicted aminopeptidase and a conserved hypothetical protein (Kan et al., 2005) in marine environments, but a comprehensive analysis of a marine metaproteome has yet to be performed. Comprehensive metaproteomic analyses have been applied to other microbial communities including the natural microbial biofilm of an acid mine drainage site (Ram et al., 2005; Lo et al., 2007), soil (Schulze et al., 2005), activated sludge (Ehlers and Cloette, 1999; Wilmes and Bond, 2004) and waste water (Lacerda et al., 2007).

The Sargasso Sea is an oligotrophic subtropical gyre where key nutrients, particularly nitrogen and phosphorus, are often depleted. A shortage of nutrients can limit microbial productivity (Cotner et al., 1997; Rivkin and Anderson, 1997) and affect community composition (Lomas et al., 2004). Levels of dissolved inorganic phosphorus and dissolved nitrate+nitrite in the Sargasso Sea are often below concentrations of 5 and 10 nM, respectively (Wu et al., 2000; Lipschultz, 2001; Steinberg et al., 2001). At the Bermuda Atlantic Time-series Study site, prokaryotes have a major function in nutrient cycling and can reach densities of 5–10 × 10⁵ cells per ml in the photic zone during summer (Carlson et al., 1996). For an organism to survive in such oligotrophic conditions, it has been suggested that efficient competition for nutrients that are at low levels in the environment would require the expression of high affinity, multisubstrate transporters (Hirsch et al., 1979).

The SAR11 clade is a group of alpha-proteobacteria that is abundant throughout the oceans and always present in the Sargasso Sea. SAR11 bacteria comprise several ecotypes that vary temporally and over depth (Morris et al., 2005; Carlson et al., unpublished), and reach their greatest abundance in surface waters during the summer, when 35% of all prokaryotic cells and 18% of the biomass are SAR11 cells (Morris et al., 2002). This clade is highly active, accounting for 50% of amino-acid assimilation and 30% of dimethylsulfoniopropionate assimilation among microbes in one tracer study (Malmstrom et al., 2004), and is thought to have a significant function in nutrient cycling in the oceans.

Marine Prochlorococcus and Synechococcus are the dominant oxygenic phototrophs in subtropical oceans, accounting for up to 15% and 5% of phytoplankton cell counts, respectively (Sieracki et al., 1995; DuRand et al., 2001), and the majority of oceanic primary production. In the Sargasso Sea, Prochlorococcus cell numbers peak during the summer at depths between 80 and 200 m, whereas Synechococcus cells reach their maximum density during spring blooms in surface waters. Synechococcus cells are slightly larger (0.62–1.56 μm³ compared to 0.077–0.22 μm³ for Prochlorococcus (Heldal et al., 2003)), and thus account for a higher proportion of primary producer biomass than cell numbers would suggest.

The goal of this study was to identify proteins expressed by SAR11 in the Sargasso Sea during a season when nutrients were highly depleted. We employed the method of Wilhelm et al. (2007) to build a database of SAR11 environmental protein-coding sequences (eCDSs) from the Sargasso Sea, using the genome of a Sargasso Sea SAR11 isolate, HTCC7211 (Stingl et al., unpublished), as a query. Peptides detected in microbial samples collected from the Sargasso Sea were then mapped to the SAR11 environmental sequences. To test the specificity of our approach, we repeated the analysis using the high-light adapted Prochlorococcus marinus str. MIT 9312 and marine Synechococcus sp. strain CC9605.

Materials and methods

Database construction

The SAR11 eCDS database used in this study was constructed by binning SAR11 genomic DNA fragments from the Sargasso Sea (Venter et al., 2004) with the reciprocal best-hit procedure that is fully described by Wilhelm et al. (2007). eCDSs that were similar to the protein-coding sequences of a query genome were collected using TBLASTN with an expect score cutoff of 1 × 10⁻²⁰ and complexity filtering turned off. Each hit returned from this search was then translated and queried using BLASTP with default parameters against the bacterial portion of the National Center for Biotechnology Information nonredundant proteins (NCBI-nr) database. If the best hit in the second search was the original query gene, that is, if the eCDS had higher sequence similarity to the query gene than any other known gene, it was scored as a best hit. The entire nonredundant collection of such sequences obtained using the genomes of SAR11 strains HTCC7211 and HTCC1062 as queries (Stingl et al., unpublished; Giovannoni et al., 2005b) was included in this study, as well as the sequences from the query genomes. The total number of sequences in this database is 257836.
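The reciprocal best-hit test described above can be sketched in a few lines of Python. This is a minimal illustration and not the pipeline of Wilhelm et al. (2007): it assumes the TBLASTN and BLASTP results have already been parsed into simple tables, and the names `forward_hits`, `reverse_best` and `reciprocal_best_hits`, like the toy identifiers, are hypothetical.

```python
E_VALUE_CUTOFF = 1e-20  # expect-score cutoff stated for the TBLASTN step


def reciprocal_best_hits(forward_hits, reverse_best, cutoff=E_VALUE_CUTOFF):
    """Bin environmental fragments that pass the reciprocal best-hit test.

    forward_hits: {query_gene: [(fragment_id, e_value), ...]} from searching
        the query genome's proteins against the environmental sequences.
    reverse_best: {fragment_id: best_hit_gene} from searching each translated
        fragment back against the reference protein database.
    Returns {query_gene: [fragment_id, ...]}.
    """
    binned = {}
    for gene, hits in forward_hits.items():
        for fragment_id, e_value in hits:
            if e_value > cutoff:
                continue  # too weak to be collected in the first search
            # Keep the fragment only if its own best hit is the query gene.
            if reverse_best.get(fragment_id) == gene:
                binned.setdefault(gene, []).append(fragment_id)
    return binned


# Toy example with made-up identifiers and scores.
forward = {"geneA": [("frag1", 1e-50), ("frag2", 1e-30), ("frag3", 1e-5)]}
reverse = {"frag1": "geneA", "frag2": "other_gene", "frag3": "geneA"}
result = reciprocal_best_hits(forward, reverse)
print(result)
```

In the toy data, frag2 is rejected because its best reverse hit is a different gene, and frag3 is rejected because it does not meet the expect-score cutoff.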

The Prochlorococcus environmental sequence (Prochlorococcus eCDS) and Synechococcus environmental sequence (Synechococcus eCDS) databases were created in an identical manner using Prochlorococcus marinus str. MIT 9312 and Synechococcus sp. strain CC9605 as the query genomes. There are 21477 sequences in the Prochlorococcus eCDS database and 23347 sequences in the Synechococcus eCDS database.

Sample collection

Microbial cells were collected from hydrostation S off the Bermuda coast (32° 10′ N, 64° 30′ W) on 19 and 20 September 2005. Cells were concentrated from 230 l (19 September) or 240 l (20 September) of surface water (5 m) by tangential flow filtration using tandem Millipore Pellicon systems with 30 kDa regenerated cellulose filters at a rate of ∼1 l min⁻¹ per system. A Beckman J2-21 centrifuge with a JA-20 rotor was used at 48 400 g for 1 h at 4°C to pellet the concentrated cells. Each of the resulting pellets was split into two equal samples (four samples total) and stored at −80°C until analysis.

Reagents

All reagents were obtained from Sigma Aldrich (St Louis, MO, USA) unless otherwise specified. Reagents included Nanopure or Milli-Q quality water (∼18 MΩ cm or better), ammonium bicarbonate (NH4HCO3), bicinchoninic acid (Pierce, Rockford, IL, USA), urea, thiourea, tetrafluoroethylene, dithiothreitol, 3-((3-cholamidopropyl)dimethylammonio)-1-propanesulfonate (CHAPS), sequencing grade modified trypsin (Promega, Madison, WI, USA), HPLC grade methanol, trifluoroacetic acid, acetonitrile, ammonium formate, formic acid and ammonium hydroxide (NH4OH).

Sample preparation

Each of the four sample pellets was divided into two equal subsamples; one for a global preparation and one for a soluble/insoluble preparation. For two of the samples (1 and 2), bead beating in NH4HCO3 buffer was used to physically lyse the cells. The remaining two samples (3 and 4) were chemically lysed using tetrafluoroethylene.

Global and soluble/insoluble protein preparation with bead beating

Two sample pellets (1 and 2) were prepared as outlined by Adkins et al. (2006). Briefly, pellets were resuspended in 100 mM NH4HCO3 buffer and lysed by bead beating. Proteins from subsamples designated for global analysis were denatured, reduced with dithiothreitol and digested using trypsin. Subsamples for soluble/insoluble analysis were ultracentrifuged and the supernatant was treated as above, whereas the pellet was resuspended in denaturing buffer containing 1% CHAPS to solubilize hydrophobic proteins before enzymatic digestion.

Global and soluble/insoluble protein preparation with tetrafluoroethylene

Samples 3 and 4 were prepared as described in the previous section, except that lysis was achieved chemically by the addition of tetrafluoroethylene to the resuspended pellet at a concentration of 50% (v:v), followed by sonication for 1 min in an ice bath and incubation at 60°C for 2 h with gentle shaking (300 r.p.m.).

Sample cleanup

Peptides were cleaned using either a 1 ml/100 mg C18 solid phase extraction (SPE) column (samples 1 and 2) or a strong cation exchange (SCX) SPE column (samples 3 and 4) following the manufacturer's (Supelco/Sigma Aldrich, St Louis, MO, USA) instructions. Peptides were eluted with 80:20 acetonitrile:H2O, 0.1% trifluoroacetic acid (C18 columns) or 80:15:5 methanol:H2O:NH4OH (SCX columns), and were concentrated in a speed-vac to a volume of 50–100 μl. Final peptide concentration was determined by bicinchoninic acid protein assay and samples were quick frozen in liquid nitrogen and stored at −80°C until needed for analysis.

SCX sample fractionation

Strong cation exchange fractionation following the methods outlined by Adkins et al. (2002, 2006) was performed only on sample 2 using a Polysulfoethyl 2.1 × 35 mm, 3 μm particle size column (PolyLC, Columbia, MD, USA) on an Agilent 1100 series high performance liquid chromatography system. Mobile phases consisted of 10 mM ammonium formate (pH 3.0)/25% acetonitrile (A) and 500 mM ammonium formate (pH 6.8)/25% acetonitrile (B) at a flow rate of 200 μl min−1. Peptides were separated using a gradient from 0% to 50% B over 40 min, followed by a gradient of 50%–100% B over 10 min. The mobile phase was then held at 100% B for 10 min. A total of 23 fractions were collected and eight were chosen from the SCX chromatogram for analysis. Each fraction was dried under vacuum and redissolved in 30 μl of 25 mM NH4HCO3, and 10 μl of each fraction were analyzed by capillary LC-MS/MS.
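As a worked illustration of the gradient program just described, the sketch below returns the percentage of mobile phase B at any time point and the resulting ammonium formate concentration. It assumes that each gradient segment is linear and that the three segments run back to back over 60 min; neither assumption is stated explicitly in the text, and the function names are hypothetical.

```python
def percent_b(t_min):
    """Percent mobile phase B at time t_min (minutes into the program)."""
    if t_min < 0 or t_min > 60:
        raise ValueError("time outside the 60 min program")
    if t_min <= 40:
        return 50.0 * t_min / 40.0                   # 0% to 50% B over 40 min
    if t_min <= 50:
        return 50.0 + 50.0 * (t_min - 40.0) / 10.0   # 50% to 100% B over 10 min
    return 100.0                                     # hold at 100% B for 10 min


def ammonium_formate_mM(t_min, conc_a=10.0, conc_b=500.0):
    """Ammonium formate concentration (mM) from mixing phases A and B."""
    fraction_b = percent_b(t_min) / 100.0
    return (1.0 - fraction_b) * conc_a + fraction_b * conc_b


for t in (0, 20, 40, 45, 60):
    print(t, percent_b(t), ammonium_formate_mM(t))
```

Under these assumptions the salt concentration rises slowly for the first 40 min (10 to 255 mM) and then steeply (255 to 500 mM in 10 min). The accompanying change in pH between the two mobile phases is not modeled here.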

Sample loading and capillary LC separation

The online capillary liquid chromatography (LC) setup with ion trap mass spectrometry (MS) has been described elsewhere (Sowell et al., 2008). Briefly, peptide samples were loaded onto a 60 cm hand-packed C18 column before electrospray into a Thermo Fisher Scientific LTQ linear ion trap mass spectrometer set to perform tandem MS (MS/MS) on the top 10 ions using data-dependent settings and a dynamic exclusion window of 1 min. The aqueous to organic gradient (mobile phase A: 0.2% acetic acid and 0.05% trifluoroacetic acid in water; mobile phase B: 0.1% trifluoroacetic acid in 90% acetonitrile/10% water) was delivered exponentially over 100 min.

Peptide identifications

A total of 247 LC-MS/MS analyses were performed to produce MS/MS spectra. Each data set was searched, using the SEQUEST algorithm (Eng et al., 1994), against the protein databases mentioned above for peptide/protein identification. A standard parameter file with no modifications to amino-acid residues and a mass error window of 3 m/z units for precursor mass and 0 m/z units for fragmentation mass was used. The searches were allowed for all possible peptide termini, that is, not limited to tryptic only termini. Peptide identifications were considered acceptable if they passed the thresholds determined by Washburn et al. (2001) and the additional filter of having a Peptide Prophet score of at least 0.5 (Keller et al., 2002).

Phylogenetic tree construction

A neighbor-joining phylogenetic tree of nearly full length SAR11 PstS eCDS amino-acid sequences was generated in the sequence environment ARB (Ludwig et al., 2004) using the Dayhoff PAM model (Dayhoff, 1978). The Synechococcus sequence of PstS (gi∣15618) was used as the outgroup. Short sequences were inserted using the ARB parsimony insertion tool (Ludwig et al., 1998).

Results

A total of 236 SAR11 proteins were identified from 2215 unique peptides that collectively matched 13040 spectra (Supplementary Table S1). The criterion for protein identification was that at least one of its peptides be identified in three or more spectra or that each of at least two of its peptides be identified in one or more spectra. These peptide sequences typically matched multiple SAR11 eCDSs and therefore were mapped to 9380 SAR11 eCDSs from the Sargasso Sea.
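The identification criterion stated above can be written as a small predicate. This is a minimal sketch: the function name and the input layout (a mapping from peptide sequence to spectral count for a single protein) are assumptions made for illustration, and the peptide strings are placeholders.

```python
def protein_identified(peptide_spectra):
    """Apply the stated criterion to one protein.

    peptide_spectra: {peptide_sequence: number of matching spectra}.
    A protein passes if at least one peptide has three or more spectra,
    or if at least two peptides each have one or more spectra.
    """
    counts = [n for n in peptide_spectra.values() if n >= 1]
    one_peptide_rule = any(n >= 3 for n in counts)
    two_peptide_rule = len(counts) >= 2
    return one_peptide_rule or two_peptide_rule


print(protein_identified({"PEPTIDEA": 3}))                 # single peptide, 3 spectra
print(protein_identified({"PEPTIDEA": 2}))                 # single peptide, 2 spectra
print(protein_identified({"PEPTIDEA": 1, "PEPTIDEB": 1}))  # two peptides, 1 spectrum each
```

A protein supported by a single peptide therefore needs three spectra, whereas two distinct peptides are sufficient with one spectrum each.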

The most frequently detected proteins included periplasmic transport proteins for substrates such as phosphate, amino acids, phosphonate and spermidine/putrescine (Table 1). The most frequently detected proteins with functions other than nutrient transport included glutamine synthetase, the chaperones DnaK and GroEL and F0F1-ATP synthase subunits (Table 2).

Table 1 SAR11 proteins involved in nutrient uptake detected by capillary LC MS/MS in Sargasso Sea surface water in September 2005
Table 2 Proportion of transporters detected in a variety of Gram-negative bacteria using techniques similar to those employed in this study

Using the Prochlorococcus eCDS database, 1911 unique peptides from 6270 spectra that mapped to 402 proteins and 1681 eCDSs were detected (Supplementary Table S2). For Synechococcus, 2407 unique peptides from 12092 spectra mapping to 404 proteins and 2632 eCDSs were identified (Supplementary Table S3). Frequently detected Prochlorococcus and Synechococcus proteins were similar to those of SAR11, including transport proteins (substrate specificity for urea and phosphate), glutamine synthetase, and chaperones (DnaK, Cpn60 and GroEL). Proteins involved in CO2 fixation and photosynthesis were also prevalent.

To determine the specificity of our results, peptide hits to the SAR11, Prochlorococcus and Synechococcus eCDS databases were compared to each other and to the remainder of the Sargasso Sea database (Sargasso Sea eCDSs minus SAR11, Prochlorococcus and Synechococcus eCDSs) using BLASTP. Surprisingly, only 177 spectra corresponding to 24 peptides from 11 proteins were common between the SAR11 and the cyanobacterial eCDS databases (Figure 1a). As would be expected from their more recent evolutionary divergence, there were more peptides (1062) in common between the Prochlorococcus and Synechococcus eCDS databases. Slightly more than half of the peptides (1226 peptides (55%)) from the SAR11 data set matched eCDSs from the remainder of the Sargasso Sea metagenomic data (Figure 1b). In the cases of Prochlorococcus and Synechococcus, only 41% (784 peptides) and 36% (860 peptides) of the peptides, respectively, matched eCDSs from the remainder of the Sargasso Sea database. Even after the removal of all peptides that were not unique to each data set, 84% of the detected SAR11 proteins (49% of eCDSs), 71% of the detected Prochlorococcus proteins (83% of eCDSs) and 76% of the detected Synechococcus proteins (36% of eCDSs) were hit by at least one unique peptide. The distribution of the peptides between the databases can be found in Figure 1 and Supplementary Tables S1, S2 and S3.
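The percentages of peptides that also matched the remainder of the Sargasso Sea database follow from the peptide counts reported in this section. The short check below recomputes them; the counts are taken from the text, while the variable and function names are illustrative only.

```python
# (peptides matching the remainder database, total unique peptides detected)
shared_with_remainder = {
    "SAR11": (1226, 2215),
    "Prochlorococcus": (784, 1911),
    "Synechococcus": (860, 2407),
}


def percent_shared(matched, total):
    """Percentage of detected peptides also found in the remainder database."""
    return 100.0 * matched / total


percentages = {
    name: round(percent_shared(matched, total))
    for name, (matched, total) in shared_with_remainder.items()
}
print(percentages)
```

Rounded to whole numbers, the recomputed values agree with the 55%, 41% and 36% quoted above.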

Figure 1

A large proportion of the detected peptides were unique to either SAR11, Prochlorococcus or Synechococcus. (a) SAR11, Prochlorococcus and Synechococcus peptides detected in a Sargasso Sea protein sample and their distribution between the SAR11 eCDS, the Prochlorococcus eCDS and the Synechococcus eCDS databases. (b) Total number of peptides detected from each database and the fraction (red) that was also seen in the remainder of the Sargasso Sea eCDS database (Sargasso Sea eCDS database−(SAR11 eCDS database+Prochlorococcus eCDS database+Synechococcus eCDS database)) determined using BLASTP with a requirement of 100% identity over the length of the query peptide. eCDS, environmental protein-coding sequence.
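A requirement of 100% identity over the full length of the query peptide amounts to asking whether the peptide occurs verbatim within a database sequence. The sketch below illustrates that idea with plain substring matching; it is a stand-in for the BLASTP comparison used in the study, not a reproduction of it, and the sequences, identifiers and function name are made up for illustration.

```python
def peptides_found(peptides, database):
    """Return the peptides that occur verbatim in at least one sequence.

    peptides: iterable of peptide strings.
    database: {sequence_id: amino-acid sequence}.
    """
    return {p for p in peptides if any(p in seq for seq in database.values())}


# Made-up sequences for illustration only.
remainder_db = {
    "ecds_1": "MKTAYIAKQRQISFVKSHFSRQ",
    "ecds_2": "MSLNDLVAGGHKPEW",
}
detected = ["AYIAKQR", "LVAGGHK", "AYIAKQK"]

shared = peptides_found(detected, remainder_db)
fraction_shared = len(shared) / len(detected)
print(sorted(shared), fraction_shared)
```

The third toy peptide differs from the database sequence at a single residue and is therefore not counted as shared, which mirrors the strictness of the 100% identity requirement.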

Discussion

The prevalence of periplasmic substrate-binding proteins

Our most striking finding is the very high proportion of spectra matching SAR11 transport proteins (Table 3). In particular, spectra for periplasmic substrate-binding subunits of ABC transport systems were frequently observed (Table 1). Spectra corresponding to peptides that matched SAR11 eCDSs associated with transport functions accounted for 67% of the total spectra for SAR11. Using similar methods with cultured SAR11 cells, 28%–35% of all spectra matched transport proteins (Sowell et al., 2008), whereas when the same approach was applied to other Gram-negative bacteria, only 4%–11% of spectra matched to transport proteins (Adkins et al., 2006; Callister et al., 2006; Ding et al., 2006; Hixson et al., 2006; Elias et al., 2008).

Table 3 Detected Prochlorococcus and Synechococcus proteins involved in photosynthesis, CO2 fixation, and nitrogen utilization

The prevalence of SAR11 periplasmic substrate-binding proteins in the Sargasso Sea metaproteome was not unanticipated. SAR11 cells are among the smallest known free-living bacteria (Rappe et al., 2002) and cryo-electron tomography images have shown that 25%–35% of cell volume is devoted to periplasmic space (Nicastro et al., 2006). As proposed by Hirsch et al. (1979) and later developed by Button (1993), optimal metabolic designs for survival in oligotrophic environments would likely devote a large proportion of resources to the expression of high affinity uptake systems that target multiple substrates. Button and Robertson (2000) also suggested that the surface area-to-volume ratio of a cell is determined by equilibrium between the cytoplasmic volume and the membrane's capacity to import nutrients for metabolism. According to this model, small cells with large membranes and dilute cytoplasms are better able to compete in oligotrophic environments, because their capacity to metabolize available substrates is matched to the membrane's ability to import them.

Two of the most abundant proteins we observed were periplasmic substrate-binding proteins involved in phosphorus acquisition, suggesting that these cells were maximizing phosphorus transport to cope with the low concentrations found in Sargasso Sea surface waters in the late summer (Wu et al., 2000; Steinberg et al., 2001). The periplasmic substrate-binding protein involved in the uptake of phosphate (PstS) had the highest spectral count (Table 2 and Supplementary Table S1), but peptides from the ATPase and permease subunits (PstB and PstA) as well as from the proteins involved in the regulation of the pst operon (PhoU, PhoR and PhoB) were also detected (Figure 2). Expression of PstS and other members of the pst operon is known to be induced in many bacteria when phosphorus is depleted in the medium (Scanlan et al., 1997; Dyhrman and Haley, 2006; Martiny et al., 2006). Concentrations of phosphorus are generally <5 nM during summer stratified conditions in the Sargasso Sea (Wu et al., 2000; Mills et al., 2008), where SAR11 cells have been reported to contribute substantially to inorganic phosphate uptake (Zubkov et al., 2007), indicating that they can effectively compete for inorganic nutrients at extremely low concentrations.

Figure 2

Proteins involved in phosphate and phosphonate uptake were among the most frequently detected in a mass spectrometric (MS/MS) analysis of Sargasso Sea surface water. The figure shows spectral matches to eCDSs in the region of the HTCC7211 genome from gene 1058 to gene 1090, which includes genes for phosphate (pst) and phosphonate (phn) transport and metabolism. The SAR11 eCDSs are arranged vertically by their percent amino-acid identity to the homologous HTCC7211 gene (Wilhelm et al., 2007) and are heat mapped according to their MS/MS spectral count. Periplasmic transporters (PstS and PhnD) were the most highly detected subunits, but the ATPase and permease subunits (PstA, PstB and PstE) and regulatory proteins (PhoU, PhoR and PhoB) were also frequently observed. eCDSs, environmental protein-coding sequences.

Figure 3 shows that multiple variants of the SAR11 PstS protein were detected, with as many as six amino-acid substitutions observed at one sequence position. Marine bacteria often have highly variable gene sequences in nature. Much of this appears to be neutral variation (Wilhelm et al., 2007), but some may be associated with the evolutionary divergence of ecotypes. The SAR11 clade consists of multiple ecotypes that vary in abundance with depth and season (Field et al., 1997; Morris et al., 2002, 2005; Carlson et al., unpublished). The data in Figure 3 show that some subclades of the SAR11 pstS genes found in the Sargasso Sea metagenomic data were highly translated in summer surface samples from the Sargasso Sea, but others were not. Notably, the PstS subclade that includes Sargasso Sea SAR11 strain HTCC7211 was highly detected, but the subclade of PstS proteins that included coastal strain HTCC1062 was infrequently detected.
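Counting how many different residues were detected at each aligned position is one simple way to summarize this kind of variation. The sketch below tallies residues per alignment column; it is an illustration under stated assumptions rather than the procedure used to produce Figure 3. It assumes the detected sequences are already aligned to equal length, that "-" marks undetected or gapped positions, and the toy sequences and function names are hypothetical.

```python
from collections import Counter


def residue_counts_by_position(aligned):
    """Return one Counter of residues per alignment column.

    aligned: list of equal-length strings; "-" marks an undetected position.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    return [
        Counter(seq[i] for seq in aligned if seq[i] != "-")
        for i in range(length)
    ]


def max_variants(aligned):
    """Largest number of distinct residues seen at any single position."""
    return max(len(column) for column in residue_counts_by_position(aligned))


# Toy alignment, not real PstS data.
toy = ["MKLA", "MRLA", "MKL-", "MQLA"]
columns = residue_counts_by_position(toy)
print(columns[1], max_variants(toy))
```

The per-column counts also give the prevalence of each residue at a position, which is the quantity that a heat map of common versus rare residues would display.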

Figure 3

Peptide coverage and phylogenetic position of PstS eCDSs. Amino-acid coverage of the PstS eCDSs reveals that some subclades of SAR11 PstS genes represented in the Sargasso Sea metagenomic data were frequently detected in these summer surface samples, and others were not. The neighbor-joining phylogenetic tree was inferred from nearly full-length SAR11 PstS eCDS amino-acid sequences with ARB, using the Dayhoff PAM model. Short sequences (designated with an asterisk (*)) were inserted using the ARB parsimony insertion tool. For each sequence in the tree, the colored bars show which amino-acid positions were detected. Amino acids are heat mapped according to the prevalence of the given residue at that position in the consensus sequence, with red indicating the most common and green indicating the least common amino acid. Gray indicates undetected residues. In some cases, as many as six amino-acid substitutions were observed. eCDSs, environmental protein-coding sequences.

The periplasmic substrate-binding protein for phosphonate acquisition (PhnD) was also among the most frequently detected SAR11 proteins, and the membrane-spanning ATPase (PhnE) for this transport system was also observed (Table 2 and Supplementary Table S1). Phosphonates have recently been recognized as a potential source of phosphorus in oligotrophic systems (Kolowith et al., 2001). These compounds, which contain a C–P bond, are particularly stable and are thought to mainly originate from eukaryotic lipids. Genes for phosphonate metabolism were not found in coastal SAR11 isolates, but appear in the Sargasso Sea isolate, HTCC7211. Pho proteins that regulate the pst operon also regulate the phn operon in Escherichia coli and have been shown to be induced by phosphorus limitation (Wanner, 1996).

Our findings provide the first direct evidence of SAR11 cells in open ocean gyres devoting significant resources to the acquisition of phosphorus from phosphonates. From Figure 2, it is apparent that while proteins involved in phosphonate transport were frequently detected, the cytoplasmic proteins involved in the degradation of phosphonate were not, even though the genetic potential for expression of these proteins was present in the environment. This observation supports the theory of Button and Robertson and suggests that phosphonate processing enzymes are rare, relative to transporters, because the total flux of phosphonates through the pathway never requires more than one or a few of each phosphonate processing enzyme per cell and expressing fewer cytoplasmic proteins allows the cells to maintain a large surface area-to-volume ratio.

We found evidence that SAR11 cells express transporters involved in the acquisition of organic compounds that contain reduced sulfur, such as methionine and dimethylsulfoniopropionate (DMSP). Tripp et al. (2008) reported that the SAR11 clade lacks the genes for assimilatory sulfate reduction and therefore relies solely on organic molecules for sulfur. While there are no known transporters that specifically import methionine or DMSP, it has been predicted that the high-affinity amino-acid transporter, YhdWXYZ, could be involved in methionine uptake, and the OpuA uptake system, which is annotated as an ABC-type glycine betaine transporter, has also been shown to import DMSP (Holtmann and Bremer, 2004). The observation of multiple peptides from the periplasmic components of each of these transport systems (YhdW and OpuAC), but not sulfate transport proteins, is consistent with the finding that SAR11 relies on reduced sulfur compounds as a source of sulfur.

Transporters for other organic compounds, such as amino acids and sugars, were also frequently detected. DctP is the periplasmic-binding subunit of a tripartite ATP-independent periplasmic (TRAP) dicarboxylate transporter for mannitol and/or chloroaromatic compounds. This high-affinity transporter relies on proton motive force to drive uptake (Forward et al., 1997). The periplasmic substrate-binding protein PotD is involved in spermidine uptake in E. coli, but is also reported to import putrescine with a lower affinity. Both of these compounds are nitrogen-containing polyamines that have been detected in many organisms (Igarashi and Kashiwagi, 1999) including diatoms, where they are a component of the silica wall (Sumper et al., 2005). Diatoms are a likely source of polyamines in the Sargasso Sea.

All of the proteins involved in the uptake of ammonium and its incorporation into the amino acids, glutamine and glutamate, were detected (AmtB, GlnA, GlnT and GltB; Table 2), suggesting that the SAR11 population was metabolically poised to assimilate inorganic nitrogen.

The absence of evidence for iron transport functions was conspicuous, suggesting that under the prevailing conditions, this part of the SAR11 metabolic repertoire may not have been highly expressed. Consistent with our observations, Sedwick et al. (2005) reported higher levels of iron in Sargasso Sea summer surface waters compared to the spring, presumably from eolian input from the Saharan desert. The periplasmic iron-binding transporter was one of only two periplasmic substrate-binding proteins not observed in the metaproteomic data, the other being a protein of uncertain substrate specificity. A number of different factors (e.g., peptide ionization efficiency or sample preparation biases) might explain the absence of mass spectral data for a given protein. However, it is unlikely that these factors would lead to the absence of peptide-spectra for the iron transport proteins, when so many other transport proteins were detected at such high abundance.

The membrane protease HflKC and the chaperone proteins GroEL, GroES and DnaK were among the most prevalent proteins detected in this study; ATP-dependent proteases FtsH, ClpX and HslU were also detected (Supplementary Table S1). The detection of these chaperone and protease proteins suggests that protein refolding and proteolysis may be integral to bacterial survival in ocean surface water, because proteins are continually being damaged as a result of exposure to environmental stresses.

The approach we employed in this study has previously been applied to many different organisms and was chosen here to maximize the number of proteins detected. However, it was not designed to detect integral membrane proteins like proteorhodopsin. In previous work (Giovannoni et al., 2005a), we employed procedures that were optimized for proteorhodopsin detection by (1) focusing on small cells, (2) isolating the membrane fraction, (3) focusing on SDS-polyacrylamide gel electrophoresis gel sections that contained proteins of the desired size and (4) solubilizing the protein with dodecylmaltoside, a detergent that was known to be compatible with proteorhodopsin detection. Proteorhodopsin requires specialized methods for detection, because it contains seven transmembrane helices and is embedded in the plasma membrane, making it difficult to solubilize and digest. In addition, few of the tryptic peptides in proteorhodopsin are within the mass range commonly scanned in proteomics experiments. Like most analytical procedures, proteomics is sensitive to bias, and the absence of detection of proteorhodopsin, which is thought to be abundant in marine environments (Fuhrman et al., 2008), is a good reminder that the lack of detection of a protein is not conclusive proof of the absence of that protein.

Abundant cyanobacterial proteins

To provide a point of reference for comparison, we used the same method to examine the distribution of Prochlorococcus and Synechococcus peptides in the mass spectrometric data. Prochlorococcus is the dominant phototroph in the Sargasso Sea, reaching maximum cell abundances of ∼10⁵ cells per ml in deeper (80–200 m) summer waters (DuRand et al., 2001). Synechococcus generally blooms in spring surface water, but can range in abundance from 10³ to 10⁴ cells per ml during stratified periods and can comprise a significant fraction of the picoplankton in the 1-m surface of the Sargasso Sea (DuRand et al., 2001). Approximately twice as many Synechococcus spectra and peptides were detected compared with Prochlorococcus, even though Prochlorococcus strain MIT9312 (used as the query genome for database construction) is thought to be more abundant than Synechococcus sp. in summer surface water in the Sargasso Sea. We attribute this to the larger size of Synechococcus cells and natural variation in microbial community composition.

As in SAR11, the most frequently detected cyanobacterial proteins were involved in transport functions (Supplementary Tables S2 and S3). However, the overall number and variety of transporters detected were much lower than in SAR11, suggesting that Prochlorococcus and Synechococcus devote a lower percentage of their overall energy expenditure to nutrient acquisition.

One of the most abundant transporters detected in both of the cyanobacteria was involved in urea uptake. Synechococcus proteins for two urease subunits, which are necessary for urea degradation, were also detected (Collier et al., 1999). This suggests that urea, the most common form of dissolved organic nitrogen in the open ocean (Anita et al., 1991), is an important nitrogen source for Synechococcus. Other Synechococcus proteins involved in nitrogen metabolism that were detected include nitrate and nitrite transporters, cyanate hydratase, and the nitrogen regulatory protein P-II (Table 3 and Supplementary Table S3). Prochlorococcus proteins for urea degradation were not detected even though they are present in the genome, but proteins for ammonium and oligopeptide transport were detected (Table 3 and Supplementary Table S2), supporting the previous data suggesting that Synechococcus can grow on a wide range of nitrogen sources, whereas high-light adapted Prochlorococcus strains preferentially use ammonium and organic sources of nitrogen (Moore et al., 2002; Zubkov et al., 2003).

Both Prochlorococcus and Synechococcus also expressed proteins known to be induced during phosphate depletion. These included phosphate and phosphonate uptake proteins and the sulfolipid (UDP-sulfoquinovose) biosynthesis protein (Guler et al., 1996; Van Mooy et al., 2006). In addition, a PhoH-like phosphate starvation-inducible protein encoded by Prochlorococcus, but not Synechococcus, was also detected.

Proteins involved in photosynthesis and carbon fixation were frequently detected (Table 3 and Supplementary Tables S2 and S3). In addition to general photosystem proteins, proteomic evidence of the pigment differences between Prochlorococcus and Synechococcus was observed, with chlorophyll and carotenoid synthesis proteins detected for Prochlorococcus and phycobilisome proteins detected for Synechococcus. Carboxysome proteins, which are induced by inorganic carbon limitation and are thought to be signaled by high light (McKay et al., 1993; Woodger et al., 2003), were also observed, with peptides for the Synechococcus orthologs appearing more frequently. The carboxysome allows cyanobacteria to overcome inorganic carbon limitation by concentrating CO2 for its conversion to organic carbon by the enzyme ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO), which was also detected.

The observation of abundant damage-control systems in the cyanobacteria paralleled similar findings in SAR11. However, in addition to chaperones that refold damaged proteins, proteins focused on the prevention of oxidative damage from photosynthesis by-products were also highly represented by peptide spectra (i.e., relatively abundant). In particular, thioredoxin, thioredoxin peroxidase and superoxide dismutase were among the most frequently detected cyanobacterial proteins.

Proteomic analyses of microbial communities

Post-transcriptional mechanisms, such as transcript degradation or inactivation by riboswitches, can lead to differences between transcript levels and actual metabolic activity. Proteomic analyses examine proteins, the final product of all levels of gene expression, and thus provide information that is often different from the information provided by the measurements of gene expression (Gygi et al., 1999). Although our experiments were not designed to detect post-translational modifications, certain modifications that affect protein function can also be detected using proteomic analysis. Genomic, transcriptomic and proteomic analyses are complementary, each contributing to our understanding of how microbial communities respond to their environment. Transcriptomic and proteomic analyses build on genomics by describing different levels of regulation for microbial responses to environmental perturbations. Recent transcriptomic studies have been effective at illuminating the transcriptional level of gene expression by identifying gene transcripts in marine and freshwater ecosystems (Poretsky et al., 2005; Frias-Lopez et al., 2008).

Conclusion

The metaproteomes of SAR11, Prochlorococcus and Synechococcus provided ample evidence of cellular adaptations to an extreme environment in which cells are subjected to damage by light and oxidative stresses, while competing for essential nutrients that are at extremely low concentrations. Multiple chaperones and proteins involved in mediating oxidative stress were found in conjunction with overwhelmingly abundant proteins for nutrient acquisition in SAR11 and proteins for photosynthesis and carbon fixation in Prochlorococcus and Synechococcus. In the case of SAR11, the very high spectral counts for periplasmic substrate-binding proteins suggest that in conjunction with maintaining a small cell size and large periplasmic space, the cells express abundant transport proteins to maximize nutrient uptake activity. These adaptations are likely the means by which SAR11 organisms have evolved to become dominant prokaryotes in nutrient-depleted environments. Our findings may focus attention on transport systems as the boundary that defines the interaction of cells with their chemical environment.