Proteorhodopsins (PRs) (Béjà et al., 2000, 2001) are bacterial retinal-binding membrane pigments that belong to the microbial rhodopsin superfamily (Spudich et al., 2000) and are predicted to have an important role in supplying light energy for microbial metabolism in marine ecosystems (Béjà et al., 2000, 2001; Sabehi et al., 2005; Martinez et al., 2007; Walter et al., 2007). PRs have been observed in different ocean regions (Béjà et al., 2000, 2001; de la Torre et al., 2003; Man et al., 2003; Sabehi et al., 2003, 2004, 2005; Venter et al., 2004; Frigaard et al., 2006) and are found in diverse taxonomic backgrounds, including the ubiquitous marine gammaproteobacterial SAR86 (Béjà et al., 2000; Sabehi et al., 2004, 2005) and alphaproteobacterial SAR11 (Giovannoni et al., 2005a, 2005b; Sabehi et al., 2005) groups, as well as in marine Bacteroidetes (Venter et al., 2004; Gómez-Consarnau et al., 2007), planktonic Archaea (Frigaard et al., 2006) and other microbial taxa (de la Torre et al., 2003; Sabehi et al., 2003, 2005; Venter et al., 2004; McCarren and DeLong, 2007). While previous work did not detect light enhanced growth in PR-containing SAR11 (Giovannoni et al., 2005a), or in the gammaproteobacterial SAR92 (Stingl et al., 2007) isolates grown in seawater, significant enhancement of both growth rate and yield was recently reported in PR-expressing marine Bacteroidetes (Gómez-Consarnau et al., 2007).

PR spectral tuning was first observed among genes amplified directly from the environment by using highly specific polymerase chain reaction (PCR) primers (Béjà et al., 2001) to the gammaproteobacterial SAR86 group. These genes produced pigments of two distinct absorption spectra where ‘green-absorbing’ PR (GPR) and ‘blue-absorbing’ PR (BPR) types shared >78% of their amino-acid residues (200 out of 247 are identical). Moreover, a single amino-acid change at position 105 (leucine for green, glutamine for blue) functions as a spectral tuning switch to account for most of the spectral difference (Man et al., 2003). Here, we sought to determine how common spectral tuning is among PRs from diverse taxonomic backgrounds, and to what extent their distribution correlates to distinct oceanic regimes by comparing PR sequence diversity from different depths and different seasons from two oligotrophic seas, the eastern Mediterranean and Sargasso Seas.

Materials and methods

Sample collection, DNA preparation and storage

Environmental samples were collected from the BATS station (31° 40′N, 64° 10′W) in the Sargasso Sea (mixed BATS114a station (March 1998) and stratified BATS118 station (July 1998)) and from the H01 station (32° 54′N, 34° 55′E) in the Mediterranean Sea (mixed H01 station in January 2006 and stratified H01 station in May 2003) from depths of 0, 40, 80 m and 0, 20, 55 m, respectively (see Supplementary Figure S1 for temperature profiles collected in the different stations). BATS DNA was prepared from 50 to 70 l according to Gordon and Giovannoni (1996), while Mediterranean Sea DNA was prepared from 20 l according to Massana et al. (1997). All samples were collected on 0.2 μm-pore-size filters. Samples were stored in liquid nitrogen or in a −80°C freezer before and after DNA extraction.

PR PCR amplification

PR fragments (330 bp) were amplified by PCR from DNA extracts using a subset of the degenerate primers that we previously used to amplify environmental PRs (Sabehi et al., 2005). The degenerate primers RYIDWLfwd (5′-MGNTAYATHGAYTGGYT-3′) and GWAIYPrev (5′-GGRTADATNGCCCANCC-3′), which target the conserved RYIDWL and GWAIYP regions of the PR proteins, respectively, were used with the high-fidelity proofreading polymerase mix BIO-X-ACT (Bioline, Berlin, Germany). PCRs were performed in a total volume of 25 μl containing 10 ng of template DNA, 200 μM dNTPs, 2.6 mM MgCl2, 0.8 μM primers and 1.2 U of DNA polymerase. The amplification conditions comprised an initial step at 95°C for 4 min, followed by 30 cycles of 94°C for 30 s, 52°C for 30 s and 68°C for 30 s, and a final step of 2 min at 68°C. PCR products were cloned using the QIAGEN-PCR cloning kit (Qiagen, Hilden, Germany).
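As an illustration of what "degenerate" means for the two primers above, the following minimal sketch counts how many distinct oligonucleotides each primer sequence stands for under the standard IUPAC nucleotide ambiguity codes. The function names are hypothetical and the sketch is not part of the original protocol.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def degeneracy(primer):
    """Number of distinct sequences represented by a degenerate primer."""
    total = 1
    for base in primer:
        total *= len(IUPAC[base])
    return total

def expand(primer):
    """List every concrete sequence encoded by a degenerate primer."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in primer))]

# Primer sequences as given in the text (5' to 3').
RYIDWL_FWD = "MGNTAYATHGAYTGGYT"
GWAIYP_REV = "GGRTADATNGCCCANCC"

if __name__ == "__main__":
    for name, seq in (("RYIDWLfwd", RYIDWL_FWD), ("GWAIYPrev", GWAIYP_REV)):
        print(name, len(seq), "nt, degeneracy", degeneracy(seq))
```

Under these codes the forward primer expands to 192 sequences and the reverse primer to 96.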

PR phylogeny

The PR tree was constructed according to Sabehi et al. (2005). PR proteins were identified in GenBank including predicted proteins from the Sargasso Sea assemblies (Venter et al., 2004) using BLAST (Altschul et al., 1990) searches with representatives of previously identified PR-like protein families as query sequences. PR proteins were aligned using CLUSTALx (Thompson et al., 1997), and a neighbor-joining phylogenetic tree was inferred using the neighbor programs of PAUP* (Swofford, 2002).

Estimation of sampling efficiency

The efficiency of sequence recovery from the different samples was estimated in two ways, based on DNA sequences and on protein sequences. In both cases, non-parametric richness estimators were calculated to estimate the total number of operational taxonomic units (OTUs), defined as clusters at different DNA or protein sequence similarity thresholds. These estimates were then compared with the observed number of OTUs to determine the percent coverage. The OTUs were derived from the complete data set, after which the individual samples were broken out, so that OTUs are defined consistently and can be traced across all samples. We report the richness obtained with the abundance-based coverage estimator (ACE), since it consistently gave the highest diversity estimates and can thus be treated as an upper bound on the diversity estimates. For DNA-based estimates, OTUs were assembled at 1%, 5% and 20% DNA sequence divergence using CLUSTERER (Klepac-Ceraj et al., 2006). For different samples, total coalescence of the OTUs occurred at between 40% and 50% DNA sequence divergence. The EstimateS program (Colwell, 2005) was used to determine non-parametric richness estimators of total diversity in each sample. The second method used protein alignments to create distance measures, employing a JTT dynamic matrix model. Such correction does not assign constant weights to amino-acid changes; however, a distance of 0.01–0.02 roughly corresponds to a single amino-acid change, and a distance of 2.81 was sufficient to coalesce all the OTUs. The number of OTUs was estimated for JTT distances of 0.02, 0.4 and 0.6, roughly equivalent to 1%, 20% and 30% amino-acid difference. The distance matrix was created using PROTDIST within the Phylip package (Felsenstein, 1989) and then imported into DOTUR to calculate the non-parametric richness estimates (Schloss and Handelsman, 2005).
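The coverage calculation described above can be sketched as follows. This is a minimal stand-alone implementation of the commonly published ACE formula, with OTUs of 10 or fewer clones treated as rare. It is not the EstimateS or DOTUR code, and the function names and the rare-abundance threshold are assumptions made for illustration.

```python
from collections import Counter

def ace_richness(abundances, rare_threshold=10):
    """ACE richness estimate from a list of per-OTU clone counts."""
    counts = [a for a in abundances if a > 0]
    rare = [a for a in counts if a <= rare_threshold]
    s_abund = len(counts) - len(rare)   # OTUs above the rare threshold
    s_rare = len(rare)                  # OTUs at or below the threshold
    if s_rare == 0:
        return float(s_abund)
    freq = Counter(rare)                # freq[i] = OTUs seen exactly i times
    n_rare = sum(rare)                  # clones belonging to rare OTUs
    f1 = freq[1]                        # number of singleton OTUs
    c_ace = 1.0 - f1 / n_rare           # sample coverage of the rare group
    if c_ace == 0.0:
        raise ValueError("all rare OTUs are singletons; ACE is undefined")
    pair_sum = sum(i * (i - 1) * f for i, f in freq.items())
    # Squared coefficient of variation of rare-OTU abundances, floored at 0.
    gamma_sq = max(
        (s_rare / c_ace) * pair_sum / (n_rare * (n_rare - 1)) - 1.0, 0.0
    )
    return s_abund + s_rare / c_ace + (f1 / c_ace) * gamma_sq

def coverage_percent(abundances, rare_threshold=10):
    """Observed OTUs as a percentage of the ACE estimate."""
    observed = sum(1 for a in abundances if a > 0)
    return 100.0 * observed / ace_richness(abundances, rare_threshold)

if __name__ == "__main__":
    toy_library = [1, 1, 1, 2, 2, 3, 12]   # made-up clone counts per OTU
    print(round(ace_richness(toy_library), 2))
    print(round(coverage_percent(toy_library), 1))
```

The clone counts in the usage example are invented and serve only to show the call pattern. Real input would be the number of clones assigned to each OTU at a given divergence threshold.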

Irradiance measurements in the Sargasso and Mediterranean Seas

Irradiance at 5 and 40 m depth at the Sargasso B114a BATS station was measured at 12 wavelengths (410, 441, 465, 488, 510, 520, 555, 565, 589, 625, 665 and 683 nm). The data were extracted from the Bermuda Bio-Optics Project and were collected from the monthly BBOP casts at station B114a (12 March 1998). Irradiance at 5 and 55 m depth at the Mediterranean Sea station To (32° 09′N 34° 14′E) was measured at six wavelengths (412, 443, 490, 555, 665 and 694 nm). Owing to instrument availability, the Mediterranean spectra (3 October 2006) were measured south of the station where the DNA samples were retrieved.

Accession numbers

The sequences reported in this study are deposited with GenBank under accession numbers DQ203297–DQ203852, DQ339158–DQ339245, DQ339247–DQ339288, DQ339290–DQ339341, DQ339343–DQ339417 and DQ422150–DQ422447.

Results and discussion

Diverse PR sequences were observed in both Mediterranean and Sargasso Sea samples using newly designed degenerate PR primers (Sabehi et al., 2005) (Figure 1 and Supplementary Figure S2). The PR gene fragments fell into several families not restricted to the PR groups previously reported using non-degenerate PCR primers (Béjà et al., 2001; Sabehi et al., 2003), and the overall diversity of the PR sequences was comparable to that previously detected in the Sargasso Sea metagenome (Venter et al., 2004). In addition, no apparent bias of our primers toward low G+C amplicons (Suzuki and Giovannoni, 1996) was observed (Supplementary Figure S2), further supporting the adequacy of these primers for sampling across different subpopulations of the PR family. However, while new groups were retrieved in samples from both Mediterranean and Sargasso Sea environments, some others previously reported from the Sargasso Sea metagenome project (Venter et al., 2004) were not detected. This difference may be at least partly explained by the different collection times of the samples (March and July 1998 vs February 2003 (Venter et al., 2004)), as marine microbial populations are known to change seasonally (see Giovannoni and Rappé (2000) for a review).

Figure 1

Relationship between Mediterranean Sea and Sargasso Sea GPR and BPR proteins retrieved from different depths and seasons. A neighbor-joining phylogenetic tree of Mediterranean PR protein sequences amplified from 0, 20 and 55 m and Sargasso PRs from 0, 40 and 80 m from mixed and stratified samples. Names of the PR sequences were removed for clarity. Environmental sequences are represented by blue or green bands corresponding to their predicted absorption maxima to illustrate their distribution in different depths and seasons. The two subfamilies previously reported to be spectrally tuned (Béjà et al., 2001) are marked in blue and green boxes. The SAR86 and SAR11 PR families, as well as the names of a few PR representatives from the Pacific Ocean (Béjà et al., 2001), the Sargasso Sea shotgun-sequencing project (Venter et al., 2004), Mediterranean Sea bacterial artificial chromosomes (BACs) (Sabehi et al., 2003, 2004, 2005), Pelagibacter ubique (Giovannoni et al., 2005a) and marine Archaea (Frigaard et al., 2006), are indicated to serve as orientation marks. Different groups discussed in the text are marked with gray boxes, while groups in which spectral tuning is suggested are marked with an empty box. Pyrocystis lunula (Okamoto and Hastings, 2003) was used to root the tree and is not shown. A modified B&W version of this figure, which can be read by people with color blindness, is presented in Supplementary Figure S5.

Traditionally, rarefaction analysis has been used to estimate the proportion of the natural population that has been sampled at various sites. This method estimates total diversity more accurately as more samples are taken. However, it requires that the samples be drawn uniformly from the same population. In the current case, there are two concerns. First, the primers do not represent the full degeneracy available within the presumed conserved amino-acid sequences (GWAIYP and RYIDWL). Thus, the real entity being sampled is potentially a subset of the PRs present in the environment, and our analysis assumes that these primers adequately represent the PR diversity at those sites. Second, no repeated samples were taken from the same depth, location and time. This leaves two options for rarefaction analysis: estimate the unsampled population from a single sample at each location, or assume that the samples are from a relatively unstructured population and use them as 12 replicates from a single source. In this study, both approaches were used.

To estimate how well the PR diversity in the clone libraries has been sampled, we clustered the sequences into OTUs and applied the ACE non-parametric richness estimator. Treating all samples as a uniform population gives an estimated coverage of 43%, 59% and 79% for OTUs defined as clusters of 1%, 5% and 20% sequence divergence, respectively (Table 1). Similar results were obtained for protein sequence-based OTUs, where the coverage estimates rise to 93% for distances roughly equivalent to an amino-acid sequence divergence of 30% (Supplementary Table S1). We further calculated OTU diversity estimates for each individual sample to determine whether assessing the sequences as a single population would introduce a systematic bias in the estimates (Table 2). This showed varying degrees of coverage but good agreement with the overall picture of sample coverage of 50% of OTUs with internal divergence of 1% DNA sequence. Thus, the analyses suggest that roughly half of the sequence diversity was sampled when OTUs are stringently defined, but also indicate that nearly complete coverage was achieved at amino-acid differences of 30%.

Table 1 Coverage estimates in combined samples based on comparison of number of OTUs calculated by the ACE non-parametric richness estimator and OTUs detected in samples for varying percent DNA sequence differences
Table 2 Comparison of diversity estimates and observed numbers of OTUs in individual samples by the ACE non-parametric richness estimator

∫-LibShuff (Schloss et al., 2004) was employed to estimate whether the populations from the different samples were overlapping or non-overlapping. This analysis (Supplementary Figure S3) indicates that while most samples are drawn from distinct populations of PRs, samples at different depths during the times of ocean mixing (January/March, respectively) were drawn from statistically indistinguishable populations. Samples from the same locations and times at the stratified time points appear to have significantly more structure, particularly when comparing the extremes of depth. These results indicate that a more detailed analysis of molecular variance will have a large amount of variance with which to work; furthermore, the limited degree of similarity among the samples presents a challenge to that analysis. It is also probable that a large fraction of PR sequences in these environments remains unknown, which calls for further research on this interesting group of proteins. Analysis of the distribution of these OTUs and their richness in different locations awaits more complete sampling.

While some PR families were detected in both the Mediterranean Sea and the Sargasso Sea (see for example groups III and VI in Figure 1), others were unique to one location (groups I, II, VII and VIII in Figure 1). However, geographic isolation does not appear to play a strong role in PR distribution since representatives of most families appear in both locations. This may be the result of high potential for lateral transfer of PR genes (Frigaard et al., 2006; Sharma et al., 2006; Martinez et al., 2007; McCarren and DeLong, 2007); alternatively, it may be possible that PR genes reside in many widely distributed ecological generalists in addition to locally adapted specialists.

Similarly, some PR families were distributed differently with respect to seasons within a given location. However, a much stronger seasonal pattern is evident in the Mediterranean than in the Sargasso Sea consistent with stronger seasonal variation in the former than in the latter. Some families were abundant in winter samples (e.g. group II in Figure 1) or summer samples only (e.g. group IV in Figure 1), while others were detected in both seasons. The composition of the microbial community harboring PR genes may be strongly influenced by water column stratification. During the summer stratification, gradients in light spectrum and intensity, temperature, salinity and nutrient concentration may develop, while conditions during winter are much more uniform due to deep mixing of the water column. This uniformity is reflected in the distribution of the different PR families as the families detected in winter samples are found at all three depths (Figure 1).

Light intensity in the water column declines with increasing depth as light is absorbed or scattered by seawater, dissolved yellow substance and particles, including phytoplankton (Thurman and Trujillo, 1999). Red light, absorbed mainly by water molecules, disappears in the first few meters; green light penetrates further but not to deep water, while blue light is maintained throughout the euphotic zone (Figure 2). Therefore, PR sequences were further sorted based on their ‘green’ and ‘blue’ signatures (i.e. leucine or glutamine at position 105, respectively). Eight environmental sequences (0.7%) were found to contain a methionine in this position, which preliminary data indicate absorbs in the green range (Gómez-Consarnau et al. (2007) and Sabehi and Béjà, data not shown). Methionine has a non-polar side chain, as does leucine, which might explain the similarity in green light absorption. However, sequences with methionine at position 105 were not included in the spectral tuning statistics because no further spectral analyses were performed on cells containing those exact PR sequences, and the inference is thus relatively uncertain.
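The sorting rule described above (leucine for green, glutamine for blue, methionine and anything else left out of the statistics) can be sketched as below. The sketch assumes that the residue aligned to position 105 has already been extracted for each sequence. The function names are hypothetical and the example residues are invented.

```python
from collections import Counter

# Residue at position 105 and the spectral type it predicts (Man et al., 2003).
SIGNATURE = {"L": "GPR", "Q": "BPR"}

def classify_residue_105(residue):
    """Return 'GPR', 'BPR' or 'unassigned' for a one-letter residue code."""
    return SIGNATURE.get(residue.upper(), "unassigned")

def tuning_percentages(residues):
    """Percent GPR and BPR among sequences with an L or Q signature.

    Sequences with any other residue (including methionine) are excluded
    from the denominator, mirroring the exclusion described in the text.
    """
    counts = Counter(classify_residue_105(r) for r in residues)
    assigned = counts["GPR"] + counts["BPR"]
    if assigned == 0:
        return {"GPR": 0.0, "BPR": 0.0, "n": 0}
    return {
        "GPR": 100.0 * counts["GPR"] / assigned,
        "BPR": 100.0 * counts["BPR"] / assigned,
        "n": assigned,
    }

if __name__ == "__main__":
    toy_residues = list("LLQQQQM")   # made-up residues at position 105
    print(tuning_percentages(toy_residues))
```

Keeping methionine out of the denominator is a design choice that follows the text. Counting it as green would need only one more entry in the lookup table.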

Figure 2

Relative irradiance in the Sargasso and Mediterranean Seas. Relative irradiance at 5 and 40 m depth at the Sargasso B114a BATS station (bold lines), plotted relative to irradiance at 441 nm, and at 5 and 55 m depth at the Mediterranean Sea T0 station (thin lines), plotted relative to irradiance at 443 nm.
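The normalization used in this figure, in which each spectrum is divided by its own value at a blue reference wavelength (441 nm for the Sargasso Sea, 443 nm for the Mediterranean Sea), can be sketched as follows. The function name is hypothetical and the irradiance values in the usage example are invented placeholders, not measurements from this study.

```python
def relative_irradiance(spectrum, reference_nm):
    """Scale a {wavelength_nm: irradiance} mapping to its reference value.

    The returned mapping equals 1.0 at the reference wavelength.
    """
    if reference_nm not in spectrum:
        raise KeyError(f"no measurement at {reference_nm} nm")
    reference = spectrum[reference_nm]
    if reference <= 0:
        raise ValueError("reference irradiance must be positive")
    return {wl: value / reference for wl, value in spectrum.items()}

if __name__ == "__main__":
    # Placeholder values in arbitrary units, for illustration only.
    toy_spectrum = {441: 2.0, 520: 1.0, 665: 0.1}
    for wl, rel in sorted(relative_irradiance(toy_spectrum, 441).items()):
        print(wl, "nm:", round(rel, 3))
```

Normalizing each depth to its own reference removes the overall loss of intensity with depth, so curves from different depths can be compared by spectral shape alone.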

PR sequences from the Mediterranean Sea sample collected in May 2003 showed the expected stratification with respect to depth; GPRs were found mainly in surface waters (0 and 20 m, 61% and 44%, respectively), while BPRs were found at all depths (Figure 3). The detection of both BPRs and GPRs in Mediterranean surface waters suggests that both types can confer a growth advantage from the utilization of light in the blue and green range, and that both pigment types may thus stably coexist. A similar explanation was recently suggested by Stomp et al. (2004) when reporting the coexistence of closely related picocyanobacteria of the Synechococcus group. During the winter mixing, the bacterial population would be primarily exposed to blue light conditions as the cells cycle between deeper and surface water. Consistent with this expectation, BPRs dominate and are stably distributed with depth in samples collected from the Mediterranean Sea during January 2006, when the water column mixes deeply.

Figure 3

Depth distribution of GPRs and BPRs in the Mediterranean and Sargasso Seas. Pie charts represent percentages of GPRs (marked in green) and BPRs (marked in blue) at different depths in the stations sampled. Data extracted from the Sargasso Sea shotgun-sequencing project (Venter et al., 2004) are presented on the right. The percentages presented in the different pies, given as (GPRs%, BPRs%, n=number of clones), are (61, 39, n=147), (44, 56, n=169) and (9, 91, n=75) in the 0, 20 and 55 m samples of the stratified Mediterranean station, and (36, 64, n=90), (31, 69, n=103) and (41, 59, n=103) in the 0, 20 and 55 m samples of the mixed Mediterranean station. In the Sargasso Sea station, only BPRs were detected except for the stratified 40 m sample, which contained a single GPR (one out of a total of 414 sequences, about 0.2%). In the data from the Sargasso Sea metagenome, 2% GPRs were detected (n=291). A modified B&W version of this figure, which can be read by people with color blindness, is presented in Supplementary Figure S6.

Phylogenetic analysis of the PR sequences retrieved from the Mediterranean sample (n=396) suggests that spectral tuning has arisen independently multiple times (Figure 1). The previously reported spectral tuning in the SAR86 family from the Pacific Ocean (Béjà et al., 2001) was shown to be driven by Darwinian selection (Bielawski et al., 2004), and we speculate that similar analyses will show that the same selection applies to the other PR families with spectral tuning reported here. Of special interest is the spectral tuning observed within the cosmopolitan SAR11 family (Figure 4). In this abundant family, BPRs and GPRs share as much as 86% of their amino acids (e.g. Pelagibacter GPR vs Sargasso Sea EAI06928 BPR), a value somewhat higher than that observed in the SAR86 group (with a maximum of 78% identity; Béjà et al., 2001). Interestingly, several BPR- and GPR-containing SAR11 members also share a common gene organization in the proximity of the PR gene (a ferredoxin and a thioredoxin reductase; compare, for example, the Pelagibacter ubique and eBAC86H08 gene organizations on the right panel of Figure 4).
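Identity values such as those quoted above are computed from pairwise alignments. The sketch below shows one common convention, in which identical residues are counted over the aligned columns that contain no gap in either sequence. The exact convention behind the published values is not stated in the text, so the choice of denominator here is an assumption, and the sequences in the usage example are invented.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over gap-free columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue                # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared

if __name__ == "__main__":
    # Made-up aligned fragments, for illustration only.
    print(round(percent_identity("MKLLQ-A", "MKLIQGA"), 1))
```

With this convention, the 200 identical residues out of 247 quoted in the introduction for the SAR86 GPR and BPR types would correspond to roughly 81% identity.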

Figure 4

SAR11-like GPRs and BPRs. Neighbor-joining phylogenetic tree of PR proteins from the SAR11 family. Next to each BAC (Sabehi et al., 2005), Sargasso Sea shotgun-sequencing singleton (Venter et al., 2004) or Pelagibacter ubique (Giovannoni et al., 2005a) PR sequence is a schematic representation of the proteins neighboring the PR. PR sequences retrieved via PCR in this study are marked in boldface. BPRs are marked in white lettering on a blue background, while GPRs are marked in black lettering on a green background.

In contrast to the observations from the Mediterranean waters, no ‘green’ signatures were observed in the Sargasso Sea samples despite similar patterns of downwelling irradiance or relative irradiance with depth (Figure 2 and Supplementary Figure S4). All sequences (n=414, collected at the same station in different seasons) contained the ‘blue’ signature only, irrespective of the depth and stratification conditions under which they were collected (mixed BATS 114a station (March 1998) and stratified BATS 118 station (July 1998); see Supplementary Figure S1 for temperature profiles collected in the different stations). We performed a similar analysis on the data from the Sargasso shotgun-sequencing project (Venter et al., 2004; collected in February 2003 from 5 m surface water), which confirmed these observations. Out of 291 PRs containing the ‘105 position’ in the Sargasso Sea shotgun-sequencing data, only six were predicted to absorb in the green range, while the remaining 285 sequences contained a glutamine in the ‘105 position’ (98% blue PRs in the Sargasso shotgun-sequencing project). In the Sargasso Sea (on the selected collection days), the light energy peak at 5 and 40 m is in the blue range, with maximal intensity near 440 nm (blue shifted compared to Mediterranean waters; see Figure 2 and Supplementary Figure S4). At 5 m, relative irradiance at 525 nm is about 82% of that at the blue peak, and it might be that even a reduction of about 20% in the green light range is enough for the BPRs to outcompete the GPRs under these conditions in the Sargasso Sea. It could also be that over time (and mixing), cells will be exposed to deeper waters, and one would assume that evolutionary selection would lead to exclusively blue-tuned PRs as being advantageous under the light available in the Sargasso Sea. However, it is somewhat puzzling that GPRs appear underrepresented at this station when green light is still available. This might suggest that BPR-harboring Sargasso Sea bacteria have additional adaptations, which make them abundant at this station in comparison to the GPR-harboring bacteria found in the Mediterranean Sea or in the Pacific Ocean (Béjà et al., 2001).

Several BPR families were distributed differently in the water column at both locations even when there was little or no GPR–BPR stratification, suggesting depth-specific adaptations in addition to differentiation in absorption spectrum. Some groups occurred primarily in surface samples (e.g. group VI in Figure 1) while others in deep waters (e.g. group I in Figure 1); however, some groups were found throughout the euphotic zone (e.g. group IV in Figure 1). The GPR families were mainly found in the surface with some minor stratification where some were found at both 0 and 20 m (e.g. group VIII in Figure 1), others at 20 m only (e.g. group V in Figure 1), and still others were found along the water column (e.g. group VII in Figure 1). The different distribution among PR families that absorb the same wavelength in the water column may result from competition between the bacteria harboring these PRs. However, the comparison among the different locations suggests that genomic background may be of higher importance in determining the distribution of the different groups.

Overall, the data show that spectral tuning is a common trait shared by different groups of bacteria carrying PR pigments and is widespread in environments where different light quantities and qualities are present. However, our data from the Sargasso Sea, where only BPRs are observed, suggest that either GPRs are competitively disfavored under the conditions of the Sargasso Sea or PR distribution is complicated by genomic background correlated with other, as yet unknown, environmental parameters.

Note added in proof

The distribution of different surface water GPR and BPR variants across the Sorcerer II Global Ocean Sampling expedition transect (Northwest Atlantic through Eastern Tropical Pacific) was recently released. GPR variants were found to be highly abundant in the North Atlantic samples and in non-marine environments, while BPR variants dominated in the remaining mostly open ocean samples (Rusch DB, Halpern AL, Heidelberg KB, Sutton G, Williamson SJ, Yooseph S et al. (2007). The Sorcerer II Global Ocean Sampling expedition: I, The Northwest Atlantic through the Eastern Tropical Pacific. PLoS Biol 5: e77).