Most of the genes required for the formation of bacteriochlorophyll-containing photosystems in aerobic, anoxygenic, phototrophic (AAP) bacteria are clustered in a contiguous, 45-kilobase (kb) chromosomal region (superoperon)6. These include bch and crt genes coding for the enzymes of the bacteriochlorophyll and carotenoid biosynthetic pathways, and the puf genes coding for the subunits of the light-harvesting complex (pufB and pufA) and the reaction centre complex (pufL and pufM). To better describe the nature and diversity of planktonic, anoxygenic, photosynthetic bacteria, we screened a bacterial artificial chromosome (BAC) library9, prepared from surface-water marine bacterioplankton, for clones containing pufL and pufM10,11 genes.

Several puf-containing BAC clones were identified in screens using primers10 designed to amplify almost the entire pufL and pufM region (1520–1580 nucleotides). In addition, several coastal α-proteobacterial strains previously isolated off the Oregon coast were also screened (R2A strains)12. The pufM phylogenetic tree (Fig. 1a) encompassed two principal clades, one containing α-3 and α-4 proteobacteria and one containing α-1, α-2, β- and γ-proteobacteria representatives, in agreement with an earlier study10. The BAC clones fell into three groups on the basis of pufM gene phylogeny (Fig. 1a). One group (BAC clones 30G07, 56B12 and 60D04) was most closely related to pufM from α-proteobacteria isolates R2A62 and R2A84 (ref. 12) (Roseobacter-like, Fig. 1b), and was placed in the group containing α-3 proteobacteria. The other two groups (29C02, 39B11 and 24D02, 52B02, 65D09) branched together, and were most similar to the freshwater β-proteobacterium Rhodoferax fermentans (Fig. 1a, b). puf genes were also amplified by polymerase chain reaction (PCR) from DNA extracts of a mixed bacterioplankton assemblage sampled at the location where the BAC library originated (Monterey Bay, California). These Monterey Bay pufM sequences clustered with BAC clones 30G07, 56B12 and 60D04, the α-proteobacteria isolates R2A62 and R2A84 (env0m2, env20m1 and env20m5), and the group composed of Roseobacter isolates (env0m1). The phylogenetic relationships of pufM genes were generally consistent with those determined from ribosomal RNA gene sequences, with the exception of the pufM cluster containing the α-1, α-2, β- and γ-proteobacteria (Fig. 1b).

Figure 1: Phylogenetic relationships of pufM gene (a) and rRNA (b) sequences of AAP bacteria.

a, b, Evolutionary distances for the pufM genes (a) were determined from an alignment of 600 nucleotide positions, and for rRNA genes (b) from an alignment of 860 nucleotide sequence positions. Evolutionary relationships were determined by neighbour-joining analysis (see Methods). The green non-sulphur bacterium Chloroflexus aurantiacus was used as an outgroup. pufM genes that were amplified by PCR in this study are indicated by the env prefix, with ‘m’ indicating Monterey, and HOT indicating Hawaii ocean time series. Cultivated aerobes are marked in light blue, bacteria cultured from sea water are marked with an asterisk, and environmental cDNAs are marked in red. Photosynthetic α-, β- and γ-proteobacterial groups are indicated by the vertical bars to the right of the tree. Bootstrap values greater than 50% are indicated above the branches. The scale bar represents number of substitutions per site.

To identify groups actively expressing photosynthetic genes in natural populations, we used PCR with reverse transcription (RT-PCR) to identify photosynthetic-operon messenger RNAs from the same environment. The pufL/M primer set failed to amplify any complementary DNA. However, a different primer targeting a smaller fragment (156 nucleotides) of the pufM gene revealed five different groups of pufM in a Monterey Bay pufM cDNA gene library (cDNAs in Fig. 1). One cDNA group clustered with the aerobic marine phototroph Roseobacter denitrificans, another with strain R2A84, and one was most similar to the pufM sequence from a γ-proteobacterium, Thiocystis gelatinosa. Two pufM cDNA groups clustered with the different BAC clones, implying that these BAC inserts originate from bacteria that were actively expressing photosynthetic genes.

To better characterize the bacteria from which these photosynthetic operons originated, the BAC inserts (29C02, 41 kb; 60D04, 103 kb; 65D09, 87 kb) were fully sequenced and the photosynthetic operons compared with those of cultured proteobacteria. The operon organization of BAC clone 65D09 (and 29C02; data not shown), which falls inside the α-1/α-2/β/γ pufM cluster, is considerably different from the operon organization of any previously reported photosynthetic organism (Fig. 2). Clones 65D09 and 29C02 are similar in their photosynthetic operon organization (data not shown), and when compared with the α- and β-proteobacterial operon organizations (Rhodobacter sphaeroides13 and Rubrivivax gelatinosus14, respectively), more closely resemble the β-proteobacterial operon (Fig. 2). BAC clone 60D04, which is related to α-3 proteobacteria (on the basis of pufM sequences), also differs significantly in gene arrangement from any other photosynthetic operon organization reported to date. It most closely resembles the photosynthetic operons from Rhodobacter capsulatus8 and R. sphaeroides13,15, in both organization and gene content (it contains the bchIDO and crtD genes, which are missing in R. gelatinosus14 and BAC clones 29C02 and 65D09). The superoperonal gene arrangements, crtEFbchCXYZpuf and bchFNBHLMlhaApuhA13,14,15,16,17, found in R. sphaeroides, R. capsulatus and R. gelatinosus are conserved among the naturally occurring planktonic bacterial genomes (Fig. 2), suggesting that the photosynthetic apparatus in the bacteria from which the BACs originated is functional.

Figure 2: Schematic comparison of photosynthetic operons from R. gelatinosus (β-proteobacteria), R. sphaeroides (α-proteobacteria) and uncultured environmental BACs.

ORF abbreviations use the nomenclature defined in refs 13, 14 and 24. Predicted ORFs are coloured according to biological category: green, bacteriochlorophyll biosynthesis genes; orange, carotenoid biosynthesis genes; red, light-harvesting and reaction centre genes; and blue, cytochrome c2. White boxes indicate non-photosynthetic and hypothetical proteins with no known function. Homologous regions and genes are connected by shaded vertical areas and lines, respectively.

We compared several photosynthetic genes found on the oceanic bacteriochlorophyll superoperons with characterized homologues from cultured photosynthetic bacteria. The relationships among Bchla biosynthetic proteins BchB (a subunit of light-independent protochlorophyllide reductase) and BchH (magnesium chelatase) were determined18 (Fig. 3). Similar to pufM relationships, BchB and BchH proteins from BAC clone 60D04 were most similar to homologues from R. capsulatus and R. sphaeroides (Fig. 3). Bchla biosynthetic proteins from BAC clones 29C02 and 65D09 were most closely related to those of R. gelatinosus (no sequences are yet available from a photosynthetic γ-proteobacterium) and Rhodopseudomonas palustris, again in agreement with the phylogenetic relationships of the pufM sequences (Fig. 1).

Figure 3: Phylogenetic analyses of BchB and BchH proteins.

a, Phylogenetic tree for the BchB protein. b, Phylogenetic tree for the BchH protein. The BchH sequences from Chlorobium vibrioforme25 and BchH2 and BchH3 from C. tepidum18 were omitted from the tree because these genes potentially encode an enzyme for bacteriochlorophyll c biosynthesis and are probably of distinct origin (J. Xiong, personal communication). Bootstrap values (neighbour-joining/parsimony method) greater than 50% are indicated next to the branches. The scale bar represents number of substitutions per site. The position of Acidiphilium rubrum (bold branch) was not well resolved by either method.

Recent analyses suggest possible horizontal transfer of the photosynthetic gene cluster in purple bacteria14, and this possibility complicates definitive identification of the organismal origins of these operons based solely on photosynthetic gene analysis. However, gene assignment of open reading frames (ORFs) revealed that more than 75% of ORFs outside the photosynthetic superoperon on BAC 65D09 are most similar to proteins from γ-proteobacteria, whereas ORFs from BAC clone 60D04 are most similar to those of α-proteobacteria. The phylogenetic assignments on the basis of puf gene similarities and arrangement are therefore consistent with the chromosomal context external to the respective photosynthetic superoperons in the BAC clones analysed.

In this study, no sequences recovered in Monterey Bay waters were similar to those of Erythrobacter species—one of the more commonly cultured Bchla-containing bacteria recovered from the open ocean8. We therefore used the same pufM primers on bacterioplankton DNA extracts from waters of the central North Pacific Ocean (Hawaii ocean time series station19; envHOT clones in Fig. 1). One HOT environmental pufM group (represented by envHOT1 clone) clustered with sequences from α-3 proteobacteria isolates, and another group (envHOT2 and envHOT3) clustered with the BAC sequences related to the freshwater β-proteobacterium, R. fermentans. The results suggest that similar groups involved in oceanic aerobic anoxygenic photosynthesis are found in surface waters from both neritic and oceanic systems. These groups do not appear to be similar (at least with respect to their photosynthetic operon) to cultivated Erythrobacter species.

Recently, it has been suggested that cultivated Erythrobacter species (α-4 subclass of proteobacteria) may represent the predominant AAPs in the upper ocean8,20. Surprisingly, we were not able to retrieve photosynthetic operon genes belonging to this group in any of the samples analysed. Furthermore, very few sequences related to Erythrobacter species have been reported in 16S rDNA clone libraries constructed from marine plankton DNA, also suggesting that this group may not represent the predominant AAPs in the upper ocean. Some of the representative AAPs we found were related to cultured marine bacteria (Roseobacter and Roseobacter-like bacteria within the α-3 subclass of proteobacteria). Members of this group have also been frequently retrieved in 16S rRNA clone libraries and represent a large proportion of bacterioplankton rDNAs in coastal waters21. However, other groups (found in both BAC and cDNA libraries) were only distantly related to known anoxygenic phototrophs, and have never before been observed in marine plankton. This implies that in addition to Erythrobacter and Roseobacter species, other yet-to-be-cultivated bacteria (most likely related to β- or γ-proteobacteria) actively participate in oceanic aerobic, bacteriochlorophyll-based photosynthesis. The discovery of new marine AAP bacteria through culture-independent genomic analyses emphasizes the complementary nature of culture-based and cultivation-independent approaches, which taken together provide a much more comprehensive perspective than either does alone. Our new observations should help direct current efforts aimed at characterizing those microbes responsible for oceanic, bacteriochlorophyll-mediated photosynthesis, a newly recognized7,8 but poorly understood process in marine plankton.


BAC library and environmental cDNA preparation

The surface-water BAC library construction has been described previously9; the library was prepared from sea water from station M2 (located approximately 45 km offshore of Moss Landing, California) pre-filtered through a GF/A glass fibre filter (approximate particle size less than 1.6 µm) to remove cell aggregates and larger eukaryotic phytoplankton cells. Sea water from the HOT station (22.4° N, 158.0° W) was collected on 17 March 1998. For preparation of environmental cDNA, sea water was collected from 0 m and 20 m on 26 April 2000 at station 4B off Moss Landing, California (36.77° N, 122.02° W) aboard the RV Western Flyer, and picoplankton were collected onto Sterivex cartridges (Millipore) and stored as described previously21. After the initial treatment with proteinase K and SDS described by ref. 22, total RNA from 100 µl of the lysate from the Sterivex cartridges was further purified using the RNeasy tissue kit (Qiagen) and the protocol for lysed cells. To further remove DNA from the RNA preparations, the samples were treated with the DNAfree kit (Ambion) following the manufacturer's protocol. cDNAs were synthesized by reverse transcription from 2 µl of the RNA extracts, with random hexamers as primers, using the TaqMan RT kit (Applied Biosystems) according to the manufacturer's protocol. Genomic DNA contamination of cDNA preparations was examined by 5′-nuclease assays comparing gene copies in cDNA preparations and in controls with no reverse transcriptase added. These control assays tested for the presence of both rRNA21 and protein-encoding genes (M. T. Suzuki et al., unpublished results). In both assays we observed no signal in controls without reverse transcriptase.

The annotated BAC insert sequences are available at the Monterey Bay Coastal Ocean Microbial Observatory site at

pufL and pufM amplification

Primers used in this study were pufL, forward (5′-CTKTTCGACTTCTGGGTSGG-3′)10; pufM, reverse (5′-CCATSGTCCAGCGCCAGAA-3′) (a modification of the primer reported by ref. 10); and pufM, forward (5′-TACGGSAACCTGTWCTAC-3′).
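The primers above contain IUPAC ambiguity codes (K = G or T, S = C or G, W = A or T), so each one stands for a small pool of oligonucleotides. The following minimal sketch enumerates the unambiguous sequences encoded by each primer and computes reverse complements. The function and dictionary names (`expand_primer`, `reverse_complement`, `PRIMERS`) are hypothetical helpers introduced here for illustration and are not part of the original protocol.

```python
from itertools import product

# IUPAC nucleotide codes. Only K, S and W occur in the primers listed above;
# the remaining codes are included for completeness.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Complement table that maps each ambiguity code to its complementary code.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def expand_primer(primer):
    """Return every unambiguous sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


def reverse_complement(seq):
    """Reverse-complement a sequence, preserving IUPAC ambiguity codes."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# Primer sequences as given in the text (5' to 3').
PRIMERS = {
    "pufL_forward": "CTKTTCGACTTCTGGGTSGG",
    "pufM_reverse": "CCATSGTCCAGCGCCAGAA",
    "pufM_forward": "TACGGSAACCTGTWCTAC",
}

if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        variants = expand_primer(seq)
        print(name, f"{len(seq)} nt, {len(variants)} variant(s)")
        print("  reverse complement:", reverse_complement(seq))
```

Because the degeneracy of each primer is the product of the number of bases allowed at each position, the pool sizes stay small here (two ambiguous positions at most per primer).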

Phylogenetic analysis

pufM sequences from the current study combined with sequences from public databases were translated and the protein sequences aligned using the pileup program of the Wisconsin package (GCG). DNA sequences and protein alignments were imported into a database using ARB software. We aligned DNA sequences based on the protein alignment. Evolutionary distances were calculated with the dnadist program of the PHYLIP package23, using the Kimura 2-parameter model. Phylogenetic trees were inferred using the neighbour program of the PHYLIP package. To evaluate the reliability of the branching patterns, 100 random bootstrap re-samplings were performed using the program seqboot, with subsequent phylogenetic analyses performed as above. Ribosomal RNA phylogenetic analyses were performed as described above, on alignments encompassing 860 nucleotide sequence positions. For analysis of the shorter pufM cDNA sequences obtained through RT-PCR, a neighbour-joining tree was imported into the ARB database, and the cDNAs were added to the tree using the ARB_PARSIMONY program, without local optimization and using a mask that included only those positions encompassing the cDNA sequences.
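As a rough illustration of two steps named above, the sketch below computes a Kimura 2-parameter distance between two aligned sequences and draws one bootstrap replicate by resampling alignment columns with replacement. It uses the textbook closed-form estimator d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the observed proportions of transitions and transversions. This is an assumption made for illustration: the dnadist and seqboot programs may differ in detail (for example in how the transition/transversion ratio and gaps are handled), and the function names are hypothetical.

```python
import math
import random

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Closed-form Kimura 2-parameter distance between two aligned sequences.

    Columns with a gap or ambiguous base in either sequence are skipped
    (an assumption of this sketch). math.log raises ValueError if the
    sequences are too divergent for the estimator to be defined.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        # Transition: purine<->purine or pyrimidine<->pyrimidine change.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate: columns resampled with replacement.

    `alignment` maps sequence names to equal-length aligned strings.
    The same sampled columns are used for every sequence.
    """
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}


if __name__ == "__main__":
    # Toy alignment, not data from this study.
    toy = {"seqA": "AAAAAAAAAA", "seqB": "GAAAAAAAAC"}
    print("K2P distance:", k2p_distance(toy["seqA"], toy["seqB"]))
    rng = random.Random(0)
    for _ in range(3):
        rep = bootstrap_alignment(toy, rng)
        print("replicate distance:", k2p_distance(rep["seqA"], rep["seqB"]))
```

In a full analysis, a distance matrix would be computed for every replicate and a neighbour-joining tree inferred from each, with bootstrap support taken as the fraction of replicate trees containing a given branch.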