The marine unicellular cyanobacterium Prochlorococcus is the smallest-known oxygen-evolving autotroph1. It numerically dominates the phytoplankton in the tropical and subtropical oceans2,3, and is responsible for a significant fraction of global photosynthesis. Here we compare the genomes of two Prochlorococcus strains that span the largest evolutionary distance within the Prochlorococcus lineage4 and that have different minimum, maximum and optimal light intensities for growth5. The high-light-adapted ecotype has the smallest genome (1,657,990 base pairs, 1,716 genes) of any known oxygenic phototroph, whereas the genome of its low-light-adapted counterpart is significantly larger, at 2,410,873 base pairs (2,275 genes). The comparative architectures of these two strains reveal dynamic genomes that are constantly changing in response to myriad selection pressures. Although the two strains have 1,350 genes in common, a significant number are not shared, and these have been differentially retained from the common ancestor, or acquired through duplication or lateral transfer. Some of these genes have obvious roles in determining the relative fitness of the ecotypes in response to key environmental variables, and hence in regulating their distribution and abundance in the oceans.
As an oxyphototroph, Prochlorococcus requires only light, CO2 and inorganic nutrients, so the opportunities for extensive niche differentiation are not immediately obvious, particularly in view of the high mixing potential of the marine environment (Fig. 1a). Yet co-occurring Prochlorococcus cells whose ribosomal DNA sequences differ by less than 3% have different optimal light intensities for growth6, pigment contents7, light-harvesting efficiencies5, sensitivities to trace metals8, nitrogen usage abilities9 and cyanophage specificities10 (Fig. 1b,c). These ‘ecotypes’ (distinct genetic lineages with ecologically relevant physiological differences) would be lumped together as a single species on the basis of their rDNA similarity11, yet they have markedly different distributions within a stratified oceanic water column: high-light-adapted ecotypes are most abundant in surface waters, and their low-light-adapted counterparts dominate deeper waters12 (Fig. 1a). The detailed comparison between the genomes of two Prochlorococcus ecotypes reported here reveals many of the genetic foundations for the observed differences in their physiologies and vertical niche partitioning. Together with the genome of their close relative Synechococcus13, it helps to elucidate the key factors that regulate species diversity, and the resulting biogeochemical cycles, in today's oceans.
The genome of Prochlorococcus MED4, a high-light-adapted strain, is 1,657,990 base pairs (bp). This is the smallest of any oxygenic phototroph, and significantly smaller than that of the low-light-adapted strain MIT9313 (2,410,873 bp; Table 1). The genomes of MED4 and MIT9313 each consist of a single circular chromosome (Supplementary Fig. 1) and encode 1,716 and 2,275 genes, respectively, roughly 65% of which can be assigned a functional category (Supplementary Fig. 2). Both genomes have undergone numerous large- and small-scale rearrangements, but they retain conservation of local gene order (Fig. 2). Break points between the orthologous gene clusters are commonly flanked by transfer RNA genes, suggesting that these serve as loci for rearrangements caused by internal homologous recombination or phage integration events.
The strains have 1,352 genes in common, all but 38 of which are also shared with Synechococcus WH8102 (ref. 13). Many of the 38 ‘Prochlorococcus-specific’ genes encode proteins involved in the atypical light-harvesting complex of Prochlorococcus, which contains divinyl chlorophylls a and b rather than the phycobilisomes that characterize most cyanobacteria. They include genes encoding the chlorophyll a/b-binding proteins (pcb)14, a putative chlorophyll a oxygenase, which could synthesize (divinyl) chlorophyll b from (divinyl) chlorophyll a15, and a lycopene epsilon cyclase involved in the synthesis of alpha carotene16. This remarkably low number of ‘genus-defining’ genes illustrates how differences in a few gene families can translate into significant niche differentiation among closely related microbes.
MED4 has 364 genes without an orthologue in MIT9313, whereas MIT9313 has 923 that are not present in MED4. These strain-specific genes, which are dispersed throughout the chromosome (Fig. 2), clearly hold clues about the relative fitness of the two strains under different environmental conditions. Almost half of the 923 MIT9313-specific genes are in fact present in Synechococcus WH8102, suggesting that they have been lost from MED4 in the course of genome reduction. Lateral transfer events, perhaps mediated by phage10, may also be a source of some of the strain-specific genes (Supplementary Figs 3–6).
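The strain-specific counts follow from the gene totals and the shared-gene count: 1,716 − 1,352 = 364 for MED4 and 2,275 − 1,352 = 923 for MIT9313. A minimal Python sketch of that bookkeeping, assuming a one-to-one orthologue mapping (the class and function names are hypothetical, and paralogues or fused genes would need extra handling):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairwiseGeneContent:
    shared: int      # genes with an orthologue in the other genome
    specific_a: int  # genes of genome A without an orthologue in genome B
    specific_b: int  # genes of genome B without an orthologue in genome A


def partition_counts(total_a: int, total_b: int, shared: int) -> PairwiseGeneContent:
    """Split two gene totals into shared and strain-specific counts.

    Assumes a one-to-one orthologue mapping, so each shared gene is
    counted exactly once in each genome.
    """
    if shared > min(total_a, total_b):
        raise ValueError("shared genes cannot exceed either genome's total")
    return PairwiseGeneContent(shared, total_a - shared, total_b - shared)


def partition_sets(genes_a: set[str], genes_b: set[str],
                   orthologue_pairs: set[tuple[str, str]]) -> PairwiseGeneContent:
    """Same partition, computed from gene identifiers and (a, b) orthologue pairs."""
    paired_a = {a for a, _ in orthologue_pairs}
    paired_b = {b for _, b in orthologue_pairs}
    return PairwiseGeneContent(len(orthologue_pairs),
                               len(genes_a - paired_a), len(genes_b - paired_b))


# Totals reported for MED4 (genome A) and MIT9313 (genome B).
reported = partition_counts(1716, 2275, 1352)
print(reported)  # PairwiseGeneContent(shared=1352, specific_a=364, specific_b=923)

# Toy identifiers (hypothetical) showing the set-based version.
toy = partition_sets({"a1", "a2", "a3"}, {"b1", "b2"}, {("a1", "b1")})
print(toy)  # PairwiseGeneContent(shared=1, specific_a=2, specific_b=1)
```

With real data, the orthologue pairs would come from a reciprocal-best-hit search such as the one described in the Methods.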
Gene loss has played a major role in defining the Prochlorococcus photosynthetic apparatus. MED4 and MIT9313 are missing many of the genes encoding phycobilisome structural proteins and enzymes involved in phycobilin biosynthesis15. Although some of these genes remain, and are functional17, others seem to be evolving rapidly within the Prochlorococcus lineage18. Selective genome reduction can also be seen in the photosynthetic reaction centre of Prochlorococcus. Light acclimation in cyanobacteria often involves differential expression of multiple, but distinct, copies of genes encoding photosystem II D1 and D2 reaction centre proteins (psbA and psbD respectively)19. However, MED4 has a single psbA gene, MIT9313 has two that encode identical photosystem II D1 polypeptides, and both possess only one psbD gene, suggesting a diminished ability to photoacclimate. MED4 has also lost the gene encoding cytochrome c550 (psbV), which has a crucial role in the oxygen-evolving complex in Synechocystis PCC6803 (ref. 20).
There are several differences between the genomes that help account for the different light optima of the two strains. For example, the smaller MED4 genome has more than twice as many genes (22 compared with 9) encoding putative high-light-inducible proteins, which seem to have arisen at least in part through duplication events15. MED4 also possesses a photolyase gene that has been lost in MIT9313, probably because there is little selective pressure to retain ultraviolet damage repair in low light habitats. Regarding differences in light-harvesting efficiencies, it is noteworthy that MED4 contains only a single gene encoding the chlorophyll a/b-binding antenna protein Pcb, whereas MIT9313 possesses two copies. The second type has been found exclusively in low-light-adapted strains21, and may form an antenna capable of binding more chlorophyll pigments.
Both strains have a low proportion of genes involved in regulatory functions. Compared with the freshwater cyanobacterium Thermosynechococcus elongatus (genome size <2.6 megabases)22, MIT9313 has fewer sigma factors, transcriptional regulators and two-component sensor-kinase systems, and MED4 is even more reduced (Supplementary Table 1). The circadian clock genes provide an example of this reduction as both genomes lack several components (pex, kaiA) found in the model Synechococcus PCC7942 (ref. 23). However, genes for the core clock proteins (kaiB, kaiC) remain in both genomes, and Prochlorococcus cell division is tightly synchronized to the diel light/dark cycle24. Thus, loss of some circadian components may imply an alternative signalling pathway for circadian control.
Gene loss may also have a role in the lower G + C content of MED4 (30.8%) compared with that of MIT9313 (50.74%), which is more typical of marine Synechococcus. MED4 lacks genes for several DNA repair pathways, including recombinational repair (recJ, recQ) and damage reversal (mutT). In particular, the loss of the base excision repair gene mutY, whose product removes adenines incorrectly paired with oxidatively damaged guanine residues, may imply an increased rate of G•C to T•A transversions25. The tRNA complement of MED4 is largely identical to that of MIT9313 and is not optimized for a low-G + C genome, suggesting that it is not evolving as fast as codon usage.
Analysis of the nitrogen acquisition capabilities of the two strains points to a sequential decay in the capacity to use nitrate and nitrite during the evolution of the Prochlorococcus lineage (Fig. 3a). In Synechococcus WH8102, representing the presumed ancestral state, many nitrogen acquisition and assimilation genes are grouped together (Fig. 3a). MIT9313 has lost a 25-gene cluster, which includes genes encoding the nitrate/nitrite transporter and nitrate reductase. The nitrite reductase gene has been retained in MIT9313, but it is flanked by a proteobacterial-like nitrite transporter rather than a typical cyanobacterial nitrate/nitrite permease (Supplementary Fig. 4), suggesting acquisition by lateral gene transfer. An additional deletion event occurred in MED4, in which the nitrite reductase gene was also lost (Fig. 3a). As a result of these serial deletion events, MIT9313 cannot use nitrate, and MED4 cannot use nitrate or nitrite9. Thus each Prochlorococcus ecotype uses the N species that is most prevalent at the light levels to which it is best adapted: ammonium in the surface waters and nitrite at depth (Fig. 1a). Synechococcus, which is the only one of the three that has nitrate reductase, is able to bloom when nitrate is upwelled (Fig. 1a), as occurs in the spring in the North Atlantic3 and the northern Red Sea26.
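The serial-deletion argument can be expressed as a presence/absence check. The sketch below is a deliberately simplified model (an assumption, not a pathway database): an N source counts as usable only if every required function is encoded, where the transport requirement can be met by any of the listed transporters. The labels are descriptive functions taken from the text, not gene symbols.

```python
# Simplified requirement rules (an illustrative assumption, not a complete
# pathway model). Each inner set lists interchangeable alternatives; every
# inner set must be satisfied for the N source to count as usable.
REQUIREMENTS: dict[str, list[set[str]]] = {
    "nitrate": [{"nitrate/nitrite permease"}, {"nitrate reductase"},
                {"nitrite reductase"}],
    "nitrite": [{"nitrate/nitrite permease", "nitrite transporter"},
                {"nitrite reductase"}],
}

# Nitrate/nitrite-related functions present in each genome, as described above.
GENE_CONTENT: dict[str, set[str]] = {
    "Synechococcus WH8102": {"nitrate/nitrite permease", "nitrate reductase",
                             "nitrite reductase"},
    # Lost the 25-gene cluster; kept nitrite reductase next to a
    # proteobacterial-like nitrite transporter.
    "Prochlorococcus MIT9313": {"nitrite transporter", "nitrite reductase"},
    # A further deletion removed nitrite reductase as well.
    "Prochlorococcus MED4": set(),
}


def usable_sources(genes: set[str]) -> list[str]:
    """Oxidized N sources whose every requirement group is met by `genes`."""
    return sorted(source for source, groups in REQUIREMENTS.items()
                  if all(group & genes for group in groups))


for strain, genes in GENE_CONTENT.items():
    print(f"{strain}: {usable_sources(genes) or 'neither nitrate nor nitrite'}")
```

Running it reproduces the pattern in the text: WH8102 can use both sources, MIT9313 only nitrite, and MED4 neither.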
The two Prochlorococcus strains are also less versatile in their organic N usage capabilities than Synechococcus WH8102 (ref. 13). MED4 contains the genes necessary for usage of urea, cyanate and oligopeptides, but no monomeric amino acid transporters have been identified. In contrast, MIT9313 contains transporters for urea, amino acids and oligopeptides but lacks the genes necessary for cyanate usage (cyanate transporter and cyanate lyase) (Fig. 3a). As expected, both genomes contain the high-affinity ammonium transporter gene amt1, and both lack the nitrogenase genes essential for nitrogen fixation. Finally, both contain the nitrogen transcriptional regulator encoded by ntcA, and numerous genes in both genomes, including ntcA, amt1, the urea transport genes and the GS/GOGAT genes (glutamine synthetase and glutamate synthase, both involved in ammonia assimilation), have an upstream NtcA-binding-site consensus sequence.
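Scanning upstream regions for a binding-site consensus is a simple pattern search. A minimal Python sketch, assuming the NtcA signature GTA-N8-TAC that is commonly cited for cyanobacteria (the text above does not give the motif, and the gene names and sequences below are hypothetical). Because this signature is its own reverse complement, scanning one strand suffices. Real site predictions usually score mismatches with a position weight matrix rather than requiring exact matches.

```python
import re

# Assumed NtcA-binding signature GTA-N8-TAC. The lookahead lets the regex
# report overlapping candidate sites.
NTCA_SITE = re.compile(r"(?=(GTA[ACGT]{8}TAC))")


def find_ntca_sites(upstream: str) -> list[tuple[int, str]]:
    """Return (0-based offset, matched site) for each exact match in `upstream`."""
    return [(m.start(), m.group(1)) for m in NTCA_SITE.finditer(upstream.upper())]


def genes_with_site(upstream_by_gene: dict[str, str]) -> list[str]:
    """Names of genes whose supplied upstream region has at least one candidate site."""
    return sorted(gene for gene, region in upstream_by_gene.items()
                  if find_ntca_sites(region))


# Hypothetical upstream regions, for illustration only.
regions = {
    "geneA": "ttttGTAaaaaccccTACtttt",
    "geneB": "acgtacgtacgt",
}
print(find_ntca_sites(regions["geneA"]))  # [(4, 'GTAAAAACCCCTAC')]
print(genes_with_site(regions))           # ['geneA']
```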
The genomes also have differences in genes involved in phosphorus usage that have obvious ecological implications. MED4, but not MIT9313, is capable of growth on organic P sources (L. R. Moore and S.W.C., unpublished data), and organic P can be the prevalent form of P in high-light surface waters27. This difference may be due to the acquisition of an alkaline phosphatase-like gene in MED4 (Supplementary Fig. 5). Both genomes contain the high-affinity phosphate transport system encoded by pstS and pstABC28, but MIT9313 contains an additional copy of the phosphate-binding component pstS, perhaps reflecting an increased reliance on orthophosphate in deeper waters. MED4 contains several P-related regulatory genes, including the phoB, phoR two-component system and the transcriptional activator ptrA. In MIT9313, however, phoR is interrupted by two frameshifts and ptrA is even more degenerate, suggesting that this strain has lost the ability to regulate gene expression in response to changing P levels.
Both Prochlorococcus strains have iron-related genes that are missing in Synechococcus WH8102, which may help explain the dominance of Prochlorococcus in the iron-limited equatorial Pacific2. These genes include flavodoxin (isiB), an Fe-free electron transfer protein capable of replacing ferredoxin, and ferritin (located with the ATPase component of an iron ABC transporter), an iron-binding molecule implicated in iron storage. Additional characteristics of the iron acquisition system in these genomes include: an Fe-induced transcriptional regulator (Fur) that represses iron uptake genes; numerous genes with an upstream putative fur box motif that are candidates for a high-affinity iron scavenging system; and the absence of genes involved in Fe–siderophore complexes.
Prochlorococcus does not use typical cyanobacterial genes for inorganic carbon concentration or fixation. Both genomes contain a sodium/bicarbonate symporter but lack homologues of known families of carbonic anhydrases, suggesting that an as yet unidentified gene fulfils this function. One of the two carbonic anhydrases in Synechococcus WH8102 was lost in the deletion event that led to the loss of the nitrate reductase (Fig. 3a); the other is located next to a tRNA gene and seems to have been lost during a genome rearrangement event. Like other Prochlorococcus and marine Synechococcus, MED4 and MIT9313 possess a form IA ribulose-1,5-bisphosphate carboxylase/oxygenase, rather than the typical cyanobacterial form IB. The ribulose-1,5-bisphosphate carboxylase/oxygenase genes are adjacent to genes encoding structural carboxysome shell proteins, and all have phylogenetic affinity to genes in the γ-proteobacterium Acidithiobacillus ferrooxidans15, suggesting lateral transfer of the extended operon.
Prochlorococcus has been identified in deep suboxic zones where it is unlikely to be able to sustain itself by photosynthesis alone29, so we looked for genomic evidence of heterotrophic capability. Indeed, the presence of oligopeptide transporters in both genomes, and the larger proportion of transporters (including some sugar transporters) among the MIT9313 strain-specific genes (Supplementary Fig. 2), suggests the potential for partial heterotrophy. However, neither genome contains known pathways that would allow complete heterotrophy: both are missing genes for steps in the tricarboxylic acid cycle, including 2-oxoglutarate dehydrogenase, succinyl-CoA synthetase and succinyl-CoA-acetoacetate-CoA transferase.
Cell surface chemistry has a major role in phage recognition and grazing by protists and thus is probably under intense selective pressure in nature. The two Prochlorococcus genomes and the Synechococcus WH8102 genome show evidence of extensive lateral gene transfer and deletion events of genes involved in lipopolysaccharide and/or surface polysaccharide biosynthesis, reinforcing the role of predation pressures in the creation and maintenance of microdiversity. For example, MIT9313 has a 41.8-kilobase (kb) cluster of surface polysaccharide genes (Fig. 3b), which has a lower percentage G + C composition (42%) than the genome as a whole, implicating acquisition by lateral gene transfer. MED4 has acquired a 74.5-kb cluster consisting of 67 potential surface polysaccharide genes (Supplementary Fig. 6a) and has lost another cluster of surface polysaccharide biosynthesis genes shared between MIT9313 and Synechococcus WH8102 (Supplementary Fig. 6b).
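The G + C argument for lateral transfer compares the composition of a region with that of the whole genome (here, 42% for the 41.8-kb cluster against 50.74% for MIT9313). A minimal Python sketch of a sliding-window screen; the window, step and threshold are hypothetical parameters, and in practice codon usage and gene content would be weighed alongside composition:

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G + C among unambiguous A/C/G/T bases (0.0 if none)."""
    s = seq.upper()
    acgt = sum(s.count(base) for base in "ACGT")
    return (s.count("G") + s.count("C")) / acgt if acgt else 0.0


def low_gc_windows(seq: str, window: int, step: int,
                   drop: float) -> list[tuple[int, float]]:
    """(start, G + C fraction) of windows at least `drop` below the genome-wide value."""
    genome_gc = gc_fraction(seq)
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        gc = gc_fraction(seq[start:start + window])
        if genome_gc - gc >= drop:
            hits.append((start, gc))
    return hits


# Toy sequence with an A+T-rich insert between two G+C-rich stretches.
toy = "GC" * 50 + "AT" * 25 + "GC" * 50
print(gc_fraction(toy))                  # 0.8
print(low_gc_windows(toy, 50, 50, 0.2))  # [(100, 0.0)]
```

A window well below the genome average only flags a candidate; the same test says nothing about direction or timing of transfer.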
The approach we have taken in describing these genomes highlights the known drivers of niche partitioning of these closely related organisms (Fig. 1). Detailed comparisons with the genomes of additional strains, such as Prochlorococcus SS120 (ref. 30), will enrich this story, and the analysis of whole genomes from in situ populations will be necessary to understand the full expanse of genomic diversity in this group. The genes of unknown function in all of these genomes hold important clues for undiscovered niche dimensions in the marine pelagic zone. As we unveil their function we will undoubtedly learn that the suite of selective pressures that shape these communities is much larger than we have imagined. Finally, it may be useful to view Prochlorococcus and Synechococcus as important ‘minimal life units’, as the information in their roughly 2,000 genes is sufficient to create globally abundant biomass from solar energy and inorganic compounds.
Genome sequencing and assembly
DNA was isolated from the clonal, axenic strain MED4 and the clonal strain MIT9313 essentially as described previously4. The two whole-genome shotgun libraries were obtained by fragmenting genomic DNA using mechanical shearing and cloning 2–3-kb fragments into pUC18. Double-ended plasmid sequencing reactions were carried out using PE BigDye Terminator chemistry (Perkin Elmer) and sequencing ladders were resolved on PE 377 Automated DNA Sequencers (Perkin Elmer). The whole-genome sequence of Prochlorococcus MED4 was obtained from 27,065 end sequences (7.3-fold redundancy), whereas Prochlorococcus MIT9313 was sequenced to 6.2-fold coverage (33,383 end sequences). For Prochlorococcus MIT9313, supplemental sequencing (0.05-fold sequence coverage) of a pFos1 fosmid library was used as a scaffold. Sequence assembly was accomplished using PHRAP (P. Green). All gaps were closed by primer walking on gap-spanning library clones or PCR products. The final assembly of Prochlorococcus MED4 was verified by long-range genomic PCR reactions, whereas the assembly of Prochlorococcus MIT9313 was confirmed by comparison to the fosmid clones, which were fingerprinted with EcoRI. No plasmids were detected in the course of genome sequencing, and insertion sequences, repeated elements, transposons and prophages are notably absent from both genomes. The likely origin of replication in each genome was identified based on G + C skew, and base pair 1 was designated adjacent to the dnaN gene.
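Two calculations behind this paragraph can be sketched in Python. If every end sequence contributed equally, coverage = reads × mean read length / genome size, so the reported figures imply a mean read length of roughly 447 bp for both libraries; the Lander–Waterman (Poisson) model, which assumes uniformly random reads, then estimates the fraction of the genome left uncovered as e^(−coverage). The last function locates the minimum of the cumulative G − C count, a common proxy for the replication origin. The function names are hypothetical, and this is not necessarily the exact procedure used here.

```python
import math


def implied_read_length(genome_bp: int, coverage: float, n_reads: int) -> float:
    """Mean read length implied by coverage = n_reads * read_length / genome_bp."""
    return coverage * genome_bp / n_reads


def expected_uncovered_fraction(coverage: float) -> float:
    """Lander–Waterman (Poisson) estimate of the fraction of bases with no read."""
    return math.exp(-coverage)


def cumulative_gc_skew_minimum(seq: str) -> int:
    """0-based position at which the running sum of (G - C) is lowest.

    Returns 0 if the running sum never drops below zero. The minimum of the
    cumulative skew is a common proxy for the replication origin.
    """
    running, lowest, lowest_pos = 0, 0, 0
    for i, base in enumerate(seq.upper()):
        if base == "G":
            running += 1
        elif base == "C":
            running -= 1
        if running < lowest:
            lowest, lowest_pos = running, i
    return lowest_pos


# Figures reported above: (genome size in bp, fold coverage, end sequences).
for strain, genome_bp, coverage, reads in [("MED4", 1_657_990, 7.3, 27_065),
                                            ("MIT9313", 2_410_873, 6.2, 33_383)]:
    length = implied_read_length(genome_bp, coverage, reads)
    missing = expected_uncovered_fraction(coverage)
    print(f"{strain}: ~{length:.0f} bp per read, ~{missing:.2%} expected uncovered")

print(cumulative_gc_skew_minimum("C" * 10 + "G" * 10))  # 9
```

The Poisson estimate only describes the random-shotgun phase; the remaining gaps were closed by directed finishing, as described above.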
A combination of three gene-modelling programs, Critica, Glimmer and Generation, was used to identify potential open reading frames, which were then checked manually. A revised gene/protein set was searched against the KEGG GENES, Pfam, PROSITE, PRINTS, ProDom, COGs and CyanoBase databases, in addition to BLASTP against the non-redundant peptide sequence database from GenBank. From these results, categorizations were developed using the KEGG and COGs hierarchies, as modified in CyanoBase. Manual annotation of open reading frames was done in conjunction with the Synechococcus team. The three-way genome comparison was used to refine predicted start sites, add additional open reading frames and standardize the annotation across the three genomes.
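For readers unfamiliar with ORF calling, the simplest possible approach scans all six reading frames for ATG-to-stop stretches above a length cutoff. The sketch below is that naive scan only: Critica, Glimmer and Generation use coding statistics and comparative evidence and are not reproduced here, the `min_codons` default is an arbitrary, hypothetical choice, alternative start codons (GTG, TTG) are ignored, and ORFs that run off the end of the sequence are not reported.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def naive_orfs(seq: str, min_codons: int = 100) -> list[tuple[int, int, str]]:
    """ATG-to-stop ORFs on both strands, as (start, end, strand).

    Coordinates are 0-based, half-open and refer to the forward strand;
    lengths include the stop codon. The first in-frame ATG after the
    previous stop (or the sequence start) opens each ORF.
    """
    seq = seq.upper()
    n = len(seq)
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        # Map reverse-strand coordinates back onto the forward strand.
                        span = (start, i + 3) if strand == "+" else (n - i - 3, n - start)
                        orfs.append((*span, strand))
                    start = None
    return sorted(orfs)


print(naive_orfs("ATGAAATAA", min_codons=3))  # [(0, 9, '+')]
print(naive_orfs("TTATTTCAT", min_codons=3))  # [(0, 9, '-')]
```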
The comparative genome architecture of MED4 and MIT9313 was visualized using the Artemis Comparison Tool (http://www.sanger.ac.uk/Software/ACT/). Orthologues were determined by comparing the predicted protein encoded by each gene with the predicted proteins of the other genome using BLASTP. Genes were considered orthologues if each was the best hit of the other and both e-values were less than 10^-10. In addition, bidirectional best hits with e-values less than 10^-6 and small proteins of conserved function were manually examined and added to the orthologue lists.
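The reciprocal-best-hit rule can be sketched as follows. This assumes BLASTP results in BLAST+ tabular format (`-outfmt 6`, default columns), which is an assumption about the input rather than the format used in the original analysis; the best hit is taken as the highest bit score, and the manual review of hits between 10^-10 and 10^-6 is not modelled. Gene identifiers in the usage example are hypothetical.

```python
import csv
from collections.abc import Iterable

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(lines: Iterable[str], max_evalue: float) -> dict[str, str]:
    """Highest-bit-score subject for each query, among hits with evalue < max_evalue."""
    best: dict[str, tuple[str, float]] = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        rec = dict(zip(FIELDS, row))
        if float(rec["evalue"]) >= max_evalue:
            continue
        bits = float(rec["bitscore"])
        query = rec["qseqid"]
        if query not in best or bits > best[query][1]:
            best[query] = (rec["sseqid"], bits)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[str], b_vs_a: Iterable[str],
                         max_evalue: float = 1e-10) -> set[tuple[str, str]]:
    """(a, b) pairs in which each gene is the other's best hit below the cutoff."""
    ab = best_hits(a_vs_b, max_evalue)
    ba = best_hits(b_vs_a, max_evalue)
    return {(a, b) for a, b in ab.items() if ba.get(b) == a}


# Hypothetical hits: MED4 proteins vs MIT9313 proteins, and the reverse search.
med4_vs_mit = [
    "medA\tmitX\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-50\t200",
    "medA\tmitY\t40.0\t100\t60\t0\t1\t100\t1\t100\t1e-12\t60",
    "medB\tmitY\t35.0\t90\t58\t1\t1\t90\t1\t90\t1e-8\t40",
]
mit_vs_med4 = [
    "mitX\tmedA\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-50\t200",
    "mitY\tmedB\t35.0\t90\t58\t1\t1\t90\t1\t90\t1e-8\t40",
]
print(reciprocal_best_hits(med4_vs_mit, mit_vs_med4))  # {('medA', 'mitX')}
```

Relaxing `max_evalue` to 1e-6 would also admit the medB/mitY pair, which corresponds to the class of hits that was reviewed manually.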
Phylogenetic analyses used PAUP*, logdet distances and minimum evolution as the objective function. The degree of support at each node was evaluated using 1,000 bootstrap resamplings. Ribosomal DNA analyses used 1,160 positions. The Gram-positive bacterium Arthrobacter globiformis was used to root the tree.
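The LogDet family of distances and bootstrap resampling of alignment columns can be sketched in plain Python. This uses Lake's paralinear form of the LogDet distance; PAUP*'s LogDet option may differ in normalization and in its treatment of invariant or ambiguous sites, and the minimum-evolution tree search itself is not shown. Each of the 1,000 bootstrap replicates would be turned into a distance matrix and a tree, and the support for a node is the fraction of replicate trees containing it.

```python
import math
import random
from collections.abc import Iterator

BASES = "ACGT"


def determinant(matrix: list[list[float]]) -> float:
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def paralinear_distance(x: str, y: str) -> float:
    """d = -1/4 * [ln det J - 1/2 * (ln det Pi_x + ln det Pi_y)] (Lake's form).

    J is the 4x4 matrix of joint base frequencies over sites where both
    sequences have A/C/G/T; Pi_x and Pi_y are the diagonal matrices of each
    sequence's base frequencies (the row and column sums of J).
    """
    sites = [(p, q) for p, q in zip(x.upper(), y.upper()) if p in BASES and q in BASES]
    if not sites:
        raise ValueError("no comparable sites")
    j = [[0.0] * 4 for _ in range(4)]
    for p, q in sites:
        j[BASES.index(p)][BASES.index(q)] += 1.0 / len(sites)
    pi_x = [sum(row) for row in j]
    pi_y = [sum(j[r][c] for r in range(4)) for c in range(4)]
    det_j = determinant(j)
    if det_j <= 0.0 or min(pi_x) == 0.0 or min(pi_y) == 0.0:
        raise ValueError("LogDet distance is undefined for these sequences")
    log_det_pi = sum(map(math.log, pi_x)) + sum(map(math.log, pi_y))
    return -0.25 * (math.log(det_j) - 0.5 * log_det_pi)


def bootstrap_alignments(alignment: list[str], replicates: int,
                         seed: int = 0) -> Iterator[list[str]]:
    """Yield pseudo-alignments built by resampling columns with replacement."""
    rng = random.Random(seed)
    length = len(alignment[0])
    for _ in range(replicates):
        columns = [rng.randrange(length) for _ in range(length)]
        yield ["".join(seq[c] for c in columns) for seq in alignment]


x = "ACGT" * 25
y = "G" + x[1:5] + "T" + x[6:10] + "A" + x[11:]  # three substitutions
print(abs(paralinear_distance(x, x)) < 1e-9)  # True
print(paralinear_distance(x, y) > 0)          # True
```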
This research was funded by the Biological and Environmental Research Program of the US Department of Energy's Office of Science. The Joint Genome Institute managed the overall sequencing effort. Genome finishing was carried out under the auspices of the US Department of Energy by the University of California, Lawrence Livermore National Laboratory. Computational annotation was carried out at the Oak Ridge National Laboratory, managed by UT-Battelle for the US Department of Energy. Additional support was provided by the DOE, NSF and the Seaver Foundation to S.W.C., the Israel–US Binational Science Foundation to A.F.P. and S.W.C., and FP5-Margenes to W.R.H. and A.F.P. We thank the Synechococcus WH8102 annotators (B. Palenik, B. Brahamsha, J. McCarren, E. Allen, F. Partensky, A. Dufresne and I. Paulsen) for their help with curating the Prochlorococcus genomes and E. V. Armbrust and L. Moore for critical reading of the manuscript.
Nature Reviews Microbiology (2005)