Most currently known species of picoplanktonic marine cyanobacteria belong to two genera, Synechococcus and Prochlorococcus. These organisms must be able to acquire major nutrients and trace metals at the submicromolar concentrations found in the oligotrophic open ocean. Their light-harvesting apparatus is uniquely adapted to the spectral quality of light in the ocean3,4. Of the two major marine unicellular genera, Synechococcus is usually less abundant in very oligotrophic environments but has a broader global distribution1,4. A great deal of genetic diversity exists within the genus Synechococcus, with strains probably adapted to specific ecological niches3,5. Furthermore, one group of strains seems to have adapted to the oligotrophic marine environment by developing a form of swimming motility not seen so far in any other prokaryotic group6.

By comparing the genome of Synechococcus sp. strain WH8102 (hereafter WH8102) with the related genomes of two Prochlorococcus strains2, we defined 1,314 open reading frames (ORFs) common to all three genomes (about half of the genome) and 736 ORFs found only in WH8102 (about a third of the genome). The latter indicate the specific ecological strategies of WH8102 relative to coexisting Prochlorococcus (see Methods, Fig. 1 and Supplementary Table 1). Here we show how the genome reveals some of the interactions of WH8102 with its environment (nutrients, light, toxins) and with other organisms, especially phages.

Figure 1: The genome of WH8102 can be divided into three categories: ORFs found in WH8102 and both related Prochlorococcus genomes (region 1); ORFs found in WH8102 and only one Prochlorococcus genome (region 2); and ORFs not found in the two Prochlorococcus genomes (region 3).

These regions can be subdivided according to whether or not an ORF has a BLAST hit in the freshwater Synechocystis sp. strain PCC6803. The lighter shading of each region represents the fraction not found in PCC6803. Examination of the ORFs in each subcategory provides insights into the evolutionary adaptation of marine Synechococcus.

The WH8102 genome contains 16 probable or possible phage integrases, enzymes that function as site-specific DNA recombinases7 (Supplementary Table 2). Many of these occur adjacent to or near transfer RNAs and in regions with an anomalously low G + C content (Table 1a and Fig. 2). These low-G + C regions also show atypical trinucleotide composition (data not shown). Possible phage integrase regulators (SYNW2105, SYNW1660, SYNW1665) are also found. Thus, Synechococcus has regions that closely resemble pathogenicity islands, regions that are often mobilized between strains of pathogenic bacteria7. Hence, although the genome of WH8102 contains no prophages or plasmids, it seems to have been extensively altered in its evolutionary past through horizontal gene transfer, possibly mediated by phages or plasmids. In contrast, far fewer potential phage integrases are found in Prochlorococcus (four in MIT9313 and one in MED4).
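One way to check the observation that many integrases lie near tRNA genes is a simple interval-proximity test on gene coordinates. The sketch below is illustrative only: the coordinates, the integrase names and the `max_gap` threshold are hypothetical, and it treats the circular chromosome as linear (no wrap-around).

```python
from typing import Dict, List, Tuple

# A gene location as (start, end), 1-based inclusive chromosome coordinates.
Interval = Tuple[int, int]


def gap_bp(a: Interval, b: Interval) -> int:
    """Number of bases separating two intervals; 0 if they overlap or abut."""
    if a[1] < b[0]:
        return b[0] - a[1] - 1
    if b[1] < a[0]:
        return a[0] - b[1] - 1
    return 0


def integrases_near_trnas(
    integrases: Dict[str, Interval], trnas: List[Interval], max_gap: int
) -> List[str]:
    """Names of integrases lying within max_gap bp of any tRNA gene (linear coordinates)."""
    return sorted(
        name
        for name, loc in integrases.items()
        if any(gap_bp(loc, t) <= max_gap for t in trnas)
    )


# Hypothetical example data, not WH8102 coordinates.
example_integrases = {"intA": (100, 1300), "intB": (50_000, 51_200)}
example_trnas = [(1_400, 1_472)]
print(integrases_near_trnas(example_integrases, example_trnas, max_gap=500))
```

A proximity window rather than strict adjacency is used because integrases are often separated from their target tRNA by a few short ORFs.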

Table 1: Regions of atypical G + C content in the WH8102 genome
Figure 2: The chromosome of Synechococcus sp. strain WH8102.

The following description is of the eight circles, with the first circle being the outer circle, and the others progressing inwards. First circle, predicted coding regions on the plus strand coloured by functional category: white, hypothetical; black, unassigned and other; dark red, energy metabolism; green, photosynthesis; blue, DNA replication and repair; cyan, fatty acid metabolism; magenta, biosynthesis of cofactors; yellow, cellular processes; pale green, transport and binding; sky blue, translation; orange, regulatory functions; brown, amino acid biosynthesis; pink, cell envelope; grey, conserved hypothetical; medium red, transcription; light red, purines and pyrimidines; pale pink, central metabolism. Second circle, predicted coding regions on minus strand (same colour scheme as in the first circle). Third and fourth circles, 736 ‘characteristic’ genes not found in two Prochlorococcus strains (region 3 in Fig. 1), plus and minus strands, respectively. Fifth and sixth circles, predicted phage integrases on plus and minus strands, respectively. Seventh circle, G + C content (deviation from average); eighth circle, G + C skew curve in purple and olive. Phage integrases are often associated with low G + C regions. Low G + C regions often contain WH8102 characteristic ORFs. Scale (in bp) is indicated along the outside of the circle.
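The G + C deviation and skew tracks (seventh and eighth circles) are typically computed over sliding windows. The sketch below assumes the common definitions, deviation = window G + C fraction minus the genome-wide fraction and skew = (G − C)/(G + C); the window and step sizes are hypothetical parameters, because the caption does not state the values used.

```python
from typing import List, Tuple


def gc_windows(seq: str, window: int, step: int) -> List[Tuple[int, float, float]]:
    """Return (0-based start, G+C deviation from genome average, G+C skew) per window."""
    seq = seq.upper()
    genome_gc = (seq.count("G") + seq.count("C")) / len(seq)
    results = []
    for start in range(0, len(seq) - window + 1, step):
        win = seq[start:start + window]
        g, c = win.count("G"), win.count("C")
        gc = (g + c) / window
        # Skew is undefined when a window has no G or C; report 0.0 in that case.
        skew = (g - c) / (g + c) if g + c else 0.0
        results.append((start, gc - genome_gc, skew))
    return results


# Toy sequence: a G+C-rich half followed by an A+T-rich half.
print(gc_windows("GGGGCCCC" + "AAAATTTT", window=8, step=8))
```

Windows with strongly negative deviation correspond to the low-G + C regions that, as the caption notes, often carry phage integrases and WH8102 characteristic ORFs.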

Other regions of low G + C content that are not associated with phage integrases also show atypical trinucleotide composition, suggestive of recent horizontal gene transfer (Table 1b). These regions encode genes that broaden the range of nitrogen substrates WH8102 can use, as well as several transporters. Furthermore, several genes in these regions are homologues of ORFs involved in carbohydrate modification of the cell envelope (including glycosyltransferases and homologues of genes involved in sialic acid synthesis). One hypothesis is that these glycosyltransferases are required to construct the motility apparatus of this organism, as at least one of its components is glycosylated8. Another possibility is that WH8102 uses these envelope-modifying genes to change its cell surface characteristics and so evade grazers and other predators such as phages. Cell surface properties are known to affect grazing rates in the marine environment9.

An examination of the genome of WH8102 indicates how unusual Synechococcus swimming motility is. No homologues of the motor or flagellar proteins associated with other forms of prokaryotic motility were found, with the exception of six ORFs associated with type IV-pilus-dependent motility (homologues of pilB, -C, -D, -Q and -T). Orthologues of these are also present in MIT9313 but not in MED4. Nevertheless, these ORFs in WH8102 do not encode the full complement of genes required for pilus assembly and function, and pilin subunit homologues are absent. Pili and surface-associated twitching have not been observed in WH8102.

Recent transposon mutagenesis studies (J.M. and B.B., manuscript in preparation), together with analysis of the motility mutant swmA8, indicate that genes required for motility are found in at least two widely separated regions (Fig. 3). The second region contains swmB (SYNW0953), a very large ORF (encoding 10,791 amino acids) that constitutes more than 1% of the genome and is one of the longest bacterial ORFs reported to date. Notably, it is found in one of the unusually low-G + C regions (Table 1b).
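The claim that swmB occupies more than 1% of the genome follows from simple arithmetic: each amino acid needs one three-base codon, plus one stop codon at the end. The genome length used below, 2,434,428 bp, is an assumption supplied here (this section does not state it); any value near 2.4 Mbp leads to the same conclusion.

```python
# SwmB length is from the text; the genome size is an assumed value (~2.4 Mbp).
SWMB_AA = 10_791
ASSUMED_GENOME_BP = 2_434_428


def coding_length_bp(aa_length: int) -> int:
    """Length of an ORF in bp: three bases per amino acid codon plus the stop codon."""
    return (aa_length + 1) * 3


swmb_bp = coding_length_bp(SWMB_AA)
swmb_fraction = swmb_bp / ASSUMED_GENOME_BP
print(f"swmB: {swmb_bp:,} bp, {swmb_fraction:.2%} of the assumed genome size")
```

Even with a genome size of 3 Mbp, the fraction would still exceed 1%.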

Figure 3: Organization of two chromosomal regions in WH8102 that contain motility genes.

a, The region containing swmA and two other ORFs (a predicted glycosyltransferase and a sulphotransferase) appears to have been inserted into the ribH/tRNA-Gly region or, conversely, deleted from the two Prochlorococcus genomes. b, swmB and several other ORFs have been inserted between the rsuA and malQ regions. The double lines in swmB indicate that it is not drawn to scale, as it is approximately 20 times longer than malQ. The ORFs are colour-coded by predicted function as described in Fig. 2.

Transporters account for about 5–6% of the predicted ORFs in WH8102, similar to most other bacterial genomes10. Compared with other genomes10, however, transporter capability is heavily biased towards ABC transporters, with about 60% of the transporter ORFs encoding ABC transporter components. There is a distinct bias against P-type ATPase transporters: WH8102 has only one, for copper, compared with nine in PCC6803, a model unicellular freshwater cyanobacterium. This single P-type ATPase may have been retained because of the use of copper in plastocyanin, an electron transfer protein that can substitute for an iron-containing cytochrome in photosynthesis.

Notably, WH8102 has multiple channels for transporting major seawater ions and multiple transporters or ABC-type solute-binding proteins for several major nutrients; for example, there are multiple solute-binding proteins for phosphate and two for urea. WH8102 also seems to have an independent transporter for urea (SYNW2455), reinforcing the importance of urea as a nitrogen source for cyanobacterial growth in oligotrophic environments. The multiple transporters in WH8102 may have different affinities and be regulated differently depending on nutrient concentrations.

One of the surprises from our analyses of the genome of WH8102 is the prediction that Synechococcus can use some previously unrecognized organic compounds as nitrogen and phosphorus sources. As inorganic nitrogen and phosphorus are often thought to be limiting in the marine environment, these potential sources are of particular interest. Amino acid and oligopeptide transporters are found, suggesting that Synechococcus may be able to use these compounds, which are ubiquitous in sea water; transport of a few amino acids has also been demonstrated11. In addition, genes for the transport of cyanate and its breakdown by cyanase appear to be present in WH8102 and in Prochlorococcus MED4; in fact, WH8102 grows on cyanate as its sole nitrogen source (B.P., unpublished data). The genes for cyanate use have been characterized in the freshwater cyanobacterium Synechococcus elongatus PCC7942, but cyanate remains uncharacterized as a nitrogen source in aquatic environments12.

Similarly, genes for the transport of phosphonates (compounds with C–P bonds) are present in WH8102 and in the two sequenced Prochlorococcus genomes. WH8102 grows on phosphonate as its sole phosphorus source (B.P., unpublished data). Phosphonates are known to be produced by some major eukaryotic phytoplankton groups such as the coccolithophorids, and were recently reported to be an important fraction of total phosphate in sea water13. WH8102 also has multiple alkaline phosphatases (SYNW0120, SYNW0196, SYNW2391 and SYNW2390) that could be used to obtain phosphate from other organic phosphorus sources in its environment. Thus, genome analyses further dispel the classical view of cyanobacteria as plant-like organisms dependent solely on inorganic forms of nutrients14.

In addition, a number of conserved systems for exporting compounds (for example, multidrug efflux systems) are found in both the ABC and the MFS transporter families. WH8102 has more efflux transporters in the ABC family than Prochlorococcus. These results suggest that marine cyanobacteria, despite living in extremely oligotrophic conditions, may still need to export ‘toxins’ produced by other microorganisms. Antagonistic interactions between pelagic bacteria have been reported recently15. Exposure to toxins may be greater for motile Synechococcus than for other marine cyanobacteria, as these cells may swim chemotactically towards the nitrogenous compounds released around marine particles, where heterotrophic bacteria are locally concentrated11.

WH8102 also has efflux pumps for metals (SYNW1472 and SYNW0900) that are lacking in both Prochlorococcus strains. Characterizing these further may provide a mechanistic basis for previous observations16 that Synechococcus seems to be more resistant to copper than Prochlorococcus, and that this resistance may help explain the seasonal cycles of these organisms in the Sargasso Sea. WH8102 also has predicted genes for the reduction of arsenate to arsenite (SYNW1767) and for arsenite efflux (SYNW1039). It has been hypothesized that arsenate competes with phosphate, so that systems for dealing with arsenate would be needed, especially in low-phosphate waters17.

WH8102 has more capacity for sodium-driven transport than freshwater cyanobacteria such as PCC6803 (see Methods and Fig. 1), with transporters of the alanine/glycine:cation (sodium) symporter family (SYNW0828) and of the neurotransmitter:sodium symporter family (SYNW0699). It also has two transporters from the solute:sodium symporter family (SYNW2455, SYNW0619) compared with one in PCC6803.

In contrast to freshwater cyanobacteria, WH8102 has two potential transporters for glycine betaine and related compounds found in marine waters: one encoded by SYNW1915, SYNW1916 and SYNW1917, the other by SYNW0229. Adjacent to the ABC transporter genes, but on the opposite strand, are genes for enzymes predicted to synthesize glycine betaine from glycine (SYNW1914, SYNW1913) by a pathway previously reported only from an extremely halophilic proteobacterium18. When a freshwater Synechococcus, strain PCC7942, was genetically engineered to make glycine betaine, it became more halotolerant19.

Despite the importance of iron as a limiting nutrient in the oceans, WH8102 has no detectable system for siderophore synthesis and uptake. However, it does have iron-conserving strategies, such as using plastocyanin (copper) for electron transport and a cobalt-dependent ribonucleotide reductase (SYNW1692) rather than the iron-containing enzyme found in many other cyanobacteria. Another example of iron conservation in WH8102 is its predicted nickel superoxide dismutase (SOD). Several types of SOD remove photosynthetically produced superoxide radicals, including ones that use iron, manganese or copper-zinc as metal cofactors. Unlike the freshwater PCC6803, the marine cyanobacteria WH8102, both sequenced Prochlorococcus strains and Trichodesmium (a marine N2-fixing cyanobacterium) are predicted to use a new type of nickel SOD, recently found in Streptomyces, as their only SOD, thus saving iron and manganese for other uses (see Supplementary Fig. 1).

In comparison with the sequenced Prochlorococcus strains, WH8102 is a transport generalist. It has predicted transporters for the efflux of chromate (SYNW1323) and arsenite (SYNW1039) that are found in MED4 but not MIT9313, and it shares with MIT9313, but not MED4, the sodium symporters mentioned above. In addition, WH8102 has transporters, not found in Prochlorococcus, that are predicted to be involved in the uptake of nitrate, a quaternary ammonium (R–N+(CH3)3) compound such as sarcosine, another nitrate-like compound and metals (magnesium/cobalt/nickel), and in cation efflux. This may be a characteristic of marine Synechococcus in general or only of motile Synechococcus.

In WH8102 the major components of photosynthesis and respiration are well conserved and are usually most closely related to those of other cyanobacteria. Notable exceptions, in both WH8102 and Prochlorococcus, are genes implicated in carboxysome structure and assembly, including those encoding the subunits of ribulose-1,5-bisphosphate carboxylase/oxygenase, which are more closely related to their counterparts in some proteobacteria. This is thought to be the result of a horizontal gene transfer event20.

As in most cyanobacteria (other than marine Prochlorococcus and two other known prochlorophytes), Synechococcus harvests light using phycobilisomes, multisubunit complexes that bind different types of phycobilins. Two interesting observations differentiate the phycobilisomes of WH8102 from those of other cyanobacteria analysed so far. First, nowhere in the WH8102 genome are there homologues of the cpcC and cpcD genes, which in freshwater cyanobacteria encode two types of phycocyanin-associated LR linker polypeptides. These linkers are necessary for the correct assembly of phycocyanin discs in the phycobilisome rods21. Their absence in WH8102 suggests that there is a single disc of phycocyanin, as is the case in mutants of Synechococcus PCC7002 in which the cpcC gene has been inactivated21. The genome thus provides a basis for interpreting absorbance spectra, in which the low ratio of phycocyanin (orange-light absorbing) to phycoerythrin (blue-light absorbing) probably represents an adaptation to the oligotrophic marine environment, where blue light is particularly important3. Second, the genome of WH8102 lacks homologues of nblA and nblB, two genes implicated in the degradation of phycobilisomes during nutrient stress in cyanobacteria22,23. Thus, phycobilisome degradation may not occur in WH8102, or may be controlled by other genes.

Although Synechococcus possesses homologues of the low-affinity bicarbonate transport mechanism of PCC6803, it lacks homologues of ndhD3, ndhF3 and chpY, genes implicated in high-affinity transport in that organism20. Their absence might be an adaptation to the marine habitat, where bicarbonate (about 2 mM) is probably rarely limiting. Notably, WH8102 has two predicted carbonic anhydrases (SYNW0897 and SYNW2467) whereas Prochlorococcus has none, although these genes can be highly divergent and difficult to predict. SYNW2467 is adjacent to the genes encoding nitrate reductase. Unlike the two Prochlorococcus strains2, Synechococcus WH8102 can use nitrate for growth, and this carbonic anhydrase may have been lost from Prochlorococcus together with nitrate utilization. Although intriguing, a specific connection between nitrate use and carbonic anhydrase has not been demonstrated.
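Why bicarbonate is unlikely to limit growth can be illustrated with the standard carbonate-equilibrium expression for the fraction of dissolved inorganic carbon (DIC) present as HCO3-. The pH and dissociation constants below (pH 8.1, pK1* ≈ 5.86, pK2* ≈ 8.92, roughly appropriate for surface sea water at 25 °C and salinity 35) are approximate values assumed for illustration, not data from this study.

```python
def bicarbonate_fraction(ph: float, pk1: float, pk2: float) -> float:
    """Fraction of DIC present as HCO3-, from the two carbonic acid dissociation constants.

    DIC = [CO2*] + [HCO3-] + [CO3 2-], with [CO2*]/[HCO3-] = [H+]/K1
    and [CO3 2-]/[HCO3-] = K2/[H+].
    """
    h = 10.0 ** -ph
    k1 = 10.0 ** -pk1
    k2 = 10.0 ** -pk2
    return 1.0 / (1.0 + h / k1 + k2 / h)


# Assumed, approximate surface-seawater values (not from the study).
frac = bicarbonate_fraction(ph=8.1, pk1=5.86, pk2=8.92)
print(f"HCO3- is about {frac:.0%} of DIC")
```

At seawater pH, bicarbonate is by far the dominant DIC species, so a millimolar DIC pool translates directly into a millimolar bicarbonate pool.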

One way that bacteria sense and respond to their environment is by using two-component regulatory systems consisting of a sensor kinase and a response regulator. In PCC6803 there are nearly 40 sensor kinase and response regulator pairs. In contrast, WH8102 has only five sensor histidine kinases and nine response regulators, one of which, SYNW1598, may be a pseudogene as it lacks conserved functional residues24. Even allowing for their smaller genome sizes, WH8102 and the two Prochlorococcus strains have fewer systems for responding to environmental changes using these gene families. Furthermore, as there are fewer sensors than response regulators, regulation seems to be economized, with some sensors possibly transmitting information to more than one response regulator.
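The comparison "even allowing for smaller genome size" can be made concrete by expressing regulator counts per megabase. The genome sizes below (about 3.57 Mbp for the PCC6803 chromosome and 2.43 Mbp for WH8102) are approximate values assumed for illustration, and the roughly 40 PCC6803 pairs are treated as 40 sensor kinases.

```python
# Counts from the text; genome sizes (Mbp) are approximate assumed values.
sensor_kinases = {"PCC6803": 40, "WH8102": 5}
genome_mbp = {"PCC6803": 3.57, "WH8102": 2.43}

density = {name: sensor_kinases[name] / genome_mbp[name] for name in sensor_kinases}

# Number of sensor kinases WH8102 would have if it matched PCC6803's density.
expected_wh8102 = density["PCC6803"] * genome_mbp["WH8102"]

for name, value in density.items():
    print(f"{name}: {value:.1f} sensor kinases per Mbp")
print(f"Size-scaled expectation for WH8102: {expected_wh8102:.0f} "
      f"(observed: {sensor_kinases['WH8102']})")
```

Scaling by genome size alone would predict several times more sensor kinases than WH8102 actually has, which is the point of the comparison.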

In addition to the principal RNA polymerase sigma factor sigA (SYNW1783), WH8102 encodes five type II sigma factors, typical of cyanobacteria in general25. However, WH8102 has only one homologue of the type III sigma factors (SYNW1232), a low number compared with the three to five seen in other sequenced cyanobacteria (PCC6803, PCC7120 and Thermosynechococcus). One hypothesis for the minimal regulatory machinery (two-component systems and sigma factors) of Synechococcus and Prochlorococcus is that they evolved in a relatively constant open-ocean environment and therefore do not need regulatory systems for adjusting gene expression to a more variable one. Alternatively, a minimal regulatory system could reflect an ecological strategy of only some marine cyanobacteria.

On the basis of its genome, Synechococcus WH8102 is clearly more nutritionally versatile, a ‘generalist’, compared with its Prochlorococcus relatives. As the genus Prochlorococcus seems to have evolved only once, it may have passed through an evolutionary ‘bottleneck’ in which its capabilities were initially limited to those of a particular strain, followed by the acquisition of new abilities. Alternatively, Synechococcus may be more subject than Prochlorococcus to horizontal gene transfer from phages, as suggested by its larger number of phage integrases. It is also possible that greater transport versatility is a feature only of motile Synechococcus strains rather than of all Synechococcus. Partial or complete genomes of additional marine cyanobacteria from this group will help answer these questions.


Genome sequencing

Genomic DNA was isolated from WH8102 as reported previously26. Whole-genome shotgun libraries were constructed by mechanically shearing genomic DNA and cloning 2–3-kilobase fragments into pUC18. Double-ended plasmid sequencing reactions were carried out using PE BigDye Terminator chemistry (Perkin Elmer), and sequencing ladders were resolved on PE 377 Automated DNA Sequencers (Perkin Elmer). As the first genome drafted during the start-up of the microbial sequencing effort at the J.G.I. Production Sequencing Facility in Walnut Creek, California, this genome was sequenced to unusually high coverage. The whole-genome sequence of WH8102 was obtained from 66,550 reads with an average read length of more than 575 base pairs (bp), giving 16-fold redundancy. Sequence assembly was accomplished using PHRAP (P. Green). All gaps were closed by primer walking on gap-spanning library clones or PCR products. The overall genome structure was verified by long-range genomic PCR reactions. The region containing two tandem repeats was resolved by combining information from individual clones, single-nucleotide polymorphism analysis and PCR. Only after this region had been finished was it discovered that it contained a single, intact long ORF.
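The stated 16-fold redundancy can be checked from the read count and mean read length using the usual coverage formula c = NL/G; the Lander–Waterman model then gives e^(−c) as the expected fraction of bases left uncovered if reads fall uniformly at random. The genome length of 2,434,428 bp is an assumed value not given in this section, and because the mean read length was stated only as more than 575 bp, the result is a lower bound.

```python
import math


def mean_coverage(n_reads: int, mean_read_len: float, genome_len: float) -> float:
    """Mean sequencing depth c = N * L / G."""
    return n_reads * mean_read_len / genome_len


def lander_waterman_uncovered(c: float) -> float:
    """Expected fraction of bases covered by no read under uniform random placement."""
    return math.exp(-c)


ASSUMED_GENOME_BP = 2_434_428  # assumption: approximate WH8102 chromosome size
c = mean_coverage(66_550, 575, ASSUMED_GENOME_BP)  # read count and length from the text
print(f"coverage ~ {c:.1f}x; expected uncovered fraction ~ {lander_waterman_uncovered(c):.1e}")
```

In practice, cloning bias and repeats leave gaps that a uniform model does not predict, which is why the remaining gaps still had to be closed by primer walking and PCR.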

Genome analysis

For genome analyses, a combination of three gene modelling programs, Critica27, Glimmer28 and Generation, was used to determine potential coding sequences. These assignments were further checked manually. A revised gene/protein set was searched against the KEGG GENES, Pfam, PROSITE, PRINTS, ProDom and COGs databases, in addition to BLASTP searches against the NCBI non-redundant database. From these results, categorizations were developed using the KEGG and COGs hierarchies. Transfer RNAs were identified using tRNAscan-SE29. To identify regions of atypical nucleotide composition, the trinucleotide composition was determined.
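The text does not specify how atypical trinucleotide composition was scored. As one hypothetical approach, the sketch below compares the overlapping-trinucleotide frequency profile of a window with that of the whole genome using Euclidean distance; windows far from the genomic profile would be candidates for horizontal transfer. The distance measure is a choice made here for illustration, and only one strand is counted.

```python
import math
from collections import Counter
from itertools import product
from typing import Dict

# All 64 DNA trinucleotides.
TRIMERS = ["".join(p) for p in product("ACGT", repeat=3)]


def trinucleotide_profile(seq: str) -> Dict[str, float]:
    """Relative frequencies of overlapping trinucleotides; trimers with non-ACGT bases are ignored."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 3] for i in range(len(seq) - 2))
    total = sum(counts[t] for t in TRIMERS)
    return {t: (counts[t] / total if total else 0.0) for t in TRIMERS}


def profile_distance(window: str, genome: str) -> float:
    """Euclidean distance between the window's and the genome's trinucleotide profiles."""
    w = trinucleotide_profile(window)
    g = trinucleotide_profile(genome)
    return math.sqrt(sum((w[t] - g[t]) ** 2 for t in TRIMERS))


# Toy sequences for illustration only.
genome = "ATGCGCGTAGCTAGCGGCATCGATCGGCGCTA" * 10
print(profile_distance(genome[:60], genome))
print(profile_distance("ATATATTATAAATTTATATA", genome))
```

Counting both strands (adding reverse-complement trimers) would make the score independent of gene orientation, at the cost of merging complementary trinucleotides.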

Manual annotation of ORFs was carried out using Artemis, the Artemis Comparison Tool and Clustal W30. The results of the KEGG and other comparisons described above were examined manually to check automated product assignments and make additional assignments. The proteome sequences of WH8102 and Prochlorococcus (MED4 and MIT9313)2 were compared using the Artemis Comparison Tool. This program, in conjunction with Clustal W, was used to refine predicted start sites, add ORFs not predicted by the gene modelling programs and obtain consistent annotation across the three genomes. Manual annotation was done in conjunction with the Prochlorococcus annotation team. Transporters were analysed and annotated using methods described in ref. 10.

Pairwise BLAST analyses of the three marine cyanobacterial genomes (WH8102 and Prochlorococcus marinus strains MED4 and MIT9313 (ref. 2)) against each other, with an e-value cut-off of 10^-6, followed by additional manual curation including examination of gene context, were used to partition the genome of WH8102 into three categories: 1,314 ORFs found in all three genomes and predicted to be orthologues; 476 predicted orthologous ORFs found in WH8102 and only one of the Prochlorococcus genomes; and 736 ORFs characteristic of WH8102 (found in neither Prochlorococcus genome). The last category partly represents the ecological capabilities of this organism compared with Prochlorococcus (Supplementary Table 3).
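The three-way partition of WH8102 ORFs can be sketched as a lookup of each ORF's best BLAST e-value against each Prochlorococcus proteome. The best-hit dictionaries, the ORF names and the inclusive comparison with the 10^-6 cut-off are assumptions for illustration; the manual curation and gene-context checks described above are not modelled.

```python
import math
from typing import Dict, Iterable, List

EVALUE_CUTOFF = 1e-6  # cut-off from the text


def has_hit(best_hits: Dict[str, float], orf: str, cutoff: float) -> bool:
    """True if the ORF's best BLAST e-value in this genome passes the cut-off."""
    return best_hits.get(orf, math.inf) <= cutoff


def partition_orfs(
    orfs: Iterable[str],
    med4_hits: Dict[str, float],
    mit9313_hits: Dict[str, float],
    cutoff: float = EVALUE_CUTOFF,
) -> Dict[str, List[str]]:
    """Assign ORFs to region 1 (both genomes), 2 (one genome) or 3 (neither), as in Fig. 1."""
    regions: Dict[str, List[str]] = {"region1": [], "region2": [], "region3": []}
    for orf in orfs:
        # bool + bool gives the number of Prochlorococcus genomes with a hit (0, 1 or 2).
        n = has_hit(med4_hits, orf, cutoff) + has_hit(mit9313_hits, orf, cutoff)
        key = {2: "region1", 1: "region2", 0: "region3"}[n]
        regions[key].append(orf)
    return regions


# Hypothetical best-hit e-values keyed by WH8102 ORF name.
med4 = {"orfA": 1e-50, "orfB": 1e-3}
mit9313 = {"orfA": 1e-20, "orfB": 1e-8, "orfC": 0.5}
print(partition_orfs(["orfA", "orfB", "orfC"], med4, mit9313))
```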

Using pairwise BLAST analyses, the three categories of WH8102 ORFs were further subdivided according to whether or not an ORF was found in the model freshwater cyanobacterium Synechocystis PCC6803 (hereafter PCC6803). After different cut-offs had been examined, an e-value cut-off of 10^-10 was used for this assignment. Of the ‘core’ marine cyanobacterial genome of 1,314 ORFs, 1,112 (85%) are also found in PCC6803 (Fig. 1). This provides an estimate of the portion of the WH8102 genome that has been conserved from a primordial cyanobacterial ancestor in all cyanobacterial genomes examined so far (and includes ORFs conserved in all bacterial taxa). This portion differs from a minimal bacterial or minimal cyanobacterial genome, as horizontally acquired genes could carry out functions required for cell viability. The 15% of the marine cyanobacterial core not in PCC6803 includes some of the adaptations and evolutionary events that distinguish the marine Synechococcus/Prochlorococcus lineage from other cyanobacterial lineages.
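Each category can then be split by presence in PCC6803 and summarized as a percentage. The sketch below assumes a dictionary of best BLAST e-values per ORF (with hypothetical ORF names) and uses the 10^-10 cut-off from the text; the final line reproduces the 85% figure from the stated counts.

```python
import math
from typing import Dict, Iterable, List, Tuple

PCC6803_CUTOFF = 1e-10  # cut-off from the text


def split_by_pcc6803(
    orfs: Iterable[str], pcc6803_hits: Dict[str, float], cutoff: float = PCC6803_CUTOFF
) -> Tuple[List[str], List[str]]:
    """Split ORFs into those with and without a PCC6803 hit passing the cut-off."""
    found: List[str] = []
    absent: List[str] = []
    for orf in orfs:
        target = found if pcc6803_hits.get(orf, math.inf) <= cutoff else absent
        target.append(orf)
    return found, absent


def percent(part: int, whole: int) -> float:
    """Percentage of whole represented by part."""
    return 100.0 * part / whole


# Toy example with hypothetical ORF names.
found, absent = split_by_pcc6803(["orfA", "orfB"], {"orfA": 1e-30, "orfB": 1e-5})
print(found, absent)

# Counts from the text: 1,112 of the 1,314 core ORFs have a PCC6803 hit.
print(f"{percent(1112, 1314):.0f}% of the core is shared with PCC6803")
```

The stricter cut-off used here (10^-10 rather than 10^-6) makes assignment of a distant freshwater homologue more conservative.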

Of the 736 characteristic WH8102 ORFs not found in the Prochlorococcus genomes, 23% have related ORFs in PCC6803 (BLAST e-value cut-off of 10^-10). This is not surprising, as these partly represent the ability of both PCC6803 and WH8102, but not the Prochlorococcus strains, to build a functional phycobilisome for harvesting light and a functional nitrate reductase with molybdenum cofactors for using nitrate as a nitrogen source. Forty-five per cent of these characteristic ORFs are hypothetical.

The name of author E. Allen is corrected from E. A. Allen to E. E. Allen, as in the print version.