Introduction

The unicellular cyanobacterium Prochlorococcus has a profound impact on marine carbon cycling because of its high abundance in tropical and subtropical oceans (Partensky et al., 1999). To better understand its ecological role, an integrative approach has been taken to investigate Prochlorococcus in terms of its genome content (Dufresne et al., 2003; Rocap et al., 2003; Kettler et al., 2007), gene expression (Martiny et al., 2006; Tolonen et al., 2006; Zinser et al., 2009; Thompson et al., 2011a), cell physiology (Moore and Chisholm, 1999; Moore et al., 2002; Moore et al., 2005), and geographic distribution (Bouman et al., 2006; Johnson et al., 2006; Zwirglmaier et al., 2008; Malmstrom et al., 2010). The emerging picture is of a diverse clade composed of phylogenetic sub-groups with distinct differences in light physiology (for example, high-light (HL) and low-light (LL)-adapted clades) and temperature optima (Johnson et al., 2006; Zinser et al., 2007). These connections between phylogeny and phenotype were uncovered through observations of natural communities as well as experimental manipulations of cultured isolates. Indeed, the ability to study Prochlorococcus in both the lab and the field has led to this organism becoming a model for exploring the links between diversity and ecological function in marine systems.

Analyses of environmental DNA sequences have recently revealed new clades of HL-adapted Prochlorococcus that currently lack cultured representatives (Rusch et al., 2010; West et al., 2011; Huang et al., 2012). More specifically, Rusch et al. (2010) identified two groups through phylogenetic analysis of several ‘core’ functional genes, that is, those common to all sequenced Prochlorococcus genomes, from the Global Ocean Survey (GOS) metagenomic data set, whereas West et al. (2011) recovered novel 16S rRNA sequences in a separate sampling expedition. In both cases, these new clades were abundant in high nutrient low chlorophyll (HNLC) regions of the ocean where low iron concentrations limit primary production (Martin et al., 1994). It remains unclear, however, if the same clades were identified in both studies since connections cannot be drawn between the core functional genes from one study and the 16S rRNA sequences from the other. More importantly, our understanding of how these HNLC clades might differ genetically from cultured clades of Prochlorococcus, or which factors control their geographic distribution, remains incomplete.

Without cultivated representatives, the ecological roles of HNLC clades can only be uncovered through examination of natural assemblages. Metagenomics is a powerful, culture-independent approach for exploring the genetic diversity of marine microbial assemblages (Venter et al., 2004; DeLong et al., 2006), and has already provided new insights into the nitrogen and phosphorus metabolism of wild Prochlorococcus populations (Martiny et al., 2009a, 2009b; Coleman and Chisholm, 2010). In fact, assembly of community DNA sequences into composite genomes has provided the first look into the genetic composition of HNLC clades (Rusch et al., 2010). However, these assemblies were composed entirely of genes previously observed in a variety of cultured Prochlorococcus strains, and no new genes were identified. This is surprising considering that genomes of closely related Prochlorococcus isolates typically have hundreds of unique genes (Kettler et al., 2007). It is possible, even likely, that the new clades harbor genes not previously seen in cultured strains, but the inherent ambiguities of metagenomic assembly make it difficult to confidently assign them to HNLC clades.

Single-cell genomics facilitates genomic analysis of uncultured microbes while maintaining unequivocal links between gene sequences and their sources (Stepanauskas and Sieracki, 2007; Woyke et al., 2009), and thus is highly complementary to metagenomics. In this approach, individual cells are separated, often by fluorescence-activated cell sorting, and their genomes are amplified using a multiple displacement amplification (MDA) strategy that relies on Phi29 polymerase and random hexamers (Raghunathan et al., 2005; Ishoey et al., 2008; Rodrigue et al., 2009). These MDA reactions produce enough DNA for genome sequencing, thus enabling genomic analysis of specific microbes without the need for cultivation. Prochlorococcus is ideally suited for examination with both metagenomics and single-cell genomics: it is well represented in metagenomic surveys of marine communities (DeLong et al., 2006; Rusch et al., 2007), it has 12 reference genomes from cultured strains that span the diversity of cultured clades and serve as templates for recruiting metagenomic reads (Kettler et al., 2007), and it is easily identified and sorted based on its unique autofluorescence signature (Olson et al., 1990).

Here, we use a combination of single-cell genomics, metagenomics and environmental distribution data to explore the ecology of Prochlorococcus HNLC clades. More specifically, we examine natural communities from the Pacific Ocean along a transect from Fiji to Hawaii, and select individual cells from these uncultivated clades for whole-genome amplification and sequencing. Abundance levels of these clades are also determined by qPCR in both the Pacific and Atlantic oceans, revealing highly restricted geographic distribution patterns. Analyses of single-cell genome content and metagenomic composition provide insights into the genetic basis for their adaptation to low iron environments.

Materials and methods

Sample collection and single-cell whole-genome amplification

Samples for single-cell genomics, qPCR and flow cytometry were collected from four depths (15, 45, 100 and 150 m) at eight locations in April 2007 during the CMORE-BULA cruise from Fiji to Hawaii (Hewson et al., 2009). Accompanying physical, chemical and biological measurements can be found at the cruise website (http://hahana.soest.hawaii.edu/cmorebula), and corresponding metagenomic data on the CAMERA 2.0 Portal (http://portal.camera.calit2.net; CAM_PROJ_Bacterioplankton). Samples for qPCR and flow cytometry were also collected previously on a cruise from the United Kingdom to the Falkland Islands as part of the Atlantic Meridional Transect Project (AMT-13) (Johnson et al., 2006). Seawater samples for single-cell genomics were preserved with 10% glycerol, frozen in liquid nitrogen and kept at −80 °C. Flow cytometry samples were immediately fixed with glutaraldehyde (final concentration 0.125% v/v) for 10 min, frozen in liquid nitrogen and stored at −80 °C. Samples for qPCR were collected by filtering 100 ml of seawater onto 0.2 μm polycarbonate filters, rinsing in 3 ml wash buffer (10 mM Tris (pH 8.0), 100 mM EDTA, 0.5 M NaCl) and storing the filters at −80 °C (Zinser et al., 2006).

Single-cell whole-genome amplification

Single-cell sorting and whole-genome amplification were performed as described previously (Rodrigue et al., 2009) with cells collected at Station 2 (7.5167 °S, 167.005 °W) at a depth of 15 m. Briefly, Prochlorococcus cells were pre-sorted once to reduce contaminating free DNA dissolved in the seawater before individual cells were deposited into the wells of a 384-well plate. A mild alkaline lysis was used to release cellular DNA, which was then amplified by MDA using Phi29 DNA polymerase and random hexamers. Successful reactions were identified by PCR screening of the 16S–23S ITS (Martiny et al., 2009c). Sanger sequences of the ITS regions from 94 individual cells were used to construct a neighbor-joining tree that included ITS regions from 13 previously sequenced Prochlorococcus genomes. Five cells from each of the two new clades were selected for an additional round of MDA to generate sufficient DNA for genome sequencing. Two microliters of the initial amplification were mixed with the appropriate buffer (40 mM Tris-HCl pH 7.5, 50 mM KCl, 10 mM MgCl2, 5 mM (NH4)2SO4 and 4 mM DTT), 50 μM phosphorothioate-protected random hexamers, 1 mM dNTPs and 2000 units of RepliPhi Phi29 DNA polymerase in a 50 μl reaction. The reaction was incubated at 30 °C for 10 h and heat-inactivated. The resulting DNA, 25 μg, was purified with a Qiagen (Valencia, CA, USA) QiaAmp column and directly used to prepare sequencing libraries.

Library preparation, genome assembly and annotation

Roche (Branford, CT, USA) 454-FLX Titanium libraries were prepared as described previously (Margulies et al., 2005; Rodrigue et al., 2009) for 10 selected Prochlorococcus cells. Five single cells (three from HL IV and two from HL III) were selected for deeper sequencing using overlapping paired-end (PE) reads of 144 nucleotides on the Illumina GAIIx platform. Corresponding PE reads were joined using the SHE-RA software to produce single composite reads of 180 bp (Rodrigue et al., 2010). These reads were assembled, along with the 454-FLX Titanium reads, in successive batches of 500 000 reads with the Roche gsAssembler version 2.0. The contigs obtained from the first group of 500 000 reads were used to filter redundant reads that were already part of this assembly. The rest of the assembly was performed on batches of 500 000 filtered reads. The resulting contigs were extended using a custom script developed at the DOE Joint Genome Institute (Hess et al., 2011). The extended contigs were next combined into a final assembly with Phrap (Gordon et al., 1998). The initial reads were then aligned against the contigs for quality assurance, and the JGI Polisher software was used to detect and correct potential errors in the assembly. Each assembly was then searched for non-Prochlorococcus contigs by using BLASTn against the NCBI nt database (Altschul et al., 1997). Contigs were considered to originate from the single cells if the best hit was Prochlorococcus at an e-value threshold of 1e−5, or if they had no hit but their GC content was <40%.
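The contig-retention rule described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the `BestHit` container, the function names and the substring match on the organism name are hypothetical, and the sketch assumes the 1e−5 e-value threshold is an inclusive upper bound and that "no hit" means the search returned nothing for the contig.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BestHit:
    """Top BLASTn hit for a contig (hypothetical container, not a BLAST API)."""
    organism: str
    evalue: float


def gc_fraction(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    if total == 0:
        raise ValueError("sequence has no unambiguous bases")
    return gc / total


def keep_contig(seq: str, best_hit: Optional[BestHit],
                evalue_cutoff: float = 1e-5, gc_cutoff: float = 0.40) -> bool:
    """Return True if the contig is retained as Prochlorococcus-derived."""
    if best_hit is None:
        # No database hit at all: fall back on the low-GC rule (<40% GC).
        return gc_fraction(seq) < gc_cutoff
    # Assumption: the e-value threshold is treated as inclusive (<=).
    return ("Prochlorococcus" in best_hit.organism
            and best_hit.evalue <= evalue_cutoff)
```

Under these assumptions, a contig with a strong best hit to another genus is rejected regardless of its GC content, whereas a contig with no hit is kept only when its GC content is below 40%.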

Assembled contigs and raw sequencing reads from single-cell genomes are available through the ProPortal database (http://proportal.mit.edu). Contigs >200 bp were deposited at DDBJ/EMBL/GenBank under the accession numbers ALPB00000000-ALPL00000000. The version described in this paper is the first version, ALB*01000000.

Gene predictions

The gene prediction methods used with the single-cell Prochlorococcus genomes have been described in Kettler et al. (2007). Briefly, CRITICA version 1.05 (Badger and Olsen, 1999) was first trained using all available prokaryotic sequences from NCBI GenBank, including all completely sequenced Prochlorococcus and marine Synechococcus genomes. GLIMMER version 3.02 (Delcher et al., 1999) was then used to find additional coding genes that might have been missed. GLIMMER was trained using the predicted CRITICA gene models for the Prochlorococcus single cells and all coding genes found in the ProPortal database (Kelly et al., 2012). We observed that both CRITICA and GLIMMER performed poorly when contigs were short (5 kbp or less). In these cases, we searched the NCBI nr database using BLASTx for potentially missed known genes (at an e-value threshold of 1e−5).

Counting HL III and HL IV clades and siderophore transporters in metagenomic data sets

Reads from the CMORE-BULA cruise and GOS were first matched against Prochlorococcus isolate genomes and Prochlorococcus single-cell contigs >500 bp using BLASTn. Those recruits meeting an e-value threshold of 1e−5 were then compared against NCBI’s nt database to confirm there were no better hits to other microbial sequences. Reads with identical bit scores to more than one genome were discarded. The number of reads with top hits to HL III and HL IV was tallied, then normalized to the size of the single-cell genome recovered and the overall size of the metagenomic database from which the read came.
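The read-recruitment bookkeeping described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the code used here: the function names and input layout are hypothetical, and the normalization is assumed to be reads per Mbp of recovered genome per Mbp of metagenomic database, since the exact formula is not given in the text.

```python
from collections import Counter
from typing import Dict


def assign_reads(hits: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """Map each read to its top-scoring genome, discarding tied reads.

    hits: read_id -> {genome_name: bit_score} for hits that passed the
    e-value threshold (hypothetical input layout).
    """
    assigned = {}
    for read_id, scores in hits.items():
        if not scores:
            continue
        top = max(scores.values())
        winners = [g for g, s in scores.items() if s == top]
        # Reads with identical bit scores to more than one genome are dropped.
        if len(winners) == 1:
            assigned[read_id] = winners[0]
    return assigned


def normalized_counts(assigned: Dict[str, str],
                      genome_bp: Dict[str, int],
                      metagenome_bp: int) -> Dict[str, float]:
    """Reads per Mbp of recovered genome per Mbp of metagenome (assumed units)."""
    counts = Counter(assigned.values())
    return {
        genome: n / (genome_bp[genome] / 1e6) / (metagenome_bp / 1e6)
        for genome, n in counts.items()
    }
```

Normalizing by recovered genome size matters because the single-cell assemblies are incomplete to different degrees; without it, a more complete genome would recruit more reads simply because it offers more sequence to match.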

The abundance of siderophore transporter genes (tonB, tolQ, cirA, fepB, fepC and fepD) was determined by first matching metagenomic reads to transporter sequences from single-cell W2 using tBLASTx. Reads meeting an e-value threshold of 1e−5 were then compared by BLASTx with NCBI’s nr database. Reads with a top hit either to the W2 genes or to siderophore transport sequences in Prochlorococcus MIT9202 were identified as Prochlorococcus-like siderophore transporters. The small number of GOS reads that best matched siderophore transporters in other microbes but whose mate pair had a best hit to Prochlorococcus were also identified as Prochlorococcus-like siderophore transporters. Counts of transporters were normalized to a gene size of 1000 aa and the overall size of the metagenomic database from which the read came.

Phylogenetic analysis

Neighbor-joining trees of 16S–23S ribosomal ITS, 16S genes and functional gene rptS were constructed from 500 bootstrap iterations using MEGA 5.0 (Tamura et al., 2011). We used sequences from 12 previously described isolate genomes plus one undescribed isolate genome (MIT9202) in all trees. The ribosomal ITS tree also included ITS sequences from the 10 HL III/IV single-cell genomes that were sequenced as part of this study. When possible, we also included 16S and rptS sequences from single cells in the corresponding trees, but sequences were not available for every cell due to incomplete genome recovery. Finally, we included the same rptS sequences used by Rusch et al. (2010) and 16S rRNA sequences used by West et al. (2011) in their phylogenetic analyses of the HL III and HL IV clades.

Quantitative PCR of HL III and HL IV

Primers targeting the ITS regions of HL III and HL IV were designed using ARB and a large database of environmental ITS sequences (Ludwig et al., 2004; Martiny et al., 2009c). Only one specific priming site was identified for HL IV, so this primer was used in conjunction with the more universal Prochlorococcus ITS-F primer (5′-CCGAAGTCGTTACTYYAACCC-3′). The specificity of the primers (HL IVr: 5′-CTCTTTGGAWAGGTGGTA-3′; HL IIIf: 5′-CGATCGGAACCTCTGATTTTCGA-3′; HL IIIr: 5′-TAACAGGAAGCTAGATTCTCCCA-3′) was tested against cultured isolates MIT9312, MED4, NATL2a and MIT9313. These tests confirmed specificity within the dynamic range of the assay (5 to 5 × 10^5 cells ml−1). Environmental DNA templates were collected and processed as described previously (Ahlgren et al., 2006; Zinser et al., 2006), and reactions were carried out in 384-well plates on a Roche Light Cycler 480 in 15 μl volumes. Primers were annealed at 50 °C for 45 s, followed by extension at 72 °C for 20 s and melting at 95 °C for 45 s. Quantification standards for HL III and HL IV covering a range from 5 to 5 × 10^5 cells ml−1 were generated from 10-fold serial dilutions of representative ITS sequences initially amplified from single-cell MDA products. Values falling below the standard curve were set to the theoretical detection limit of 0.65 cells ml−1.
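Quantification against a 10-fold dilution series of this kind is commonly done by fitting a line to threshold cycle (Ct) versus log10 concentration. The following Python sketch illustrates that general approach; it is not the instrument software or the analysis code used here. The function names are hypothetical, and the sketch assumes that "below the standard curve" means an estimate lower than the lowest standard (5 cells ml−1).

```python
import math
from typing import Sequence, Tuple


def fit_standard_curve(quantities: Sequence[float],
                       cts: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of Ct = intercept + slope * log10(quantity)."""
    xs = [math.log10(q) for q in quantities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def quantify(ct: float, slope: float, intercept: float,
             lowest_standard: float = 5.0,
             detection_limit: float = 0.65) -> float:
    """Convert a Ct to cells per ml, flooring sub-range values.

    Assumption: estimates below the lowest standard are replaced by the
    theoretical detection limit (0.65 cells per ml in the text).
    """
    cells_per_ml = 10 ** ((ct - intercept) / slope)
    return cells_per_ml if cells_per_ml >= lowest_standard else detection_limit
```

With six standards spanning 5 to 5 × 10^5 cells ml−1, `fit_standard_curve` returns the slope and intercept that `quantify` then applies to each environmental sample.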

Expression of siderophore transporter genes during iron starvation

Prochlorococcus isolate MIT9202 was grown in PRO99 media (Moore et al., 2007) at 20 °C under a constant illumination of 27 μE m−2 s−1. The PRO99 medium was modified for trace-metal clean work through microwave sterilization of seawater, increased EDTA (11.7 μM), Chelex-100 treatment of major nutrients and soaking polycarbonate culture vessels in 0.1% Citranox, 10% HCl and pH 2 H2O for 24 h each (Saito et al., 2002).

To initiate iron starvation, an iron-replete culture was grown to mid-log phase and centrifuged at 7500 r.p.m. for 10 min at 21 °C. The cell pellet was rinsed twice in iron-free PRO99 media before being inoculated into duplicate vessels containing either iron-free or iron-replete media. With each rinse the iron concentration was reduced 1000-fold with the addition of 25 μl cell pellet to 25 ml iron-free media, resulting in FeCl3 concentrations of 1 μM in the Fe-replete cultures and 1 pM in the Fe-starved cultures. RNA and cell number samples were collected by centrifugation at various time points over 5 days following the initiation of iron stress. RNA samples were extracted using the mirVana miRNA kit (Ambion, Grand Island, NY, USA) and DNA was removed with a 60-min Turbo DNase (Ambion) incubation as described previously (Lindell et al., 2007).
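The carry-over arithmetic behind the 1 pM figure can be checked with a short Python sketch. The function name is hypothetical, and the sketch treats each 25 μl into 25 ml transfer as exactly 1000-fold, ignoring the small contribution of the pellet volume to the final volume.

```python
def carryover_concentration(initial_molar: float, pellet_ul: float,
                            medium_ml: float, rinses: int) -> float:
    """Residual concentration after serial rinses into iron-free medium."""
    # Approximation: dilution factor = medium volume / transferred volume.
    fold_per_rinse = (medium_ml * 1000.0) / pellet_ul
    return initial_molar / fold_per_rinse ** rinses


# Two rinses of a 1 uM FeCl3 culture, 25 ul of pellet into 25 ml each time.
residual = carryover_concentration(1e-6, pellet_ul=25.0, medium_ml=25.0, rinses=2)
```

Two successive 1000-fold reductions give a 10^6-fold reduction overall, which takes 1 μM down to about 1 pM, as stated in the text.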

Reverse transcription of cirA, fepB, fepC, fepD, tolQ, tonB and rnpB was carried out on 2 ng RNA using SuperScript II (Invitrogen, Grand Island, NY, USA) following the manufacturer’s protocol, along with no-reverse-transcriptase controls for each gene. Reverse transcription reactions were diluted 1:5 with 10 mM Tris-HCl pH 8.0 before performing triplicate qPCR for each sample and gene with the Qiagen QuantiTect SYBR Green PCR kit following the manufacturer’s protocol. An annealing temperature of 54 °C was used for all reactions. The delta-delta Ct method was used to calculate transcript abundances in each sample relative to rnpB, a housekeeping gene constitutively expressed in Prochlorococcus isolates MED4 and MIT9313 during iron starvation (Livak and Schmittgen, 2001; Thompson et al., 2011a).
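For readers unfamiliar with the delta-delta Ct calculation, a minimal Python sketch follows. It implements the standard 2^(−ΔΔCt) form of Livak and Schmittgen (2001), which assumes roughly 100% amplification efficiency for both target and reference genes. The function name and the choice to average triplicates before subtraction are illustrative assumptions, not a description of the exact analysis performed here.

```python
from statistics import mean
from typing import Sequence


def delta_delta_ct_fold_change(target_treated: Sequence[float],
                               ref_treated: Sequence[float],
                               target_control: Sequence[float],
                               ref_control: Sequence[float]) -> float:
    """Fold change of a target gene (treated vs control) by 2^(-ddCt).

    Each argument holds replicate Ct values, e.g. triplicate qPCR wells.
    The reference gene would be rnpB in the setting described in the text.
    """
    # Delta Ct: target normalized to the reference gene within each condition.
    d_ct_treated = mean(target_treated) - mean(ref_treated)
    d_ct_control = mean(target_control) - mean(ref_control)
    # Delta-delta Ct: treated condition relative to the control condition.
    dd_ct = d_ct_treated - d_ct_control
    return 2.0 ** (-dd_ct)
```

A target gene whose normalized Ct drops by three cycles under iron starvation relative to the iron-replete control would be reported as eightfold more abundant.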

Flow cytometric analysis of environmental populations and cultured isolates

An Influx flow cytometer (Becton Dickinson, Franklin Lakes, NJ, USA) was employed to measure cell numbers and to determine relative cell size and chlorophyll per cell based on the unique autofluorescence and scatter signals of Prochlorococcus (Olson et al., 1990). When counting isolates, the relative cell size and chlorophyll per cell were approximated by normalizing forward-angle light scatter and red fluorescence per cell, respectively, to 2 μm-diameter Fluoresbrite beads (Polysciences, Warrington, PA, USA).

Results and discussion

Identification and sequencing of novel Prochlorococcus single cells

To examine the phylogenetic diversity and genetic makeup of Prochlorococcus at a site in the equatorial Pacific, we flow sorted hundreds of individual Prochlorococcus cells from a 15 m sample at station 2 along the CMORE-BULA transect (Figure 1a) and subjected them to whole-genome amplification. The internal transcribed spacer (ITS) region between 16S and 23S rRNA was sequenced from 94 individually amplified cells. Phylogenetic analysis of these ITS sequences revealed two clades that were distinct from cultured clades (Figure 2), and were later determined to belong to the uncultured HNLC clades (see below). Roughly half of the ITS sequences recovered in this sample belonged to these uncultured clades, while the rest matched the HL-adapted ecotype HL II; no sequences from HL I or the LL-adapted groups were detected.

Figure 1

Geographic distribution of cells belonging to HL III and HL IV in the Atlantic and Pacific Oceans. Sampling locations of the CMORE-BULA (Pacific Ocean) and AMT-13 (Atlantic Ocean) expeditions (a). Individual Prochlorococcus cells were collected for single-cell whole-genome amplification at CMORE-BULA station 2 (7.5167 °S, 167.005 °W; identified by star). Depth distribution patterns of HL III and HL IV at CMORE-BULA station 2 (b). Error bars represent one s.d. Integrated abundances (0–150 m) of total Prochlorococcus and HL III and HL IV along the Atlantic AMT-13 (c) and Pacific CMORE-BULA (d) transects were determined by flow cytometry and qPCR, respectively. Corresponding measurements of surface temperature and dissolved inorganic nutrient levels were collected as part of the CMORE-BULA expedition (e), whereas dissolved inorganic nutrients and Fe levels in the Atlantic were collected during AMT-17, which followed a similar cruise track to AMT-13 at the latitudes provided (f).

Figure 2

Phylogenetic affiliations of Prochlorococcus clades based on functional gene rptS (a), 16S rRNA (b), and ribosomal ITS (c) sequences as determined by neighbor-joining analysis (bootstrap=500). The prefix ‘GS’ identifies sequences from the GOS database identified as HNLC Prochlorococcus clades by Rusch et al. (2010), whereas the prefix ‘BIOS’ indicates novel 16S rRNA sequences found by West et al. (2011). Sequences from uncultured single cells from this study are marked with ‘W’ prefix. Note, functional gene and 16S sequences were not available for all single cells as genome recovery was incomplete. The names of Prochlorococcus clades vary within the literature. Here, HL I=eMED4, HL II=eMIT9312 and LL I–IV=eNATL, eSS120, eMIT9211 and eMIT9313. See Huang et al. (2012) for additional information on Prochlorococcus nomenclature.

New clades of Prochlorococcus were independently discovered in two recent studies of the Pacific and Indian Oceans. However, it was unclear whether these previously described clades were the same, as connections could not be made between the core functional genes reported by Rusch et al. (2010) and the 16S rRNA sequences reported in West et al. (2011). The partial genomes assembled de novo by Rusch et al. (2010) from metagenomic sequences were not suitable for phylogenetic analysis because the genomes, and the genes comprising them, were composites derived from multiple organisms. In contrast, the single-cell approach described here provides unequivocal linkages among functional genes, 16S rRNA and ribosomal ITS sequences. Thus, we were able to compare our single cells with the gene sequences of the previous studies and determine whether the functional genes identified by Rusch et al. (2010) are associated with the 16S rRNA sequences identified by West et al. (2011). Phylogenetic analyses confirm that the same novel clades of Prochlorococcus were identified independently in all three cases (Figure 2). However, the designations for HNLC1 and HNLC2 do not agree between previous studies (for example, HNLC1 in Rusch et al. (2010) was labeled HNLC2 by West et al. (2011), and vice versa). For clarity and conformity with existing nomenclature, we refer to these clades as HL III and HL IV following Huang et al. (2012).

To explore the genetic makeup of the HL III and HL IV cells identified in our sample, 10 single cells from these groups were subjected to whole-genome sequencing using 454-Titanium sequencing technology, of which 5 cells were selected for additional sequencing on the Illumina (San Diego, CA, USA) platform (GAIIx) to improve genome recovery and assembly (Table 1). The total base pairs assembled into contigs for the five deeply sequenced genomes ranged from 0.77 to 1.27 Mbp, representing roughly 45–74% of a typical HL-adapted Prochlorococcus genome (Rocap et al., 2003; Kettler et al., 2007). The genomes from the 10 single cells collectively contained a total of 7535 predicted genes, of which 6131 had homologs in sequenced Prochlorococcus isolates. The single-cell genomes also revealed 394 new genes that had never before been seen in Prochlorococcus (Table 1), which stands in contrast to the consensus metagenomic assemblies reported previously (Rusch et al., 2010). That is a 4.6% increase in the pan-genome of Prochlorococcus conferred by the analysis of only these 10 new partially assembled genomes. While most of these genes do not yet have a proposed function (Supplementary Table 1), their identification highlights the power of single-cell sequencing for revealing organismal diversity by identifying the source organism of genes from environmental samples. Insights into the ecology of HL III and HL IV provided by a subset of these new genes are discussed below.

Table 1 Assembly statistics of Prochlorococcus single-cell genomes

Abundance and distribution of HL III and HL IV

To explore the potential ecological differences between HL III, HL IV and known Prochlorococcus ecotypes, we analyzed the abundance and distribution of HL III and HL IV using qPCR in archived samples from the Pacific and Atlantic Oceans (CMORE-BULA and AMT-13, respectively). Depth profiles of the two clades in both the tropical Pacific and Atlantic oceans revealed the typical patterns associated with HL-adapted ecotypes (West et al., 2001; Ahlgren et al., 2006; Johnson et al., 2006; Zinser et al., 2007), that is, relatively high abundance in well-lit surface waters and low abundance below 100 m (Figure 1b), a result consistent with depth profiles in the southwestern Pacific (West et al., 2011). Interestingly, HL III and HL IV always displayed strikingly similar depth distributions within the same location, and their abundances did not differ by >2.3-fold at any depth in the upper 100 m. HL III and HL IV also had identical geographic distributions. Both clades were restricted to equatorial regions in the Atlantic and Pacific (Figures 1c and d), and, when present, accounted for 5–20% of all Prochlorococcus detected by flow cytometry. Their abundance along the CMORE-BULA transect in the tropical Pacific Ocean appears to be even higher based on the number of metagenomic sequencing reads assigned to HL III and HL IV. Examining metagenomic libraries collected at the same stations where qPCR abundances were determined, we found 17–85% of total Prochlorococcus DNA sequences had best hits to HL III and HL IV (Supplementary Table 2), suggesting that abundances derived from qPCR are likely lower bound estimates.

The similar distribution patterns of HL III and HL IV stand in contrast to those of fellow HL-adapted clades HL I and HL II. It is unusual for HL I and HL II ecotypes to be equally abundant at the same location, except at ‘cross-over’ points in their distributions (for example, latitudes −30 S and 35 N in Supplementary Figure 1B), as they have evolved to thrive under different environmental conditions, particularly with regard to temperature (Johnson et al., 2006; Zinser et al., 2007). Interestingly, the evolutionary distance between HL III and HL IV is greater than that between HL II and HL I, as determined from ribosomal and protein coding sequences (Figure 2). We might expect this greater divergence to manifest clear phenotypic differences that would be revealed in their environmental distributions, as is seen with HL II and HL I, yet no clear differences between the distributions of HL III and HL IV were observed along the oceanic gradients we examined. So while both HL III and HL IV differ from HL II and HL I ecotypes in terms of phylogeny and biogeography, it is not yet clear whether they each represent distinct ecotypes of Prochlorococcus or if they are sub-groups of a larger ecotype adapted to equatorial regions.

Although the ecological differences between HL III and HL IV elude us because of their similar distributions along environmental gradients, comparing them with other HL-adapted clades reveals some interesting differences. First, their distribution in both the Atlantic and Pacific suggests HL III and HL IV are adapted to warm waters. That is, the clades were only abundant in locations with surface temperatures of 26 °C or higher (Figure 1), a pattern that also holds with observations from the southwestern Pacific (Supplementary Figure 2) and Indian Ocean (Rusch et al., 2010; West et al., 2011). The HL I clade, in contrast, is adapted to cold waters, while HL II thrives in both moderate and high temperature waters (Supplementary Figure 1) (Johnson et al., 2006; Zinser et al., 2007). Second, HL III and HL IV were only found where inorganic phosphorus levels were relatively high (>100 nM in this study and West et al., 2011) (Figure 1; Supplementary Figure 2), whereas HL II and HL I inhabit both high and low phosphorus environments, including the Sargasso and Mediterranean Seas where dissolved inorganic phosphorus levels are often <1 nM (Wu et al., 2000; Marty et al., 2002). Finally, HL III and HL IV appear to be restricted to waters with low iron concentrations. In the Atlantic, HL III and HL IV were only found just south of the equator where measurements along a similar cruise track detected dissolved iron levels of 0.01 nM (Moore et al., 2009), while HL II dominated waters from 30 °N to 30 °S where dissolved iron concentrations range from 0.01 to 1 nM (Figure 1) (Johnson et al., 2006; Moore et al., 2009). HL III and HL IV were also only found in waters with 0.01 nM dissolved iron in the southeastern Pacific (Supplementary Figure 2) (Blain et al., 2008; West et al., 2011). Dissolved iron concentrations were not determined during our Pacific sampling expedition nor at GOS sample sites from which Rusch et al. (2010) initially hypothesized the HL III and HL IV clades inhabit low Fe waters, but simulated data from a global chemistry model presented by Rusch et al. (2010) suggest that our equatorial Pacific samples also came from low iron environments. Thus, inference from environmental distributions suggests that HL III and HL IV are most likely to appear in warm, low iron and high phosphorus environments, a combination that severely limits their geographic distribution.

The impact of inorganic nitrogen concentrations on HL III and HL IV distributions remains unclear. For example, nitrate+nitrite concentrations were typically high at locations dominated by HL III and HL IV along the CMORE-BULA transect (Figure 1e), whereas West et al. (2011) found no correlation between HL III/IV abundance and nitrate+nitrite in the southeastern Pacific. Nitrite reductase genes were identified in the genomes of LL I and LL IV isolates (Rocap et al., 2003; Kettler et al., 2007), nitrate reductase genes were discovered in natural Prochlorococcus communities (Martiny et al., 2009b), and nitrate uptake by Prochlorococcus was measured in the Sargasso Sea (Casey et al., 2007). These data indicate that some Prochlorococcus utilize nitrate and nitrite as N sources. However, we found no evidence of nitrate or nitrite transporters, reductases or molybdopterin biosynthesis genes within the HL III and HL IV single-cell genomes, a result consistent with metagenomic assemblies generated by Rusch et al. (2010). In addition, genes belonging to the Prochlorococcus-like nitrate assimilation gene cluster were not detected in the CMORE-BULA metagenomic data set at stations 2–6 where HL III and HL IV were most abundant (Figure 1c; Supplementary Table 2), suggesting these genes were not missed in the single cells because of incomplete assembly. Four reads with best hits to the Prochlorococcus-like nitrate assimilation gene cluster were found at station 7, where HL III/IV accounted for only 17% of Prochlorococcus based on metagenomic counts, and 68 Prochlorococcus-like nitrate assimilation genes were detected at station 1, where HL III/IV accounted for 1% of Prochlorococcus (Supplementary Table 2). These data suggest nitrate and nitrite are unlikely to be important N sources to members of the HL III and HL IV clades in the environments we examined.

Adaptations to low iron environments

Iron is a critical micronutrient for photoautotrophic organisms like Prochlorococcus that use it primarily in the reaction centers of their photosynthetic and electron transport proteins (Raven et al., 1999; Jordan et al., 2001). Genomic analysis of Prochlorococcus isolates suggests they harvest trace amounts of free ferric iron (Fe3+) from the environment to meet their needs (Dufresne et al., 2003; Rocap et al., 2003; Kettler et al., 2007). However, scavenging free Fe3+ can present a challenge in waters far away from iron sources as most dissolved iron in the ocean is bound to organic ligands (Rue and Bruland, 1995; Van Den Berg, 1995; Wu and Luther, 1995), a form that Prochlorococcus is not known to exploit. In regions with low dissolved iron concentrations, such as the equatorial Pacific and southern equatorial Atlantic, the growth of Prochlorococcus can even be limited by Fe levels (Mann and Chisholm, 2000). Therefore, we might expect members of HL III and HL IV, which appear to be restricted to low iron environments, to have specific adaptations enabling them to cope with these conditions. Based on metagenomic reconstructions, Rusch et al. (2010) hypothesized that the loss of Fe-containing genes encoding PTOX (plastoquinol terminal oxidase) and cytochrome Cm from both HL III and HL IV reduced their iron requirements and represented an adaptation to low Fe environments. However, it is unclear whether the loss of a few Fe-containing proteins could substantially lower Fe requirements as most Fe is used in photosynthetic and electron transporter proteins (Raven et al., 1999; Jordan et al., 2001), which were not missing in the metagenomic assemblies.

Analysis of our single-cell genomes revealed that members of the HL IV clade harbor genes for acquiring Fe bound to organic ligands. More specifically, three of the five partial HL IV genomes (W2, W4 and W12) contain genes for Ton-dependent siderophore acquisition (tonB, tolQ, cirA and fepBCD) (Figure 3a). In this pathway, proteins TolQ, R and TonB channel energy generated by proton-motive force to CirA, an Fe-ligand binding protein in the outer membrane, to power transport of the Fe-ligand into the periplasm (Ratledge and Dover, 2000; Postle and Kadner, 2003). The Fe-ligand complex is then transported across the cytoplasmic membrane by FepB, C and D, where Fe can be stripped from the ligand and utilized. The presence of this transport system indicates that members of the HL IV clade can access the larger pool of organic-bound Fe to help satisfy their requirements, an adaptation that would be particularly useful in low iron environments. No evidence of siderophore synthesis pathways was found in these genomes, suggesting that they do not make their own siderophores. However, it is possible that the genes for siderophore production could be found in the missing parts of the incomplete single-cell genomes, or that these cells use genes and pathways yet to be characterized.

Figure 3

Ton-dependent siderophore acquisition genes observed in Prochlorococcus isolate MIT9202 and HL IV single-cell genomes W2, W4 and W12 (a). The contigs from single-cell W2 (contig 41; 69 073 bp) and W4 (contig 50; 27 014 bp) are longer than the region depicted in the panel, but only the regions of interest are presented. The entire single-cell W12 contig 44 (5547 bp) is represented. A red arrow indicates the position of siderophore transport genes in a genomic island in Prochlorococcus MIT9202 (b). Following Kettler et al. (2007), genomic islands are identified as regions with high numbers of genes ‘gained’ as departure from the last common ancestor shared with Synechococcus.

Siderophore transporter genes were not identified in members of the HL III clade even though they were as abundant as HL IV cells in iron-poor waters. One interpretation is that the HL III clade uses an alternative strategy for coping with low iron concentrations. Partial genomes from HL III single cells lacked Fe-containing genes PTOX and cytochrome Cm, which would be consistent with the hypothesis that this clade minimizes Fe demand by decreasing the number of Fe-containing proteins (Rusch et al., 2010). However, gene absence must be interpreted cautiously when dealing with incomplete single-cell genomes and metagenomic reconstructions. Indeed, PTOX, cytochrome Cm and Ton-dependent siderophore transporters were missed in the metagenomic assembly of the HL IV clade (Rusch et al., 2010), but were found in the single-cell genomes. Thus, it is quite possible that Ton-dependent siderophore transporters, or even other Fe acquisition genes, are encoded in the missing portions of our HL III genomes. To address this point, we examined mate pair reads from the GOS data set where one read mapped to a Prochlorococcus-like siderophore transport gene and the other mapped outside this suite of siderophore genes. Focusing specifically on GOS sites 39–46, where HL III and HL IV dominate Prochlorococcus populations (Figure 4a), we found 5 of the 19 mates had best hits (blastn) to HL III single-cell genomes (Supplementary Table 3). This suggests that members of the HL III clade may also access ligand-bound iron using siderophore transporters.

Figure 4

Abundance of Prochlorococcus-like siderophore transporters (a) and HL III and HL IV (b) in metagenomic data sets collected at the GOS and CMORE-BULA sites. Counts of siderophore transporter genes (tonB, tolQ, cirA and fepBCD) were normalized for gene length to 1000 aa, whereas the combined count of reads matching HL III and HL IV was normalized to the size of the recovered genomes. All data were normalized to the total database size at each location.
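The normalization described in this caption can be illustrated with a minimal sketch. The exact formula used in the study is not given here, so the function below is an assumption: it scales a raw read count to a gene length of 1000 aa and then divides by the number of reads in the database for that site. The function name, parameter names and the example numbers are all hypothetical.

```python
def normalized_abundance(read_count, gene_length_aa, database_size_reads, per_aa=1000):
    """Scale a raw read count by gene length and by database size.

    Assumed form (not taken from the study):
        (read_count * per_aa / gene_length_aa) / database_size_reads
    """
    if gene_length_aa <= 0 or database_size_reads <= 0:
        raise ValueError("gene length and database size must be positive")
    # Step 1: express the count per `per_aa` amino acids of gene length.
    length_scaled = read_count * per_aa / gene_length_aa
    # Step 2: express that value per read in the site's metagenomic database.
    return length_scaled / database_size_reads


# Hypothetical example: 12 reads hitting a 600-aa gene at a site with 200,000 reads.
example = normalized_abundance(read_count=12, gene_length_aa=600,
                               database_size_reads=200_000)
```

Dividing by database size is what makes sites with different sequencing depths comparable; dividing by gene length prevents longer genes from appearing more abundant simply because they recruit more reads.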

To better understand the importance of siderophore transporters to HL III and HL IV, we examined the abundance of these specific genes (that is, Prochlorococcus-like tonB, tolQ, cirA and fepBCD) in metagenomic samples collected along the BULA transect from Fiji to Hawaii, and at all GOS sampling locations. The abundance of Prochlorococcus-like siderophore transporters is clearly elevated at the specific locations in the Pacific and Indian Oceans where HL III and HL IV are also abundant (Figure 4). This suggests uptake of organic-bound iron may be a common feature of these clades regardless of geographic location. However, the abundance of Prochlorococcus-like siderophore transporters was also elevated at a few locations where HL III/IV abundance was low, for example, other locations in the Indian Ocean and far eastern equatorial Pacific. This suggests Ton-dependent siderophore transport systems are not exclusive to HL III and HL IV, but instead are distributed among other Prochlorococcus ecotypes. In support of this observation, many of the GOS mate reads paired with Prochlorococcus-like siderophore transporters had top hits to the cultured HL-adapted clades (Supplementary Table 3). Finally, the same suite of transporters was also found in cultured strain Prochlorococcus MIT9202 (Figure 3), a previously undescribed member of the HL II ecotype isolated from the equatorial Pacific Ocean (an HNLC region). Thus, Prochlorococcus-like siderophore transport genes are not exclusive to HL III and HL IV. Like P- and N-related genes in Prochlorococcus (Martiny et al., 2006, 2009b; Coleman and Chisholm, 2010), their distribution in the oceans seems to be influenced by habitat and is not necessarily correlated with ecotype abundance.

Consistent with this image, the siderophore transporters in the cultured strain of Prochlorococcus MIT9202 from the Equatorial Pacific appear to be located in a genomic island, which constitutes a site of frequent horizontal gain and loss in Prochlorococcus (Figure 3b) (Coleman et al., 2006; Kettler et al., 2007). Similarly, two phage integrase genes flank the suite of siderophore transport genes in the W2 single-cell genome, while another phage integrase sits adjacent to this suite of genes in the W4 single-cell genome (Figure 3a). Owing to the small size of W12 contig 44, we were unable to determine if phage integrase genes were adjacent to the suite of siderophore transporters. The location of these transport genes in MIT9202, and the close proximity of phage integrase genes in multiple HL IV genomes, suggest this pathway may have been acquired by virus-mediated horizontal gene transfer, and possibly exchanged among Prochlorococcus clades inhabiting low Fe environments.

Expression of siderophore acquisition genes

Although we could not directly test the role of the siderophore transporters in the Fe metabolism of HL III and HL IV, we were able to examine their response to Fe stress in laboratory experiments with Prochlorococcus isolate MIT9202. More specifically, we measured expression levels of the siderophore transport genes tonB, tolQ, cirA and fepB, C, D in cultures grown under Fe-replete and Fe-limited conditions. Although total cell numbers increased steadily over 120 h of Fe starvation, significant reductions in chlorophyll fluorescence and forward scatter, which is related to cell size, were observed (Figure 5), indicating the MIT9202 cells were experiencing significant physiological stress. The gene encoding the outer membrane Fe-binding component (cirA) was upregulated after 48 h of Fe starvation (t-test; P<0.05), whereas the expression levels of the other genes in the Ton-dependent system were not significantly different under iron starvation (Figure 5d). This suggests MIT9202 may respond to low Fe concentrations by increasing siderophore transport proteins in the outer membrane to facilitate transport of Fe-ligands into the periplasm, possibly because this step represents a bottleneck in Fe acquisition. Members of HL III and HL IV presumably do the same.

Figure 5

Abundance, mean forward light scatter per cell, and mean chlorophyll fluorescence per cell in Prochlorococcus MIT9202 under iron starvation conditions (a–c). Time 0 h indicates when cultures were transferred from Fe-replete (Fe+) conditions to Fe-starved (Fe−). Samples were taken at 48 and 120 h to quantify expression of siderophore acquisition genes tonB, tolQ, cirA, fepB, C, D by qPCR (d). Expression is reported as fold-change under Fe-starved conditions relative to expression under Fe-replete conditions. A reference line indicating no change in expression level (that is, fold-increase of 1) is provided. Error bars in all panels represent 1 s.d. Significantly different expression levels (P-value<0.05) are indicated with an *.
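As a rough illustration of how a fold-change value with a 1 s.d. error bar of the kind shown in panel (d) might be computed, the sketch below divides each Fe-starved replicate by the mean of the Fe-replete replicates. This is an assumption, not the study's documented qPCR quantification method; the function name and the replicate values are hypothetical, and the inputs are taken to be relative expression values that have already been normalized.

```python
from statistics import mean, stdev


def fold_change(starved, replete):
    """Return (mean, sample s.d.) of Fe-starved expression relative to Fe-replete.

    Each starved replicate is divided by the mean of the replete replicates,
    so a result of 1 corresponds to no change in expression.
    """
    baseline = mean(replete)
    if baseline <= 0:
        raise ValueError("Fe-replete baseline must be positive")
    ratios = [value / baseline for value in starved]
    return mean(ratios), stdev(ratios)


# Hypothetical triplicate measurements for one gene at one time point.
fc_mean, fc_sd = fold_change(starved=[2.0, 3.0, 4.0], replete=[1.0, 1.0, 1.0])
```

With these made-up values the gene would plot at a fold-change of 3 with an error bar of 1; a significance test such as the t-test mentioned in the text would be applied separately to the replicate values.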

Detection of a potential prophage in HL III and HL IV

Viruses are thought to have an important role in the exchange of genetic material between Prochlorococcus cells (Lindell et al., 2004; Thompson et al., 2011b), as highlighted above in the case of siderophore transporter genes. Thus far, only lytic phages infecting Prochlorococcus have been reported (Sullivan et al., 2003, 2005), although a putative integrase gene along with a perfect 42 bp attP site corresponding to a Prochlorococcus tRNA-Leu have been observed in the genome of the podovirus P-SSP7 (Sullivan et al., 2005), suggesting this phage may also be lysogenic. Several attempts to isolate P-SSP7 lysogens at this position in the genome of the Prochlorococcus MED4 strain under laboratory conditions have been unsuccessful (unpublished results). However, we observed a partial P-SSP7-like prophage at the predicted locus in our single-cell genome W8, a cell that belongs to the HL III clade. The 25 kbp contig is composed of 21 predicted genes from the host followed by the tRNA-Leu and nine phage genes. The junction between the host and prophage genes is supported by several sequencing reads, and it is thus unlikely that the contig is chimeric (Figure 6). As predicted for P-SSP7, the first viral gene in the contig encodes an RNA polymerase, suggesting that the integrase and attP sequence would be located immediately upstream in the bacteriophage DNA molecule. The rest of the prophage genome is missing from this single-cell assembly, so it remains unclear if it encodes a functional virus. Still, the presence of a classically lytic phage integrated into its host genome is intriguing and could represent an adaptation to stressful conditions. For example, a mechanism could have evolved for the originally lytic phage to sense certain environmental parameters and become lysogenic until more favorable conditions are found. The presence of functional integrases in phage genomes could contribute to lateral gene exchange, such as the transfer of siderophore acquisition genes between Prochlorococcus cells as discussed above.

Figure 6

Cyanophage genes integrated into single-cell Prochlorococcus genome W8, a member of the HL III clade. The putative integration site of phage genes occurs at tRNA-Leu, identified in panel (a). Multiple reads span the integration site, supporting the assembly results (b). Uneven coverage along the assembled contig, including near the putative integration site, is characteristic of templates generated by MDA.

Conclusions

Single-cell genomics is a powerful approach that, in combination with genomics and metagenomics, enhances our ability to understand the structure and function of genetic diversity among microbes. Using this method, we were able to uncover siderophore transporters that could be unambiguously assigned to Prochlorococcus, providing insights into how Prochlorococcus is adapted to low Fe concentrations. Our work also further characterized two newly discovered Prochlorococcus clades, suggesting that dissolved phosphorus concentrations, in addition to Fe and temperature, may also have an important role in controlling the abundance of HL III and HL IV. In addition, the detection of a partial P-SSP7-like prophage in one of the single-cell genomes provides the most direct evidence to date that marine cyanophage do have lysogeny as part of their life cycle. Although this study combines genome insights with data from quantitative distribution, further work will be needed to obtain cultured isolates and understand the molecular basis for the restricted distribution of HL III and HL IV.