Introduction

The prokaryotic world has conventionally been divided into autotrophs and heterotrophs based on carbon source, and phototrophs and chemotrophs as dictated by energy source. In the past several decades we have learned that a number of bacteria, archaea and even eukarya do not easily fit into these categories and are capable of using a mix of energy and carbon sources, a trophic strategy known as mixotrophy. Examples include photoautoheterotrophy, photoorganoheterotrophy and chemolithoautoheterotrophy (Perez and Matin, 1980; Jones, 2000; Zubkov, 2009). In marine systems, bacterivory by photosynthetic protists is now known to be common in eukaryotes such as dinoflagellates (Jones, 2000). According to Hartmann et al. (2012), the majority of bacterivory in the Atlantic may be carried out by phototrophs. Similarly, the ocean is populated by up to 11% aerobic anoxygenic photoheterotrophic bacteria that combine phototrophy and carbon fixation with organic compound uptake (Kolber et al., 1999; Sieracki et al., 2006; Jiao et al., 2007; Kirchman et al., 2014). Heterotrophic bacteria that utilize light to pump protons via rhodopsin are abundant globally as well (Béjà et al., 2000, 2001; Rusch et al., 2007), making up an estimated 13% of the photic zone bacteria in the Mediterranean and Red Seas (Sabehi et al., 2005), 50% in the Sargasso Sea (Campbell et al., 2008), and 48% in 116 marine and terrestrial samples examined by Finkel et al. (2013). In addition to using proteorhodopsin, some of these bacteria can fix up to 30% of their total carbon (Palovaara et al., 2014), suggesting a significant contribution to global carbon fixation.

Some freshwater cyanobacteria have long been known to employ photoautoheterotrophy (Rippka, 1972; Joset-Espardellier et al., 1978; Chen et al., 1991; Paerl, 1991; Zubkov, 2009) and evidence is growing that marine picocyanobacteria—the most abundant marine phototrophs (Partensky et al., 1999)—also have this capacity. The dominant genera, Prochlorococcus and Synechococcus, are known to take up amino acids (Church et al., 2004; Zubkov et al., 2003, 2004, 2008; Michelou et al., 2007; Mary et al., 2008; Gómez-Pereira et al., 2013; Evans et al., 2015), glucose (Gómez-Baena et al., 2008; Muñoz-Marín et al., 2013) and dimethylsulfoniopropionate (Vila-Costa et al., 2006), and analysis of 12 genomes has shown that certain strains have genes for amino acid, sugar, oligopeptide and phosphonate uptake (Rocap et al., 2003; Martiny et al., 2006; Kettler et al., 2007).

The ability of these picocyanobacteria to take up organic compounds raises the question of how mixotrophic capacity is distributed phylogenetically within this group. Prochlorococcus and marine Synechococcus (hereafter designated marine picocyanobacteria) can be divided into phylogenetic clusters that generally correspond to physiologically distinct ecotypes (West and Scanlan, 1999; West et al., 2001; Ahlgren et al., 2006; Johnson et al., 2006; Zinser et al., 2006, 2007; Malmstrom et al., 2010; Sohm et al., 2015). In Prochlorococcus these ecotypes have different light and temperature optima, which results in a partitioning of the water column with depth, and different relative abundances along latitudinal gradients (Moore et al., 1998; Moore and Chisholm, 1999; Rocap et al., 2003; Johnson et al., 2006; Zinser et al., 2007). Synechococcus ecotypes can be defined by open ocean and coastal phylogenetic clusters (Dufresne et al., 2008; Ahlgren and Rocap, 2012) as well as by temperature- and nutrient concentration-related groups (Sohm et al., 2015). Because ecotypes partition along phylogenetic lines, a partitioning of picocyanobacterial mixotrophic capacity by phylogenetic group would suggest a role for mixotrophy in niche adaptation. In flagellates, for example, mixotrophy can provide alternative sources of energy in light-limiting conditions and alternative sources of nitrogen or phosphorus in nutrient-limiting conditions (Rothhaupt, 1996a, b). Furthermore, if mixotrophy is also universally distributed among marine picocyanobacteria, this would indicate that it has a more central role in the ecology of these genera than previously thought.

In order to determine the extent of mixotrophic ability among picocyanobacteria in the global oceans, we examined a collection of 67 Prochlorococcus and Synechococcus isolate reference genomes (Supplementary Table S1). Because these strains were isolated from a limited number of locations, we also examined data from the largest marine metagenomic survey to date, the Tara Oceans expedition (Karsenti et al., 2011; Sunagawa et al., 2015). This census of the ocean microbiome allowed us to estimate the global picocyanobacterial genetic capacity for mixotrophy and examine how it is distributed among different oceanographic regimes at a global scale.

Materials and methods

Isolation and sequencing of isolate genomes

Sixty-seven Prochlorococcus and Synechococcus genomes were used as reference genomes in this study (Supplementary Table S1). The genomes of MIT1306, MIT1312, MIT1318, MIT1320, MIT1323, MIT1327, MIT1342, MITS9504, MITS9508 and MITS9509 are described by Cubillos-Ruiz (2015) and Thompson (2015) and are publicly available in the National Center for Biotechnology Information (NCBI) GenBank database.

Sampling

The Tara Oceans expedition is the first global oceanographic expedition to combine analysis of nutrient concentration, temperature, salinity and particulate data with deep sequencing of environmental DNA for metagenomic reconstruction at multiple depths (4 to ~800 m, Supplementary Table S2, http://doi.pangaea.de/10.1594/PANGAEA.840721). The samples used in this study were taken at 68 sites in the Atlantic, Pacific, Indian and Southern Oceans, as well as in the Mediterranean and Red Seas (Supplementary Table S2, Sunagawa et al., 2015). Sample collection and preparation were described previously (Logares et al., 2014; Pesant et al., 2015; Sunagawa et al., 2015). The current analysis focuses on 139 DNA samples from the two smallest size fractions (0.2–1.6 μm and 0.2–3 μm). These samples were taken from up to three different depths: ~5 m, the deep chlorophyll maximum, and the mesopelagic zone.

Metagenomic DNA extraction, sequencing and assembly

DNA extraction and Illumina sequencing were described previously (Logares et al., 2014; Sunagawa et al., 2015). A total of 7.2 terabases were sequenced and processed using the MOCAT software package (Kultima et al., 2012) to yield metagenomic assemblies and gene predictions as summarized in Sunagawa et al. (2015). In order to estimate gene abundances, high-quality reads were mapped onto a non-redundant reference database, the Ocean Microbial Reference Gene Catalog (OM-RGC) including the genes from the Tara Oceans expedition, the Global Ocean Sampling expedition (Yooseph et al., 2007), the Moore Marine Microbial Sequencing project, the Moore Viral Genomes, the Pacific Ocean Virome study (Hurwitz and Sullivan, 2013) and the NCBI Viral Reference Genomes data set (Sunagawa et al., 2015). Mapped reads had a minimum of 95% nucleotide identity to a reference gene and a minimum length of 45 bp. Gene abundances were estimated from read depths that were then normalized by the reference gene length and the total number of bases per sample in order to take into account sequencing depth.
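The read filter and the two-step normalization described above can be illustrated with a minimal sketch. It assumes that read depth is expressed as mapped bases per reference base and that the result is then divided by the total bases sequenced in the sample; the function and parameter names are hypothetical, and the exact formula used by the MOCAT-based pipeline may differ.

```python
# Illustrative sketch only: names and the exact normalization formula are assumptions.

MIN_IDENTITY_PCT = 95.0   # minimum nucleotide identity to the reference gene
MIN_LENGTH_BP = 45        # minimum mapped read length


def passes_mapping_filter(identity_pct, aligned_length_bp):
    """Return True if a mapped read meets the identity and length thresholds."""
    return identity_pct >= MIN_IDENTITY_PCT and aligned_length_bp >= MIN_LENGTH_BP


def normalized_gene_abundance(mapped_bases, gene_length_bp, total_sample_bases):
    """Read depth normalized by gene length and by sequencing depth.

    mapped_bases:       bases from filtered reads mapped to the gene
    gene_length_bp:     length of the reference gene
    total_sample_bases: total bases sequenced for the sample
    """
    if gene_length_bp <= 0 or total_sample_bases <= 0:
        raise ValueError("gene length and total bases must be positive")
    mean_depth = mapped_bases / gene_length_bp   # corrects for gene length
    return mean_depth / total_sample_bases       # corrects for sequencing depth
```

Dividing by gene length keeps long genes from appearing more abundant simply because they recruit more reads, and dividing by total bases makes samples of different sequencing depth comparable.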

Extraction of 16S mitags/OTU classification

Reads mapped to 16S ribosomal RNA (rRNA) sequences were extracted from the metagenomic reads as described previously (Logares et al., 2014) and were designated 16S mitags. Sequences with 100 or more high-quality bases were then mapped via UCLUST (v. 1.2.22) (Edgar, 2010) to an abridged SILVA database (v. 108 clustered at 97% nucleotide identity). Reads were required to have 97% or higher identity to the reference sequence, and were assigned to their best hit reference sequence. OTU abundances were estimated based on read counts per 16S rRNA gene. These raw counts were then normalized to the amount of DNA sequencing per sample by dividing them by the total number of 16S mitags for each sample. All data and count tables are available at: http://ocean-microbiome.embl.de/companion.html.

Taxonomic classification of assembled contigs

In order to classify assembled contigs, the non-redundant set of proteins of the OM-RGC was compared to UniProt using Rapsearch2 (Ye et al., 2011). Hits with an E-value <10⁻³ were kept and the last common ancestor was determined for each gene as described by Hingamp et al. (2013).
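The idea of a last-common-ancestor assignment can be sketched as follows. This is a simplified illustration, not the procedure of Hingamp et al. (2013): it assumes each retained hit carries a root-to-leaf taxonomic lineage and returns the deepest prefix shared by all hits. The function names and the example lineages are hypothetical.

```python
# Simplified illustration of last-common-ancestor (LCA) assignment.
# The data layout (E-value, lineage list) is an assumption for this sketch.

MAX_EVALUE = 1e-3  # hits must have an E-value below this threshold


def last_common_ancestor(lineages):
    """Return the longest taxonomic prefix shared by all lineages."""
    if not lineages:
        return []
    shared = []
    for ranks in zip(*lineages):  # walk the lineages rank by rank
        if all(rank == ranks[0] for rank in ranks):
            shared.append(ranks[0])
        else:
            break
    return shared


def classify_gene(hits):
    """hits: list of (evalue, lineage) tuples for one gene."""
    kept = [lineage for evalue, lineage in hits if evalue < MAX_EVALUE]
    return last_common_ancestor(kept)


example_hits = [
    (1e-30, ["Bacteria", "Cyanobacteria", "Prochlorococcus"]),
    (1e-12, ["Bacteria", "Cyanobacteria", "Synechococcus"]),
    (0.5, ["Bacteria", "Proteobacteria", "Pelagibacter"]),  # fails the E-value filter
]
```

With the example hits, the third hit is discarded and the gene is placed at the deepest level on which the remaining two agree.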

Quantification of taxonomic groups

In order to quantify Prochlorococcus ecotype and Synechococcus cluster abundances in each sample, we considered using abundances of single-copy genes that had been used for this purpose previously by Li et al. (2010) and Martiny et al. (2009). Most of the genes from the Li et al. analysis failed to properly separate Synechococcus clusters, whereas abundances based on the eight single-copy genes used by Martiny et al. failed to correlate well with flow cytometry counts. Thus, we selected only the rpoC1 RNA polymerase gamma subunit gene and the psbO photosystem II manganese-stabilizing protein gene from the Li et al. analysis for quantification. These gene abundances correlate well with flow cytometry counts (Pearson correlations from 0.43 to 0.54) and maximum likelihood trees of their nucleotide sequences generally agree with picocyanobacterial internal transcribed spacer trees, allowing for minor topological differences (Supplementary Figure S1). The rpoC1 gene has previously been used to classify ecotypes of both Prochlorococcus and Synechococcus (Palenik, 1994; Ferris et al., 1998; Mühling et al., 2005, 2006), and psbO is a photosystem II protein that is not homologous to other known proteins, making it an ideal candidate for quantification of cyanobacterial genomes (Raymond and Blankenship, 2004). The relative abundances of rpoC1 and psbO in Tara Oceans samples were highly correlated (Pearson r² = 0.84). On the basis of the maximum likelihood trees of these genes and those found in the reference genomes, we were able to assign Tara Oceans single-copy genes to specific ecotypes or clusters.

Identification of transporter genes

In order to identify transporter genes, we utilized the Transporter Classification Database (TCDB; http://www.tcdb.org/) (Saier et al., 2006). We specifically looked for (oligo)peptide, amino acid and sugar uptake transporters, excluding peptide transporters involved in signaling. A list of the TCDB families used is given in the Supplementary Materials (Supplementary Table S3). A blastp homology search was carried out against the TCDB and hits were retained that had more than 30% amino-acid identity over 70% or more of the reference gene length with an E-value of less than 1 × 10⁻⁵. The Pro1404 glucose transporter (Gómez-Baena et al., 2008) was manually added to the database. We recognize that a 30% amino-acid identity threshold may not always be high enough to indicate identical substrate specificity. Thus, the substrate specificity of the transporters in this study should be considered putative. We chose to use the 30% threshold because the TCDB is greatly lacking in transporter genes from cyanobacteria. A higher threshold would likely miss a substantial number of transporter genes in marine picocyanobacteria.
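The three retention criteria can be written as a single predicate. The sketch below assumes that coverage is computed as alignment length divided by reference (TCDB) protein length; the record layout and field names are hypothetical and do not correspond to a specific blastp output format.

```python
# Sketch of the hit-retention rule; field names are illustrative assumptions.
from dataclasses import dataclass


@dataclass
class BlastHit:
    query_id: str
    tcdb_id: str
    percent_identity: float   # amino-acid identity, in percent
    alignment_length: int     # aligned residues
    reference_length: int     # length of the TCDB reference protein
    evalue: float


def keep_hit(hit, min_identity=30.0, min_coverage=0.70, max_evalue=1e-5):
    """Retain hits with >30% identity over >=70% of the reference and E-value <1e-5."""
    coverage = hit.alignment_length / hit.reference_length
    return (
        hit.percent_identity > min_identity
        and coverage >= min_coverage
        and hit.evalue < max_evalue
    )


good = BlastHit("gene_1", "2.A.1.1.32", 35.0, 80, 100, 1e-10)
borderline_identity = BlastHit("gene_2", "2.A.1.1.32", 30.0, 80, 100, 1e-10)
short_alignment = BlastHit("gene_3", "3.A.1.3.18", 45.0, 69, 100, 1e-10)
weak_evalue = BlastHit("gene_4", "3.A.1.3.18", 45.0, 90, 100, 1e-4)
```

Only the first example record passes: the second has exactly 30% identity (the rule requires more than 30%), the third covers less than 70% of the reference, and the fourth exceeds the E-value cutoff.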

For the reference genomes, genes were clustered into orthologous groups, CyCOGs, as described by Kelly et al. (2012). A list of CyCOG clusters of genes used in these analyses is provided in Supplementary Table S4. For the Tara Oceans OM-RGC, transporter gene abundances were normalized to 16S mitag abundances, assuming one copy of the 16S rRNA gene per genome for Prochlorococcus and two copies for Synechococcus. Samples with low abundance of 16S rRNA genes in Prochlorococcus or Synechococcus were omitted from the analyses if they contained less than 10 × average read coverage per SILVA accession identified in the sample.
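As a worked illustration of the per-genome normalization, the sketch below converts 16S mitag abundance into an estimated genome count using the assumed copy numbers (one for Prochlorococcus, two for Synechococcus) and divides transporter gene abundance by that count. It assumes both abundances are expressed on comparable scales; the function name and input values are hypothetical.

```python
# Sketch of normalizing transporter gene abundance to genome equivalents.
# Copy numbers follow the assumption stated in the text; everything else is illustrative.

RRNA_COPIES_PER_GENOME = {"Prochlorococcus": 1, "Synechococcus": 2}


def transporters_per_genome(transporter_abundance, mitag_abundance, genus):
    """Estimate transporter genes per genome for one genus in one sample."""
    if mitag_abundance <= 0:
        raise ValueError("16S mitag abundance must be positive")
    # Each genome contributes RRNA_COPIES_PER_GENOME[genus] 16S copies.
    genome_equivalents = mitag_abundance / RRNA_COPIES_PER_GENOME[genus]
    return transporter_abundance / genome_equivalents
```

For the same raw abundances, the Synechococcus estimate is twice the Prochlorococcus estimate, because two 16S copies are taken to represent a single genome.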

Gene context

Genes consistently found in clusters across prokaryotic taxa are frequently functionally related (Dandekar et al., 1998; Huynen et al., 2000; Rogozin et al., 2004; Karimpour-Fard et al., 2008; Yelton et al., 2011). In order to further support the annotation of organic compound transporter genes, we examined their gene context to determine if they were associated with other subunits of the same transporters or metabolic genes that use the transporter substrates. For the two ABC transporters examined (3.A.1.3.18 and 3.A.1.5.-), most subunit genes were found in gene clusters (Supplementary Table S5). Overall three subunit genes were missing from three separate genomes for the 3.A.1.5.- peptide transporter and two subunit genes were missing from two genomes for the 3.A.1.3.18 amino-acid transporter. In addition, both the glucose:H+ symporter, glcP (2.A.1.1.32) and the recently discovered Pro1404 glucose porter (Gómez-Baena et al., 2008) were found in clusters of genes with related functions. glcP is adjacent to a sugar porin (1.B.19.1.4) in most Prochlorococcus genomes and two out of three Synechococcus genomes with this gene (Supplementary Figure S2). This porin is homologous to the glucose inducible sugar porin oprB in Pseudomonas (Hancock and Carey, 1980; Saravolac et al., 1991).

The Pro1404 glucose porter is a major facilitator superfamily transporter that has been implicated in glucose uptake in Prochlorococcus (Gómez-Baena et al., 2008; Muñoz-Marín et al., 2013). This transporter was consistently found in all picocyanobacterial reference genomes in a gene cluster with a glycogen debranching enzyme (E.C. 3.2.1.-) (Supplementary Figure S2), which functions in glycogen degradation to glucose monomers in Escherichia coli (Jeanningros et al., 1976; Dauvillée et al., 2005) and has also been shown to affect glycogen branching patterns in Synechococcus elongatus PCC7942 (Suzuki et al., 2007). Thus, we expect that the proteins coded by the Pro1404 gene and the glycogen debranching enzyme gene both function in providing the cell with glucose.

Statistical and phylogenetic analyses

Kolmogorov–Smirnov and Shapiro–Wilk tests indicated that the transporter gene abundance data were not normally distributed. Thus we used nonparametric tests in all statistical analyses unless otherwise noted. The correlations used were Spearman correlations and hypothesis tests between populations were Mann–Whitney–Wilcoxon tests. Phylogenetic trees were made with the RAxML software v. 7.3.0 (Stamatakis, 2006) and are maximum likelihood trees with 100 bootstraps. Evolutionary reconstruction of ancestral traits was carried out with the ace method of the ape (v. 3.3) R statistical package (Team RDC, 2012). The evolutionary model used assumes equal rates of transitions from one state to another and uses maximum likelihood ancestral state estimation (Cunningham et al., 1998).

Results and discussion

Mixotrophic capacity in cultured strains

Genetic capacity for organic compound uptake and degradation

In order to determine if isolate picocyanobacteria have the capacity for mixotrophy we looked for genes for uptake and degradation of organic compounds. All 67 picocyanobacterial reference genomes contained transporters for amino acid, sugar and peptide uptake, indicating a universal capacity for mixotrophy (Figure 1), and suggesting a persistent advantage conferred by organic compound uptake across the marine environments where these picocyanobacteria live. Genes specific to degradation of the sugars and amino acids taken up by these transporters were also identified (Supplementary Table S6). The presence of these genes along with transporter genes suggests that these strains have the ability to break down organic compounds for use in central carbon, and in some cases nitrogen, metabolism. Unsurprisingly, glucose degradation genes were identified in all genomes along with glucose transporter genes. Of the 36 cases where alanine transporter genes were found, 35 also had an alanine degradation gene, alanine dehydrogenase. This enzyme is reversible, but has been implicated in alanine degradation in Synechococcus elongatus (Lahmi et al., 2006). The glutamate porter is found in 56 genomes, but the glutamate degradation dehydrogenase is found in only the low-light IV (LLIV) Prochlorococcus and the CC9605 Synechococcus strains. Only genomes that contain the glutamate and alanine transporters contain the degradation genes with one exception, WH5701, which contains an alanine dehydrogenase. This suggests that these strains are not only capable of taking up amino acids, but can also degrade them to obtain ammonium for use in biosynthesis. The co-occurrence of the degradation genes with the transporters also is consistent with the annotation of these genes as glutamate and alanine transporters.

Figure 1

Average number of organic compound transporter genes per genome by Prochlorococcus ecotype and Synechococcus group, based on reference genomes. HL, high-light adapted; LL, low-light adapted. Error bars are s.d.’s. Individual bar height indicates number of transporters per genome.

Mixotrophic capacity by phylogenetic group

Within the Prochlorococcus ecotypes, there is a trend of reduction in gene number of amino-acid transporters from the common ancestor with Synechococcus to the high-light II (HLII) ecotype (Figure 1). This trend suggests gradual gene loss during the genome streamlining that began during the divergence from Synechococcus and involved genes that do not confer strong selective advantages in new niches (Partensky et al., 2010; Sun and Blanchard, 2014). Because of this streamlining, transporter gene numbers are correlated with genome size (amino acid and sugar transporters Pearson correlation 0.63 and 0.53, P-value <0.01). The hypothesis of gradual organic compound transporter gene loss is supported by evolutionary reconstruction of the number of amino acid and sugar transporters from the common ancestor of Prochlorococcus to the extant strains and is consistent with the amino-acid transporter gene trees in these strains (Supplementary Figures S3 and S4).

These results suggest a reduced selection pressure for mixotrophic capacity in cells that dominate surface waters, where the environment is characterized by higher light intensities and lower nutrient concentrations than deeper euphotic zone waters. The increase in amino acid and sugar transporter genes in cells that are most prevalent in low light, open ocean environments is consistent with the use of organic compounds to supplement energy and carbon under light limitation, but also indicates that despite the potential for nitrogen limitation, surface oligotrophic ocean waters do not favor the maintenance of a full suite of organic compound uptake genes. This may be because many of the substrates for these transporters are found at such low concentrations in low nutrient surface ocean waters (Keil and Kirchman, 1999; Kaiser and Benner, 2008) that the energy and nutrients required to maintain these transporters are greater than the advantage they confer.

We next looked for patterns in the distribution of organic compound transporter genes among Synechococcus phylogenetic clusters. All major Synechococcus groups have more genes per genome for uptake of organic compounds than HLI, HLII and LLI Prochlorococcus ecotypes (Figure 1). The 5.1A Synechococcus subcluster has fewer organic uptake genes than 5.2 and 5.3 Synechococcus genomes (Figure 1, Supplementary Table S7). The 5.1A group has previously been shown to dominate the Synechococcus population in oligotrophic environments (Dufresne et al., 2008). However, recent work indicates that adaptation to oligotrophic waters may not be a characteristic of the large 5.1A and 5.1B groups, but rather a clade-level adaptation (Zwirglmaier et al., 2008; Ahlgren and Rocap, 2012; Huang et al., 2012). Thus we looked for the specific clades that dominated oligotrophic samples in the Tara data set, that is, waters with <0.1 mg chlorophyll a per m³ as defined by Behrenfeld and Falkowski (1997). Clades II and III (both 5.1A clades) dominated the Synechococcus community in oligotrophic versus mesotrophic waters (P-values <0.05 and <0.01, respectively). The reference genomes from these oligotrophic clades had fewer organic compound transporter genes than the other Synechococcus groups, though this difference was not significant (Figure 2a). We made the same comparison between coastal and open ocean samples, as determined by Longhurst biome (Clade II dominated open ocean waters in the Tara data set; P-value <0.05). In that case, coastal reference genomes on average contained more organic compound transporter genes than open ocean genomes (Figure 2b; P-value <0.05).

Figure 2

Average number of organic compound transporters. Synechococcus clades were assigned to open ocean versus coastal sites (a) and oligotrophic versus meso- and eutrophic sites (b) based on abundance of their single-copy genes in the Tara data. The average number of transporter genes per genome for reference genomes in the assigned clades is shown above. Clade II was the only open ocean clade. All other clades are coastal clades. The oligotrophic clades are clades II and III. All other clades are meso/eutrophic clades. Error bars are s.d.’s. Individual bar height indicates number of transporters per genome.

Mixotrophic potential in picocyanobacteria in the global oceans

Comparison of reference genome data set with wild populations

The availability of global metagenomic data allowed us to test whether our picocyanobacterial reference isolate genomes were representative of abundances of organic compound transporter genes globally. Specifically we aimed to determine whether global averages of organic compound uptake genes per genome were similar to the average numbers of these genes in our reference data set. To this end, we estimated the average number of transporter genes per Prochlorococcus and Synechococcus genome in the global Tara data, normalized to 16S mitag abundances, and compared these estimates to normalized transporter gene numbers in reference genomes. Transporter numbers in reference genomes were averaged within each ecotype and then normalized by multiplying them by the proportion of their respective ecotype in the Tara data set, based on single-copy gene abundances. The results indicate that the Prochlorococcus reference data set is adequately representative of in situ populations (Figure 3). The overall average number of organic compound uptake genes in the Tara Synechococcus data was similar to those in reference genomes. However, the number of amino acid uptake genes was higher in the in situ data, whereas the number of sugar and peptide transporters was lower. This suggests that the Synechococcus reference genomes do not cover the diversity seen globally. Certain taxa within the Synechococcus genus that are not represented in reference genomes likely have widely varying numbers of organic compound transporters. It is also possible that taxonomic identification failed to identify in situ Synechococcus transporter genes because of the inadequacy of the reference data set, but this scenario is unlikely, due to the low homology threshold used (30% amino-acid identity) in annotating the metagenomic genes.
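The weighting of reference genome averages by in situ ecotype proportions amounts to a weighted mean, sketched below. The ecotype names are taken from the text, but the numerical values are invented for illustration, and renormalizing the proportions so that they sum to one is an assumption of this sketch.

```python
# Sketch of the ecotype-weighted reference average; input values are made up.


def weighted_reference_average(mean_by_ecotype, proportion_by_ecotype):
    """Weight per-ecotype reference means by each ecotype's share in the metagenomes.

    mean_by_ecotype:       mean transporter genes per reference genome, by ecotype
    proportion_by_ecotype: share of each ecotype among single-copy genes in situ
    """
    total = sum(proportion_by_ecotype.values())
    if total <= 0:
        raise ValueError("ecotype proportions must sum to a positive value")
    return sum(
        mean_by_ecotype[ecotype] * share / total
        for ecotype, share in proportion_by_ecotype.items()
    )


# Hypothetical example: a community dominated by HLII with a minority of LLI.
example_means = {"HLII": 2.0, "LLI": 6.0}
example_shares = {"HLII": 0.75, "LLI": 0.25}
```

In the hypothetical example the weighted estimate is 2.0 × 0.75 + 6.0 × 0.25 = 3.0 genes per genome, which is the quantity that would be compared with the 16S-normalized metagenomic average.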

Figure 3

Average number of transporter genes per genome in the Tara data set versus weighted average by ecotype in the reference genome data set. Tara refers to the overall average number of genes per genome (16S mitag) in the metagenomic data. Reference refers to the estimated average number of genes per genome based on averages of the reference genomes by ecotype weighted by the proportion of each ecotype in the Tara data set. Error bars are s.d.’s. Individual bar height indicates number of transporters per genome.

Geographic distribution of organic compound transporters

The global nature of the Tara Oceans data allowed us to assess the worldwide geographic distribution of picocyanobacterial mixotrophic capacity as we are defining it. We determined that organic compound uptake transporters assigned to Prochlorococcus and Synechococcus were ubiquitous in the data set (Figure 4). This finding provides the strongest evidence to date that the capacity for mixotrophy is the dominant trophic strategy in marine picocyanobacteria. Every sample that contained Prochlorococcus or Synechococcus 16S rRNA genes also contained picocyanobacterial peptide uptake transporter genes. There was only one sample that contained Prochlorococcus 16S genes but no Prochlorococcus amino acid or sugar uptake transporter genes: one of the three samples from strong oxygen minimum zones. Similarly, one Southern Ocean sample lacked Synechococcus organic compound transporter genes. These oxygen minimum zone and Southern Ocean samples consistently had among the lowest number of Prochlorococcus and Synechococcus nutrient transporters (data not shown). Oxygen minimum zones similar to the one in question are dominated by novel LLV and LLVI Prochlorococcus lineages (Lavin et al., 2010) not present in current reference genome data sets. Organic compound uptake transporter genes may not have been detected for this reason or because of some unknown ecological difference between oxygen minimum zones and other open ocean habitats.

Figure 4

Number of picocyanobacterial organic compound transporters per genome in the Tara Oceans data set. Ocean Data View v. 4.6.2 (Schlitzer, 2002) projection using DIVA gridding with 30 by 30 scale length. Stations are in black.

In addition to their ubiquity, picocyanobacterial organic compound transporters also represent a large portion of the total organic compound transporters in samples dominated by Prochlorococcus. In these samples, up to 13.8% of amino-acid transporters, 31.1% of peptide transporters and 6.4% of sugar transporters were assigned to picocyanobacterial taxa.

No clear differences in transporter abundances by ocean were recognizable. However, three ocean ‘hot spots’ were identified where organic uptake transporter gene abundances were very high. These stations were characterized by very low (<3%) HLII single-copy gene abundances as a proportion of all Prochlorococcus single-copy genes. The Prochlorococcus single-copy genes from each station were as follows: 95% LL and HLI strains at Station 68, 98% HLI and unassigned HL strains at Station 94 and 95% unassigned HL strains at Station 128.

Relationship of transporter gene abundances to environmental parameters

To explore the hypothesis that organic compound uptake is more advantageous to picocyanobacteria under low light and relatively higher nutrient conditions, we examined relationships between number of organic compound uptake transporter genes and environmental variables including depth, nitrate and nitrite concentration, proximity to the coast, and the proportions of the different picocyanobacterial ecotypes present. We controlled for high light ecotype abundance with partial Spearman correlations because this ecotype is strongly correlated with depth and nitrate concentration (−0.5 and −0.35 correlation, respectively; P<1e−4) and because high light ecotype single-copy genes are on average 20 times more abundant than low light ones in this data set (at depths of 50 m or more). It is important to note that the proportion of HLI and HLII genes varied between samples even when holding the total abundance of HL single-copy genes constant. We found a significant positive correlation between nitrate concentration and amino-acid transporter gene abundances (Supplementary Table S8). As low light ecotypes are rare in the Tara data, the correlation with nitrate may be due to the differential distribution of HLI and HLII ecotypes at the surface. Because there is no clear mechanistic link between increased nitrate concentrations and organic compound uptake, it is likely that the correlation is actually related to another covarying environmental variable. One possibility would be the concentrations of amino acids themselves, which are likely to be more expendable in higher nitrate environments.
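A partial Spearman correlation of the kind described above can be sketched with the standard first-order partial correlation formula applied to rank-transformed data. This is one common way to compute it and is offered as an assumption, since the text does not state which implementation was used; here x, y and z stand for, say, nitrate concentration, transporter genes per genome and high-light ecotype abundance, and the example vectors are invented.

```python
# Pure-Python sketch of a first-order partial Spearman correlation.
# The formula choice and the example data are assumptions for illustration.
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def partial_spearman(x, y, z):
    """Spearman correlation of x and y after controlling for z."""
    rxy, rxz, ryz = spearman(x, y), spearman(x, z), spearman(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
z = [5, 3, 1, 4, 2]
```

The partial coefficient removes the part of the x–y rank association that can be explained by their shared association with z, which is the motivation for controlling for high-light ecotype abundance here.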

In addition to correlations with nitrate concentrations, significant negative correlations were found between the proportion of single-copy genes from the HLII ecotype, and the number of organic compound transporters per genome. The ecotype correlations were consistent with results from the isolate genomes where HLII genomes have the smallest number of organic compound uptake genes (Figure 1). Finally, the mean number of transporters per Synechococcus genome was higher in coastal waters (based on Longhurst biome) than in the open ocean (Table 1).

Table 1 Difference between transporter genes per genome in coastal vs open ocean picocyanobacteria (Wilcoxon test)

In order to estimate the average number of organic compound uptake genes per phylogenetic group in natural populations, we assigned single-copy cyanobacterial genes to groups with maximum likelihood trees (Supplementary Figure S1). We then estimated group averages by taking the mean of the number of transporters per genome in samples containing more than 50% single-copy genes from this phylogenetic group. Because LLII, III and IV ecotypes were generally of very low abundance, we were not able to estimate in situ averages for them. The data show a trend of increasing number of amino-acid transporter genes per genome from the HLII to the LLI Prochlorococcus ecotypes and no clear trend for sugar and peptide transporters, consistent with results from cultured strains (Table 2). The organic compound uptake transporter ‘hot spots’ previously identified are also consistent with this trend, as they are characterized by an unusually low proportion of HLII single-copy genes.
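The group-average estimate can be sketched as a filter followed by a mean: keep only samples in which a group accounts for more than 50% of single-copy genes, then average transporters per genome over those samples. The sample records and their field names below are hypothetical, and reporting the sample standard deviation alongside the mean is an assumption based on the ± values reported in Table 2.

```python
# Sketch of per-group in situ averages; sample records and field names are illustrative.
from statistics import mean, stdev


def dominant_group_average(samples, group, threshold=0.5):
    """Mean and s.d. of transporters per genome in samples dominated by `group`.

    Returns None if no sample exceeds the threshold, and an s.d. of None
    when only one sample qualifies.
    """
    values = [
        s["transporters_per_genome"]
        for s in samples
        if s["group_fractions"].get(group, 0.0) > threshold
    ]
    if not values:
        return None
    sd = stdev(values) if len(values) > 1 else None
    return mean(values), sd


example_samples = [
    {"transporters_per_genome": 2.0, "group_fractions": {"HLII": 0.9, "HLI": 0.1}},
    {"transporters_per_genome": 4.0, "group_fractions": {"HLII": 0.6, "LLI": 0.4}},
    {"transporters_per_genome": 7.0, "group_fractions": {"HLII": 0.5, "LLI": 0.5}},
]
```

The third example sample is excluded for both groups because neither exceeds 50%, which illustrates why rare ecotypes such as LLII, LLIII and LLIV yield no estimate.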

Table 2 Estimates of global averages (±standard deviation) of organic compound uptake transporter genes per genome, by substrate, for different Prochlorococcus ecotypes

Diversity of Tara Oceans picocyanobacterial organic compound transporters

The elevated numbers of organic compound transporters in Prochlorococcus and Synechococcus that live in lower light, higher nutrient environments relative to other strains could be indicative of several different types of evolutionary strategies. Genome streamlining within the Prochlorococcus lineage may have eliminated paralogous transporter genes that served to increase expression in a common ancestor. Alternatively, multiple transporters may serve to transport the same general substrate (for example, amino acids) but different specific substrates (for example, glutamate versus aspartate), allowing the cells to take advantage of whichever resource is currently available. To differentiate between these possibilities, we looked for paralogs. This analysis found only one paralogous transporter gene in the reference genomes: the 2.A.27.2.1 probable Glu/N-acetylglutamate uptake porter (Supplementary Figure S3). In all other cases, higher numbers of transporters per genome were associated with a higher diversity of transporter classes and substrates (Figure 5). The same is true of the metagenomic data: samples with high numbers of picocyanobacterial transporters per genome also have a more diverse set of picocyanobacterial transporter classes and putative substrates (Figure 6; Spearman P-value <0.05). Collectively, these observations suggest that picocyanobacterial strains with more transporter genes have the capacity to take up a wider variety of organic substrates. The association of increased diversity with increased number of transporter genes indicates that the cyanobacteria with more of these genes have a more generalist trophic strategy.

Figure 5

Classification of organic compound transporter genes in reference genomes. Average number of transporter genes of each TCDB transport system per Prochlorococcus ecotype or Synechococcus cluster. Error bars are s.d.’s. (a) Amino-acid transporters, (b) peptide transporters and (c) sugar transporters.

Figure 6

Diversity of organic compound transporter genes in Tara Oceans metagenomic data. Number of transporter genes per Prochlorococcus or Synechococcus genome in each sample versus number of transporter gene systems or families in the sample, as defined by the TCDB. The line shows the mean number of transporters per genome. (a) Prochlorococcus amino-acid transporter gene families. (b) Synechococcus amino acid transporter gene families. (c) Prochlorococcus peptide transporter gene system. (d) Synechococcus peptide transporter system.

Conclusions

Picocyanobacterial mixotrophy genes are more abundant in low light-adapted ecotypes of Prochlorococcus and coastal groups of Synechococcus. The differences among phylogenetic groups can be attributed to the loss of a diverse subset of transporter genes as picocyanobacteria expanded to new, high light, more oligotrophic habitats. Because of this evolutionary trajectory, we infer that a mixotrophic strategy is most advantageous to marine picocyanobacteria in low light, mesotrophic environments such as the deep euphotic zone above the deep chlorophyll maximum. These environments have higher nutrient concentrations than the surface ocean, but are still likely nutrient limited (Cullen, 2015). Under these conditions, cyanobacteria can supplement their energy stores with organic carbon compounds and take advantage of alternative nitrogen sources that are readily available in amino acids and peptides. Results for Prochlorococcus are compelling, but the lack of a good reference database necessitates further work on Synechococcus.

In addition to demonstrating a clear relationship between phylogeny and capacity for organic compound uptake in the genomes of cultured isolates, we have also determined that mixotrophic capacity is ubiquitous in the Tara Oceans metagenomic data set, covering all major oceans. We conclude from the near-universal distribution of picocyanobacterial genes involved in organic compound transport that mixotrophy is a widespread strategy among these phototrophs. Given that picocyanobacteria are estimated to contribute 25% of global marine net primary productivity (Flombaum et al., 2013), their potential contribution to the assimilation of organic carbon could be significant. Furthermore, because picocyanobacteria and obligate heterotrophs make up the vast majority of marine prokaryotic communities, we postulate that almost all prokaryotes in the surface oceans are heterotrophs or mixotrophs, a finding that calls for revision of oceanic carbon and energy flux estimates between trophic levels.