Introduction

Members of the acI lineage within the phylum Actinobacteria are an intriguing group of free-living ultramicrobacteria that dominate many freshwater ecosystems (Newton et al., 2011), including high-nutrient eutrophic (Wu et al., 2007), low-nutrient oligotrophic (Humbert et al., 2009) and dystrophic lakes (Newton et al., 2006). They are also abundant in some marine estuaries (Glockner et al., 2000). Moreover, as a lineage, acI shows a smaller seasonal abundance variation when compared to other major freshwater bacteria (Allgaier and Grossart, 2006; Salcher et al., 2010; Newton et al., 2011; Eckert et al., 2012; Rösel and Grossart, 2012). Due to their high abundance and high metabolic activity (Warnecke et al., 2005; Allgaier and Grossart, 2006; Salcher et al., 2010), the acI bacteria likely have a critical role in carbon and other nutrient cycling in freshwater systems, yet their genomic features remain elusive. This is in part due to the historical lack of isolated representatives and limited genomic data.

On the basis of 16S rRNA gene sequence analysis, the acI lineage comprises three distinct clades (A, B and C; Newton et al., 2011). Each of these clades has been further subdivided into ‘tribes,’ members of which share at least 97% 16S rRNA gene sequence identity (13 tribes in total). The most abundant and prevalent tribes are acI-A1, acI-A6, acI-A7 and acI-B1 (Newton et al., 2011). The tribes appear to partition niches based on pH, with some tribes (for example, acI-A1 and acI-B2) preferring slightly more acidic environments (Newton et al., 2007).

Several single-cell targeted studies have attempted to elucidate acI’s capability to take up and consume specific carbon sources, using fluorescent in situ hybridization (FISH) combined with catalyzed reporter deposition and microautoradiography (MAR). These studies have shown that acI members can take up glucose (Buck et al., 2009), leucine (Buck et al., 2009; Perez et al., 2010; Salcher et al., 2010; Eckert et al., 2012), acetate (Buck et al., 2009), thymidine (Perez et al., 2010), N-acetylglucosamine (NAG) (Beier and Bertilsson, 2011; Eckert et al., 2012) and di-NAG (Beier and Bertilsson, 2011; Eckert et al., 2012; Tada and Grossart, 2014). A recent MAR-FISH study performed on fall samples from Lake Zürich in Switzerland showed that acI bacteria consumed an amino acid mixture as well (Salcher et al., 2013). The acI cells in this study did not take up acetate, fructose, arginine, aspartate, glutamate, glutamine, serine, glycine or alanine. However, because of the inherent limit of phylogenetic resolution associated with rRNA-targeted FISH probes, most of these studies cannot attribute the ability to take up key substrates to specific clades or tribes within the lineage. Only a few studies have used FISH probes that differentiate between members of the acI-A and acI-B clades. One such study (Buck et al., 2009) showed evidence of substrate-based niche partitioning, where acI-B-positive cells could consume acetate but acI-A-positive cells could not.

A few recent studies have used (meta)genomics-based analyses to study acI Actinobacteria. Ghai et al. analyzed metagenomic data from lakes, estuaries (Ghai et al., 2012) and rivers (Ghai et al., 2011), and found that acI members have a lower than expected genomic GC content. Martinez-Garcia et al. (2012) used single-cell genomics and found that more than 80% of rhodopsins found among lake bacteria originate from Actinobacteria. Another study reported the first nearly complete acI genome, from the acI-B1 tribe, obtained using single-cell genomics (Garcia et al., 2013). In short, the genome was small (estimated to be just over 1 Mbp) and had a low GC content (42%). Metabolic reconstruction indicated that members of acI-B1 are facultative aerobes capable of taking up and metabolizing pentoses such as xylose. The authors also confirmed the presence of an actinorhodopsin gene in the genome.

Here, we greatly expand on the single-cell genome-based analysis (Garcia et al., 2013) by analyzing 10 additional acI single-amplified genomes (SAGs) from three different acI tribes and four different lakes. The additional genomes allowed us to confirm key gene functions and genomic features (for example, GC content) that are conserved within the lineage while also revealing clade-specific differentiation. We also compare the gene content of acI genomes to that of cultivated members of their parent order, Actinomycetales, and to that of other abundant freshwater bacteria. This kind of analysis was not possible with only one draft genome (the previously published acI-B1 SAG called SCGC AAA027-L06). Our data show that acI members have unique traits that likely provide a competitive advantage when scavenging for energy, carbon, nitrogen and phosphorus in freshwater habitats. The presence of several genes involved in the uptake and metabolism of organic nitrogen compounds suggests that N-rich organic matter may be a significant source of carbon, nitrogen and energy for acI biosynthesis.

Materials and methods

SAG generation and selection

Water samples (1 ml) were collected from the upper 0.5 to 1 m of each of four lakes (Mendota, Sparkling, Damariscotta, Stechlin) and cryopreserved (Supplementary Table S1), as previously described (Martinez-Garcia et al., 2012; Garcia et al., 2013) (Supplementary Online Material). Bacterial SAGs were generated and identified at the Bigelow Laboratory Single Cell Genomics Center (SCGC; http://www.bigelow.org/scgc), as detailed in Martinez-Garcia et al. (2012) (Table 1).

Table 1 AcI SAG metadata and overall features

Ten SAGs from lakes Mendota, Sparkling and Damariscotta were selected during the SAG library screening step described in Martinez-Garcia et al. (2012). Partial 16S rRNA genes amplified previously (Martinez-Garcia et al., 2012) were phylogenetically classified to the freshwater ‘tribe’ level by insertion into reference trees in the ARB software package (Ludwig et al., 2004; Newton et al., 2011). SAGs were selected in order to compare among acI tribes and source lakes. The one SAG from Lake Stechlin was selected from a separate library constructed at the SCGC as described in Garcia et al. (2013) and its 16S rRNA gene was 100% identical to the acI-B1 SAG previously analyzed (AAA027-L06). Phylogenetic reconstruction was conducted by maximum likelihood (RAxML; Stamatakis et al., 2008) with 1000 bootstrap runs on the CIPRES web portal (www.phylo.org) using near full-length reference 16S rRNA gene sequences from a manually curated alignment (Newton et al., 2011) and a 50% base frequency filter (total 1402 positions; Figure 1). For four SAGs only short fragments (400 bp) were available. Bootstrap values are indicated above nodes with greater than 50% support and the scale bar represents 10 base substitutions per 100 nt positions.

Figure 1

Phylogenetic placement of the SAGs within the acI lineage and relative to other sequenced actinobacterial genomes, and to the previously sequenced AAA027-L06 SAG based on nearly full-length 16S rRNA gene sequences. When only short (400 bp) amplified gene fragments were available (Martinez-Garcia et al., 2012), sequences were added after tree construction within the ARB Software using the maximum parsimony criterion (Ludwig et al., 2004). These shorter sequences are noted along with their corresponding accession number. Shaded sequences are from SAGs.

Genome sequencing, assembly, contamination detection and annotation

The draft assemblies of the single-cell genomes were generated at the DOE Joint Genome Institute (JGI) using Illumina technology. An Illumina standard shotgun library was constructed and sequenced using the Illumina HiSeq 2000 platform by pooling libraries for approximately 10 SAGs per lane. All general aspects of library construction and sequencing performed at the JGI can be found at http://www.jgi.doe.gov/. Raw Illumina sequence data were filtered for known Illumina sequencing and library preparation artifacts and then screened and trimmed according to the k-mers present in the dataset. Reads representing highly abundant k-mers were removed such that no k-mers with a coverage of more than 30× were present after filtering. Contigs with an average k-mer depth of less than 2× were removed. The following steps were then performed for assembly: (1) Filtered Illumina reads were assembled using Velvet version 1.1.04 (Zerbino and Birney, 2008). The VelvetOptimiser script (version 2.1.7) was used with default optimization functions (n50 for k-mer choice, total number of base pairs in large contigs for cov_cutoff optimization). (2) Simulated paired-end reads of 1 to 3 kbp were created from Velvet contigs using the wgsim software. (3) The normalized Illumina reads were assembled together with the simulated read pairs using Allpaths-LG (version 41043) (Gnerre et al., 2011).

We employed a combination of tetramer principal component analysis and blast searches against reference databases to identify contigs that may originate from DNA contaminants (Woyke et al., 2009). No putative contaminants were found in any of the assemblies.

Genes were identified using Prodigal (Hyatt et al., 2010). The predicted CDSs (coding DNA sequences) were translated and used to search the National Center for Biotechnology Information (NCBI) nonredundant database, UniProt, TIGRFam, Pfam, KEGG, Clusters of Orthologous Group (COG) and InterPro databases. The tRNAScan-SE tool (Hacker and Kaper, 2000) was used to find tRNA genes, whereas ribosomal RNA genes were found by searches against models of the ribosomal RNA genes built from SILVA (Pruesse et al., 2007). Other non-coding RNAs such as the RNA components of the protein secretion complex and the RNase P were identified by searching genomes for the corresponding Rfam profiles using INFERNAL (Makarova et al., 1999). Additional gene prediction analysis and manual functional annotation was performed within the Integrated Microbial Genomes (IMG; Markowitz et al., 2012) platform developed by the Joint Genome Institute, Walnut Creek, CA, USA (http://img.jgi.doe.gov).

Genome completeness and size estimates

Genome size and completeness were estimated using a conserved single-copy gene (CSCG) set that has been determined from all finished actinobacterial genome sequences (n=151) in the IMG database (Markowitz et al., 2012). The set consists of 158 CSCGs that were found to occur only once in at least 95% of all genomes by analysis of an abundance matrix based on hits to the protein family (Pfam) database (Punta et al., 2012). Hidden Markov models of the identified Pfams (Supplementary Table S2) were used to search all SAG assemblies by means of the HMMER3 software (Eddy, 2011). Resulting best hits above the trusted cutoff (TC field as provided in the HMM files from Pfam) were counted and the completeness was estimated as the ratio of found CSCGs to total CSCGs in the set after normalization to 95%. The estimated complete genome size was then calculated by dividing the total assembly size by the estimated genome completeness.
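
The arithmetic behind these estimates can be sketched in a few lines of Python. This is only an illustration, not the pipeline used for the study: the function names are hypothetical, and the interpretation of "normalization to 95%" (dividing the raw fraction by 0.95 and capping the result at 1) is an assumption made for the example.

```python
def estimate_completeness(found_cscgs: int,
                          total_cscgs: int = 158,
                          prevalence: float = 0.95) -> float:
    """Estimate the fraction of a genome recovered from marker-gene counts.

    found_cscgs: number of conserved single-copy genes detected in the SAG.
    total_cscgs: size of the marker set (158 in the text).
    prevalence:  ASSUMPTION - the raw fraction is divided by this value,
                 because markers are only required in 95% of reference genomes.
    """
    if total_cscgs <= 0 or not 0 < prevalence <= 1:
        raise ValueError("total_cscgs must be > 0 and prevalence in (0, 1]")
    raw_fraction = found_cscgs / total_cscgs
    # Cap at 1.0 so a SAG carrying every marker is not reported as >100%.
    return min(raw_fraction / prevalence, 1.0)


def estimate_genome_size(assembly_bp: int, completeness: float) -> float:
    """Estimated full genome size = assembly size / fraction recovered."""
    if not 0 < completeness <= 1:
        raise ValueError("completeness must be in (0, 1]")
    return assembly_bp / completeness


if __name__ == "__main__":
    # Toy numbers, not taken from Table 1.
    c = estimate_completeness(found_cscgs=100)
    print(f"completeness: {c:.2f}")
    print(f"estimated genome size: {estimate_genome_size(800_000, c):,.0f} bp")
```

Under this reading, a SAG with an 800 kbp assembly and two thirds of its markers recovered would be extrapolated to a genome of roughly 1.2 Mbp.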

Comparative analysis

COG annotations of proteins for each SAG and for comparison genomes from the Actinomycetales order, the Polynucleobacter genus and the freshwater Alphaproteobacteria LD12 tribe were downloaded from the IMG website (http://img.jgi.doe.gov/; Supplementary Table S3). A presence/absence COG list was generated and used to determine COG prevalence (% of genomes containing a certain COG) in acI as well as in the other organisms of interest. The COG list was then sorted by the ratio of prevalence (% prevalence in acI divided by % prevalence in the other group) between acI and a group of interest. COGs at the top of this sorted list had the greatest difference in prevalence between acI and the group of interest and were considered overrepresented in acI.
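
A minimal Python sketch of this prevalence ranking is given below. It is an illustration only: the function names and the toy COG labels are hypothetical, and the small pseudo-percentage added to the denominator (so that COGs absent from the comparison group do not cause a division by zero) is an assumption, since the text does not say how such cases were handled.

```python
from typing import Dict, List, Set, Tuple


def cog_prevalence(genomes: Dict[str, Set[str]]) -> Dict[str, float]:
    """Percent of genomes in a group that contain each COG (presence/absence)."""
    counts: Dict[str, int] = {}
    for cogs in genomes.values():
        for cog in cogs:
            counts[cog] = counts.get(cog, 0) + 1
    return {cog: 100.0 * n / len(genomes) for cog, n in counts.items()}


def rank_by_prevalence_ratio(target: Dict[str, Set[str]],
                             other: Dict[str, Set[str]],
                             pseudo_percent: float = 1.0
                             ) -> List[Tuple[str, float]]:
    """Sort COGs by (% prevalence in target) / (% prevalence in other).

    pseudo_percent is an ASSUMPTION added to the denominator to keep the
    ratio finite when a COG is absent from the comparison group.
    """
    target_prev = cog_prevalence(target)
    other_prev = cog_prevalence(other)
    ranked = [
        (cog, prev / (other_prev.get(cog, 0.0) + pseudo_percent))
        for cog, prev in target_prev.items()
    ]
    # Highest ratio first; ties broken alphabetically for reproducibility.
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Toy presence/absence data, not real annotations.
    aci = {"sag1": {"COG_A", "COG_B"}, "sag2": {"COG_A"}}
    ld12 = {"g1": {"COG_B"}, "g2": {"COG_B"}}
    for cog, ratio in rank_by_prevalence_ratio(aci, ld12):
        print(cog, round(ratio, 2))
```

COGs at the top of the returned list are the candidates for overrepresentation in the target group.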

Average protein identity between SAG pairs was calculated using all-versus-all blastp from predicted coding sequences in all 11 SAGs to identify best hits for each SAG pair. Results were parsed using custom perl scripts to identify reciprocal blast best hits that had alignments over at least 50% of both the query and the subject sequence lengths. Averages were calculated for each pair based on the reciprocal blast best hit results.
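
The reciprocal-best-hit averaging can be sketched as follows. The original analysis used custom perl scripts that are not reproduced here; this Python version is only an illustration. The `Hit` record and its field names are hypothetical, bit score is assumed to define the "best" hit, and the 50% coverage filter is assumed to be applied before the best hit is chosen.

```python
from typing import Dict, Iterable, NamedTuple, Optional


class Hit(NamedTuple):
    """One blastp alignment (hypothetical record layout)."""
    query: str
    subject: str
    identity: float   # percent identity of the alignment
    aln_len: int      # alignment length in residues
    query_len: int
    subject_len: int
    bitscore: float


def best_hits(hits: Iterable[Hit], min_cov: float = 0.5) -> Dict[str, Hit]:
    """Best hit per query among alignments covering >= min_cov of both sequences."""
    best: Dict[str, Hit] = {}
    for h in hits:
        if h.aln_len < min_cov * h.query_len or h.aln_len < min_cov * h.subject_len:
            continue  # alignment too short relative to query or subject
        if h.query not in best or h.bitscore > best[h.query].bitscore:
            best[h.query] = h
    return best


def average_rbh_identity(a_vs_b: Iterable[Hit],
                         b_vs_a: Iterable[Hit],
                         min_cov: float = 0.5) -> Optional[float]:
    """Mean percent identity over reciprocal best hits of one genome pair."""
    fwd = best_hits(a_vs_b, min_cov)
    rev = best_hits(b_vs_a, min_cov)
    identities = [
        (h.identity + rev[h.subject].identity) / 2.0
        for h in fwd.values()
        if h.subject in rev and rev[h.subject].subject == h.query
    ]
    return sum(identities) / len(identities) if identities else None


if __name__ == "__main__":
    # Toy alignments, not real blastp output.
    a_vs_b = [Hit("a1", "b1", 80.0, 100, 120, 110, 200.0),
              Hit("a1", "b2", 60.0, 100, 120, 110, 150.0)]
    b_vs_a = [Hit("b1", "a1", 80.0, 100, 110, 120, 200.0)]
    print(average_rbh_identity(a_vs_b, b_vs_a))
```

Running the same function over every pair of SAGs would fill a pairwise identity matrix of the kind summarized in Supplementary Table S9.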

Codon bias

For each genome, we calculated the frequency of each codon out of the total codons encoding the corresponding amino acid, once in the ribosomal protein genes and once in the rest of the genes. We then calculated the average difference between codon frequencies in the ribosomal proteins and in the rest of the proteins in the genomes. To calculate the genome CAIave (average Codon Adaptation Index) for each organism, we computed the average of CAI values over all genes in the genome (Rocha, 2004). The CAI of a gene was computed as described by Sharp and Li (1987).
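
For readers unfamiliar with the index, the sketch below shows one way to compute CAI in Python. In the Sharp and Li formulation, each codon receives a relative adaptiveness weight (its count in a reference gene set divided by the count of the most used synonymous codon), and the CAI of a gene is the geometric mean of the weights of its codons. Several details here are assumptions rather than the authors' procedure: the ribosomal protein genes are taken as the reference set, unobserved codons receive a pseudo-count of 0.5, the standard genetic code is used, and all function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List

# Standard genetic code, built from the conventional TCAG ordering.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}


def codons(seq: str) -> List[str]:
    """Split an in-frame coding sequence into codons, dropping a partial tail."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_genes: Iterable[str]) -> Dict[str, float]:
    """Weight per codon: count / count of the most used synonymous codon."""
    counts = Counter(c for g in reference_genes for c in codons(g) if c in CODON_TABLE)
    weights: Dict[str, float] = {}
    for codon, aa in CODON_TABLE.items():
        if aa in "*MW":
            continue  # stop codons and single-codon amino acids are uninformative
        family_max = max(counts[c] for c, a in CODON_TABLE.items() if a == aa)
        if family_max > 0:
            # ASSUMPTION: unobserved codons get a pseudo-count of 0.5.
            weights[codon] = max(counts[codon], 0.5) / family_max
    return weights


def cai(gene: str, weights: Dict[str, float]) -> float:
    """Geometric mean of the weights of the scorable codons in one gene."""
    logs = [math.log(weights[c]) for c in codons(gene) if c in weights]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")


def genome_cai_average(genes: Iterable[str], weights: Dict[str, float]) -> float:
    """Average CAI over all genes that contain at least one scorable codon."""
    values = [v for v in (cai(g, weights) for g in genes) if not math.isnan(v)]
    return sum(values) / len(values) if values else float("nan")


if __name__ == "__main__":
    # Toy sequences only: alanine codons GCT (x3) and GCC (x1) as "reference".
    w = relative_adaptiveness(["GCTGCTGCTGCC"])
    print(round(cai("GCTGCC", w), 3))
```

A gene that uses only the preferred codon of each family scores 1, and genes that rely on rarely used codons score closer to 0.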

Results and discussion

Genome statistics and phylogenetic affiliation

We sequenced ten acI SAGs and analyzed them along with the one previously sequenced SAG (Garcia et al., 2013). These 11 SAGs belong to three different acI tribes and originate from four different lakes. A phylogenetic tree of these SAGs is presented in Figure 1. Our genome completeness estimates ranged from 34% to a nearly complete genome. Only four of the 158 Pfams used for genome completion estimates were absent in all 11 SAGs (Supplementary Table S2). Back-calculation of the estimated genome sizes showed that all 11 SAGs were small (1–2 Mbp) and had relatively low GC content (40 to 48%). This low GC content had already been observed in metagenomic assemblies (Ghai et al., 2012). The previously analyzed acI-B1 genome also had a low GC content (42%) and a small genome size (1.16 Mbp; Garcia et al., 2013). Our comparative genomic analysis confirms that these features are broadly representative of the acI lineage and further distinguishes among the acI tribes, with acI-B1 having the lowest GC content (Table 1 and Supplementary Table S4). The average genome completeness was 68% (based on the fraction of CSCGs). Therefore, we would expect to encounter a core, single-copy gene in about seven of the 11 SAGs.

acI metabolism: how does the acI lineage make a living in freshwater?

Carbon and energy

The previously published AAA027-L06 acI-B1 genome indicated a facultative aerobic lifestyle with mostly complete archetypal central metabolism (glycolysis, pentose phosphate pathway and citrate cycle), oxidative phosphorylation machinery, and the ability to ferment pyruvate (Garcia et al., 2013). These and other features were largely conserved among our ten newly sequenced acI SAGs (Figure 2), with exceptions attributable either to incomplete genome recovery or niche diversification among clades or tribes (discussed further below). As noted for AAA027-L06, the SAGs seem capable of metabolizing glucose but lack an obvious glucose transport system. This is curious in light of MAR-FISH-based studies clearly demonstrating the ability of acI to incorporate glucose (Buck et al., 2009; Salcher et al., 2013). However, most SAGs did contain ABC-type sugar transport components (for example, COG1653) (Table 2 and Supplementary Table S5) and, using the Transporter Classification Database (Saier et al., 2009), a periplasmic component with closest match to the glucose-binding protein in Thermus thermophilus (TC#3.A.1.1.24) was found in eight SAGs but was missing in AAA027-L06. Thus, we can neither confirm nor refute the ability of all acI members to take up glucose based on genomic evidence alone. Other transporters that the 11 acI SAGs have in common include ABC-type transporters for ribose/xylose/arabinose/galactoside, polyamines, dipeptides and branched-chain amino acids (Table 2). The polyamine transporters are likely being used for putrescine uptake, as the SAGs harbor downstream pathways for its eventual conversion to succinate via the transamination pathway (Dasu et al., 2006; Chou et al., 2008; Figure 2). Genes for the uptake of carboxylic acids in acI-A1, acI-A7 and acI-B1 were not found in any of the sequenced genomes, consistent with MAR-FISH-based studies (Buck et al., 2009; Salcher et al., 2013).

Figure 2

Central carbon metabolism and other relevant metabolic pathways identified in acI SAGs. Circles denote genes encoding the necessary enzymes are present in that clade (blue for acI-A and green for acI-B), with the size of the circle being proportional to the percentage of SAGs within that clade that were found to contain that gene. The presence and absence of genes was determined within the IMG environment based on KEGG annotations.

Table 2 Select DOC uptake COGs in acI, LD12, PnecC and Actinomycetales

Most SAGs also contained a putative cyanophycinase (COG4242), potentially allowing acI to access this C- and N-storage compound synthesized by cyanobacteria. Homologs of cyanophycinase are typically found in cyanobacteria where they are used to break down an intracellular granular C and N storage molecule, cyanophycin (Richter et al., 1999). Cyanophycin synthetase and/or cyanophycinase homologs have been detected in about 10% of heterotrophic genomes analyzed (Krehenbrink et al., 2002; Fueser and Steinbuechel, 2007). The acI SAGs analyzed here have the genes required to break down cyanophycin granules and take up the resulting dipeptides and amino acids as a source of energy and N. However, it is unclear whether this cyanophycinase can be secreted, as we were unable to identify an obvious secretion signal peptide. Secreted cyanophycinases have been reported in both Gram-negative and Gram-positive soil bacteria (Obst et al., 2002, 2004; Sallam et al., 2011), but in these cases signal sequence cleavage was demonstrated or inferred. Although the SAG genomes encode canonical Sec pathway components as well as twin arginine translocation genes, the predicted cyanophycinase polypeptides do not have these signal translocation sequences. Similarly, sortases that would anchor secreted proteins to the cell wall were predicted in the acI genomes, but the sortase motif was not found in the cyanophycinase polypeptides. We did not identify homologs of genes involved in cyanophycin synthesis. The potential ability of acI to break down cyanophycin hints at a potential interaction between cyanobacteria (or other microbes making cyanophycin) and acI, whereby acI members acquire energy, C and N from this polymer synthesized by others. Curiously, we found two tandem copies of cyanophycin synthetase in each of the two Polynucleobacter genomes but not in acI. This has implications for potential interactions between acI and Polynucleobacter when they co-occur (that is, Polynucleobacter synthesizing cyanophycin and acI breaking it down). Co-occurrence of acI and Polynucleobacter sp. has already been observed in laboratory enrichments (Garcia et al., 2014). However, this kind of interaction is only speculative until experimental confirmation can be made.

Others have found that acI abundances change through the seasons with maxima in spring and fall (Allgaier and Grossart, 2006) and that acI abundance positively correlates with solar radiation (Warnecke et al., 2005), suggesting they gain some benefit from higher light intensities either through phototrophy or consumption of photochemically produced labile dissolved organic carbon (DOC). The latter possibility is contradicted by the apparent inability of acI to utilize carboxylic acids (see above), a major product of photochemical DOC reactions. Instead, previous work suggested that actinorhodopsins are broadly distributed in freshwater Actinobacteria and provide the potential for phototrophy (Sharma et al., 2008, 2009; Martinez-Garcia et al., 2012; Garcia et al., 2013; Salka et al., 2014). We confirmed the presence of actinorhodopsin in all but three of the eleven acI SAGs (AAA023-J06, AAA028-A23 and AB141-P03), making this gene a likely part of the broadly shared genes of the acI lineage. Moreover, enzymes required for synthesis of the presumed actinorhodopsin chromophore, retinal, or its precursors were identified in 10 of the 11 acI SAGs, suggesting the likely assembly of functional rhodopsin in vivo. In the acI SAGs, the four enzymes that lead from farnesyl pyrophosphate, the ubiquitous sterol precursor, to β-carotene, the bicyclic conjugated chromophore (Martinez et al., 2007), are most commonly encoded in a cluster. The final pathway enzyme, a β-carotene cleaving oxygenase, was, somewhat surprisingly, found in only two of the 11 SAGs and was not co-located with the rhodopsin gene as it is in many marine proteobacteria (Martinez et al., 2007; Riedel et al., 2013; Vollmers et al., 2013). However, one of these SAGs places the oxygenase near a β-carotene pathway gene, likely indicating that it once composed part of a full pathway operon. The genes necessary for synthesis of a carotenoid glycoside ester sit just upstream of the cluster, suggesting a secondary chromophore for actinorhodopsin similar to the salinixanthin discovered hydrophobically clinging to xantho- and Gloeobacter rhodopsins (Luecke et al., 2008; Imasheva et al., 2009). At the protein level, a change of bacteriorhodopsin’s tryptophan 138 to a glycine provides space for the four-keto ring of such a putative secondary chromophore, and six of the hydrophobic amino acid side chains that would interact with this putative rhodopsin-hugging photon-funnel are shared between the actinorhodopsins and xanthorhodopsin (Balashov et al., 2010). This has exciting implications for the role of a chromophore other than retinal in members of the acI lineage, though further experimental work is needed to confirm whether such a chromophore is actually synthesized by acI.

Rhodopsin in Pelagibacter ubique and other marine bacteria promotes survival during nutrient starvation periods (Gomez-Consarnau et al., 2007; DeLong and Beja, 2010; Gómez-Consarnau et al., 2010; Steindler et al., 2011); thus actinorhodopsin may serve a similar function in acI. In such an auxiliary system, it remains to be explored why the rhodopsin gene, the oxygenase gene and the core carotenoid synthesis operon are not co-localized in acI genomes. Interestingly, actinorhodopsin expression in a German lake was not linked directly to sunlight but rather to a circadian schedule with a maximum expression rate just before dawn (Wurzbacher et al., 2012). Comparative genomic analysis also revealed the presence of genes involved in anaplerotic carbon fixation (carbonic anhydrase and phosphoenolpyruvate carboxylase). The acI SAGs lack the RuBisCO enzyme and other pathways for carbon fixation, but phosphoenolpyruvate carboxylase may provide them with the ability to synthesize oxaloacetate, an intermediate in the tricarboxylic acid cycle that can be used to replenish precursors needed for growth (Figure 2). This may provide acI with the ability to grow photoheterotrophically using actinorhodopsin, though we can only speculate in the absence of experimental evidence. Notably, a marine Bacteroidetes (Polaribacter) lacking RuBisCO but harboring proteorhodopsin has been shown to increase CO2 fixation rates in the light (González et al., 2008).

Nitrogen, phosphorus and sulfur

As reported previously for the acI-B1 SAG AAA027-L06, the lineage appears to lack identifiable genes involved in sulfate, sulfite, nitrate and nitrite assimilation. Seven SAGs did harbor ammonia permease (COG0004), suggesting free ammonia is an N-source for some or all acI members. We found no evidence for urea transport or catabolism. The 11 SAGs do not harbor a complete urea cycle among them. However, genome content across the lineage shows the capability to consume N-rich carbon sources, including polyamines, cyanophycin, di- and oligopeptides and branched-chain amino acids such as leucine, isoleucine and valine. This has intriguing implications for acI’s success in freshwater, as it seems to obtain both C and N from the same substrate compounds. This observation builds on previous MAR-FISH results, obtained using probes targeting the entire Actinobacteria phylum, suggesting that acI contributes more than would be expected from its biomass to total amino acid turnover in lakes (Salcher et al., 2010). Indeed, genes involved in acquisition of these N-rich compounds also generally have high codon bias in the SAGs (Supplementary Table S6), indicating possibly high expression rates. Some prior research has focused on organic-N concentrations or sources for freshwater bacterioplankton. Buck et al. (2009) used MAR-FISH to test glucose, leucine and acetate assimilation in freshwater. They found that Actinobacteria had the highest percentage of cells incorporating leucine but not glucose or acetate, indicating a preference for nitrogen-rich organics. Salcher et al. (2013) also found that acI was active in leucine uptake. Some MAR-FISH studies have suggested that another nitrogen-rich compound, NAG, may be an important acI substrate (Beier and Bertilsson, 2011; Eckert et al., 2012, 2013). However, we could not identify machinery to support the ability to take up and incorporate this substance. This may simply be because the biochemical pathways involved have not been identified. It is probable that one of the many COGs involved in the uptake of carbohydrates enables NAG uptake (Table 2).

We found no evidence of genes for sulfite/sulfate assimilation or reduction in the acI SAGs, as was previously reported for AAA027-L06, suggesting the proposed dependence on cysteine synthase for sulfur incorporation holds for the entire lineage (Garcia et al., 2013). Cysteine synthase was found in eight of the eleven SAGs. AcI’s apparent reliance on reduced sulfur compounds resembles that of the ubiquitous marine SAR11 clade, which was the first free-living bacterium recognized to rely on reduced sulfur for growth (Grote et al., 2012; Carini et al., 2013).

We searched for evidence of phosphorus acquisition strategies that may allow for high-affinity transport (Pst) or low-affinity high rate of transport (Pit). Six of the acI genomes have a PstSCAB transport system. Three other genomes had 75% of these genes; the partial gene absence was likely due to incomplete genome recovery. Only the two most incomplete SAGs were missing the majority of the Pst complex. The associated regulatory protein PhoU and phosphorus starvation activated protein PhoH were also found in most of the genomes. The presence of the Pst complex (in lieu of low-affinity Pit phosphorus transport complex) provides acI with the ability to acquire phosphorus in low-nutrient settings, further indicating that these organisms are well adapted to phosphorus limitation, which is a common feature of these and most freshwater lakes (Magnuson et al., 2006). We could not find any evidence for the ability to metabolize phosphonates in the acI SAGs.

Shared and differential gene content involved in heterotrophic growth among freshwater bacteria

We asked whether the differential gene content among sequenced freshwater bacterial genomes might also explain the ecological success of acI during growth. First, we examined DOC uptake Clusters of Orthologous Groups (COGs) for acI versus other Actinomycetales (424 genomes) and other freshwater bacteria such as the LD12 Alphaproteobacteria, relatives of the marine SAR11 (10 genomes), and Polynucleobacter sp. (two genomes; Table 2 and Supplementary Table S5). We found that several COGs involved in polyamine uptake and metabolism were more common in acI than in the other analyzed genomes. In fact, the 10 LD12 genomes did not contain any of these COGs. Other interesting COGs that were more common in acI were nucleotide transporters, several sugar transporters and two ribose/xylose transporters. Overall, acI had more COGs for general carbohydrate transport than LD12 and Polynucleobacter.

As described above, the acI genomes lacked carboxylic acid and dicarboxylic acid transporters while they were present in Polynucleobacter and LD12. Compared with the other freshwater bacteria, general amino acid uptake COGs were underrepresented in acI SAGs and COGs for di- and oligopeptides were overrepresented. The di- and oligo-peptide transporters may have some connection with the cyanophycinase genes found in acI (and not in LD12 or Polynucleobacter). COGs involved in lipid transport were underrepresented in acI (as compared with LD12 and Polynucleobacter), with only the glycerol transporter, COG1133 (long chain fatty acid transport) and COG2867 (Oligoketide cyclase/lipid transport protein) present.

Taken altogether, these findings provide genome-level confirmation of experimental observations (Salcher et al., 2013): the differences in types of DOC uptake genes found in each lineage indicate potential specialization in substrate preferences and ecological niches. AcI genomes indicate specialization in polyamines, di- and oligopeptides and carbohydrates (including pentoses). LD12 genomes indicate specialization in lipids and carboxylic acids, whereas Polynucleobacter specializes in general carbohydrates and carboxylic acids.

Comparative genomics hints at the basis for ecological differentiation within the acI lineage

We sought to determine whether differences in gene content might show potential niche separation between members of the acI lineage (for example, separation of acI-A vs acI-B) (Figure 3, Supplementary Table S7). It should be noted that this analysis is preliminary due to the incompleteness of the SAGs. Little is known about the ecology of the acI tribes as most previous studies lacked phylogenetic resolution to the tribe level. However, Newton et al. (2007) found that acI tribes differentiated based on pH with acI-B2 and acI-A1 correlating with acidic lakes and acI-B1 and acI-A6 correlating with alkaline lakes. The ratio of terrestrially sourced carbon to in-lake-derived carbon also seems to predict tribe distribution, with acI-B1, -A6, and -A7 generally preferring lakes with lower color-to-chlorophyll (CtCh) ratios as compared with other tribes (Jones et al., 2009). This ratio is calculated using measurements of chromophoric dissolved organic matter (by light absorbance at 440 nm) and chlorophyll-a concentrations, making it a useful proxy for the extent of allochthony of a lake ecosystem. Interestingly, acI-A1 is found in lakes with both low and high CtCh. As the tribes included in the current study are usually found in lakes with low CtCh, it remains to be learned whether other tribes (for example, acI-B2, acI-A4, acI-A5) have different gene content that relates to life at high CtCh.

Figure 3

Distribution of 1193 COGs found in the acI lineage based on the analysis of 11 SAGs. Shared and differential COG content among acI tribes within the clades based on the comparison of two acI-A1 SAGs, three acI-A7 SAGs and six acI-B1 SAGs. Markedly, more COGs are shared among acI tribes than are unique to any particular tribe. We note that the probability of an acI ‘shared COG set’ gene being missing from all the genomes due to incomplete genome recovery is 0.000003% for all of acI, 0.003% in acI-B1, 2.7% in acI-A1 and 4.1% in acI-A7, for any single COG.

Members of the acI-A clade contained markedly more unique COGs than members of the acI-B clade (compare Supplementary Table S7 with Supplementary Table S8), suggesting they are more metabolically versatile. However, this might be explained by the fact that two acI-A tribes are represented in our SAG collection as compared with one acI-B tribe. Nevertheless, we found that the alpha- and beta-galactosidases were common to acI-A1 and acI-A7, but were absent in all acI-B1 SAGs (Supplementary Table S7). The presence of these genes in acI-A SAGs suggests a specialization in breaking down poly- or oligo-saccharides as compared with acI-B. Interestingly, an overrepresentation of glycosidases such as alpha- and beta-galactosidases was found in freshwater systems compared with marine systems (Eiler et al., 2013). This effect may be largely due to the presence of the ubiquitous acI-A clade in freshwater. Zakharova et al. (2013) studied bacteria associated with diatom degradation in the near bottom water layer of Lake Baikal and found high beta-galactosidase activity. They also found that Actinobacteria were a significant portion of the bacterial community. The acI-A clade also had two glycerol uptake COGs that were not found in acI-B1. The acI-A7 tribe had three COGs involved in amino acid uptake as well as an NAG kinase and oligopeptide transporter not found in acI-A1 or acI-B1. Both clades share what appears to be an ABC transport system for ribose (COG1172), but the acI-B1 genomes uniquely carried an auxiliary component (ribose pyranase, COG1869) that may give this tribe access to certain pentose isomers (Supplementary Table S8). The acI-B1 SAGs also carried an aromatic ring-cleaving dioxygenase (DOPA 4,5-dioxygenase; COG3805) that was not found in acI-A or any other Actinomycetales but is commonly found in fungal genomes. Homologs of this protein are key for the biosynthesis of pigments such as betalains (red and yellow pigments) in plants, suggesting the role of a unique pigment in acI-B1.

We note that, based on the genome recoveries (Table 1), the probability of an acI ‘shared COG set’ gene being missing from all the genomes due to incomplete recovery is 0.000003% in acI, 0.003% in acI-B1, 2.7% in acI-A1 and 4.1% in acI-A7, for any single COG. These probabilities were calculated by multiplying together the probabilities of the gene being missing from each genome (that is, each genome's percent incompleteness). For example, the acI-A1 genome SCGC AAA027-M14 is only 41% complete, so the probability of a ‘shared COG set’ gene being missing from that genome is 59%. The probability of the same gene being missing from the acI-A1 genome SCGC AAA278-O22 is 4%, and the probability of the gene being missing from both genomes due to incompleteness is 2.7% (that is, 59% × 4%). These probabilities indicate that the ‘shared COG set’ based on the SAGs for acI and acI-B1 is essentially complete, but that the ‘shared COG set’ for acI-A and its corresponding tribes is likely to expand by as much as 4.1% with the sequencing of more genomes.
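As a rough check on the arithmetic described above, the calculation can be sketched in a few lines of Python. This is a minimal sketch, not the authors' code: the function name `prob_missing_from_all` is hypothetical, the completeness values are the rounded figures quoted in the text (41%, and 96% as inferred from the stated 4% probability of a gene being missing), and recovery is assumed to be independent across genomes, as multiplying the probabilities implies. With these rounded inputs the product is about 2.4%, slightly below the reported 2.7%. The difference presumably reflects unrounded completeness estimates in Table 1, although that is an assumption.

```python
from math import prod


def prob_missing_from_all(completeness):
    """Probability that a gene present in every genome is absent from all
    of them purely because of incomplete recovery.

    `completeness` is an iterable of estimated genome completeness values
    in [0, 1]. Independence between genomes is assumed (hypothetical helper).
    """
    return prod(1.0 - c for c in completeness)


# Rounded values quoted in the text for the two acI-A1 SAGs (assumed inputs).
aci_a1 = {"SCGC AAA027-M14": 0.41, "SCGC AAA278-O22": 0.96}

p = prob_missing_from_all(aci_a1.values())
print(f"P(gene missing from both acI-A1 SAGs) = {p:.2%}")
```

The same function applies to the three acI-A7 SAGs, the six acI-B1 SAGs or all 11 SAGs together, given the completeness estimates from Table 1.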

The average protein similarity between the acI SAGs indicated diversity within clades and a range of variation within tribes at the protein-coding gene level (Supplementary Table S9). The average amino acid identity across tribes (65–70%, acI-A1 vs acI-A7) indicates that the acI-A clade is similar to a traditional taxonomic genus, which has been estimated to have 65–70% average protein identity (Konstantinidis and Tiedje, 2007). The comparison of amino acid identity across clades (60–65%, acI-B1 vs acI-A1 and acI-A7) indicates that the acI lineage is comparable to a traditional taxonomic family.

Shared gene content within acI distinguishes the lineage from other Actinomycetales

Any genes that distinguish acI clades from their relatives within the Actinomycetales are good candidates for further inquiry into their role in acI ecophysiology, as they may explain why acI is such an omnipresent and successful resident of freshwater systems (Newton et al., 2011). However, we identified only two COGs that were found in acI to the exclusion of all other sequenced Actinomycetales: an aromatic ring-cleaving dioxygenase (COG3805) and an uncharacterized conserved protein (COG2859). The aromatic ring-cleaving dioxygenase (COG3805) was found only in acI-B1 and may provide the ability to consume recalcitrant aromatic compounds for energy and carbon. Its best match by blastp was to Pfam PF08883 (DOPA 4,5-dioxygenase family protein) from the cyanobacterium Scytonema hofmanni (59% similarity). Homologs are usually involved in the synthesis of chromophores (betalamic acid), but we were unable to identify other genes in that pathway in acI-B1, so its function in acI-B1 remains unknown. Overall, the phylogenetic distribution of best blast hits identified in IMG is mostly restricted to genes within other Actinobacteria.

We identified the top 25 COGs that are overrepresented in the acI SAGs as compared with the 424 Actinomycetales reference genomes available in the IMG database in March 2013 (Table 3). Bacteriorhodopsin (COG5524) was the most overrepresented COG in this analysis, and represents the putative light-harvesting protein previously identified as actinorhodopsin (Sharma et al., 2008, 2009; Martinez-Garcia et al., 2012; Wurzbacher et al., 2012). The aforementioned cyanophycinase (COG4242) was another COG overrepresented in the acI lineage. Other notable overrepresented COGs were those involved in the uptake and metabolism of polyamines (spermidine/putrescine) and amino acids, as well as glycosyl hydrolases (carbohydrate breakdown), a nicotinamide mononucleotide transporter (source of pyridine) and inorganic pyrophosphatase (breaks down pyrophosphate into two molecules of phosphate). Thus, acI genomes look like highly streamlined versions of ‘typical’ Actinobacteria that include some genes (with best matches to other phyla) that allow these organisms to specialize by taking advantage of sunlight and N-rich organic compounds.

Table 3 COGs overrepresented in acI as compared with non-acI Actinomycetales (Actm)

Conclusions

Members of the acI lineage are clearly specialized relative to their parent order (Actinomycetales) and other sequenced freshwater bacteria. Their highly streamlined genomes and small cell size suggest they share broad niche dimensions with ultramicrobacteria such as freshwater members of the SAR11 clade. Although many characteristics of the SAGs analyzed here were consistent with the only other previously published acI genome (AAA027-L06) (Garcia et al., 2013), investigating three different tribes showed ecological differentiation among them. Confirmation of the characteristics discovered in AAA027-L06 within multiple SAGs representing three tribes provided more conclusive evidence of features conserved within the lineage. We identified features that hint at a preference for N-rich compounds (polyamines, di- and oligopeptides, branched-chain amino acids, cyanophycin) as well as the potential for some level of photoheterotrophic metabolism. Our findings form a rich foundation for further study of the acI lineage using techniques such as metatranscriptomics combined with experimentation.