Introduction

Studies on specific phages infecting dominant freshwater microbes remain scarce due to the obstacle of obtaining them in axenic culture. Only freshwater cyanophage genomes have been described, six from culture (Yoshida et al., 2008; Chenard et al., 2015) and a few from metagenomic assembly (Skvortsov et al., 2016). However, phages infecting all other major abundant phyla that are typical inhabitants of freshwaters (for example, Actinobacteria) remain totally unexplored. Metagenomics provides a culture-free approach to recover phage genomes and has been successful in several environments ranging from hypersaline (Garcia-Heredia et al., 2012) and marine waters (Mizuno et al., 2013a, 2013b) to the human gut (Dutilh et al., 2014).

Recovery of phage genomes

We were interested in a first glimpse of phages infecting freshwater Actinobacteria, the most abundant microbes in this habitat. We assembled sequence data from a metagenome and also sequenced a fosmid library (ca. 5500 fosmids) from the freshwater reservoir Amadorio (Alicante, Spain; Supplementary Methods). We focussed only on assembled contigs >10 Kb (both from the metagenomic assembly and from the fosmids) that provided evidence of a circular genome (that is, exact overlapping ends of at least 50 bp). Only contigs that appeared to be Caudovirales phages were retained, as these can be reliably identified by characteristic genes (for example, capsid, terminase, portal protein and tail proteins). Finally, we selected phage contigs that specifically carried a whiB transcription factor found exclusively in genomes of Actinobacteria (Ventura et al., 2007), marking them as putative actinobacterial phages. This gene has been found in several actinobacterial phages, for example, mycobacteriophages (Morris et al., 2008) and Streptomyces phages (Van Dessel et al., 2005). It has also been found in a genomic fragment of an uncultured phage predicted to infect the marine actinobacterium Ca. Actinomarina minuta (Ghai et al., 2013; Mizuno et al., 2013b). WhiB is involved in a wide range of cellular processes (Gomez and Bishai, 2000; Alam et al., 2007). In the mycobacteriophage TM4, whiB has been shown to function as a dominant negative regulator by binding specifically to the host whiB promoter, resulting in an aberrant septum similar to that of a host whiB gene knockout (Rybniker et al., 2010). It also leads to an alteration in cell wall lipids, resulting in superinfection exclusion. Applying these criteria, we obtained eight complete phage genomes predicted to infect Actinobacteria (Supplementary Table S1).
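
The circularity criterion described above (contigs >10 Kb with exact overlapping ends of at least 50 bp) can be illustrated with a minimal Python sketch. This is not the pipeline used in the study, which is described only in the Supplementary Methods; the function names, the `max_overlap` cap and the default thresholds as coded here are assumptions made for illustration.

```python
from typing import Optional


def terminal_overlap(seq: str, min_overlap: int = 50,
                     max_overlap: Optional[int] = None) -> int:
    """Return the length of the longest exact match between the start and
    the end of `seq`, or 0 if no match of at least `min_overlap` bp exists.

    `max_overlap` is a hypothetical cap on the search; by default the
    search goes up to half the sequence length.
    """
    seq = seq.upper()
    limit = len(seq) // 2 if max_overlap is None else min(max_overlap, len(seq) // 2)
    # Try the longest candidate first so the first hit is the longest overlap.
    for k in range(limit, min_overlap - 1, -1):
        if seq[:k] == seq[-k:]:
            return k
    return 0


def is_putatively_circular(seq: str, min_length: int = 10_000,
                           min_overlap: int = 50) -> bool:
    """Apply both criteria from the text: contig longer than `min_length`
    and exact terminal overlap of at least `min_overlap` bp."""
    return len(seq) > min_length and terminal_overlap(seq, min_overlap) > 0
```

A contig passing this check would still need the subsequent filters (Caudovirales hallmark genes, presence of whiB), which require gene annotation and are not sketched here.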

The longest genome was 61.3 Kb and the smallest only 14.4 Kb (the smallest siphoviral genome described yet); all others were 39–42 Kb. The GC content of all genomes varied from 41 to 49%, as expected from the low GC content of abundant freshwater Actinobacteria (Ghai et al., 2012). In addition, principal component analyses of oligonucleotide profiles and codon usages of the actinophage genomes displayed good correspondence with known freshwater actinobacterial genomes (Supplementary Figures S1 and S2). Large fractions of their genes (up to 40%) gave best BLAST hits to existing actinobacterial genomes (likely to phages inserted in them) or to actinobacterial phages. Specific comparisons with the latter suggested that these new phages are likely siphoviruses (Supplementary Table S2). Whole-proteome comparisons revealed a total of four distinct groups (G1–G4): two groups of three phages each with >20% conserved proteins (G2 and G3) and two singletons (G1 and G4; Figure 1; Supplementary Figures S3 and S4). Nucleotide identity was detectable only in a few short sequence matches within a group, and was non-existent between groups. Phylogenetic analysis of the terminase large subunit (TerL; Casjens et al., 2005) confirmed the same clusters (G1–G4; Supplementary Figure S5). G2 and G3 terminases appear to be related to either siphoviruses or myoviruses, whereas the G4 terminase appears related to those of lambda-like siphoviruses with 3′-cos ends, supporting a siphoviral morphology. Overall, these terminase sequences represent at least three novel, evolutionarily distinct lineages of phage terminases.
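
For readers unfamiliar with the sequence statistics mentioned above, the following Python sketch shows how GC content and an oligonucleotide (here tetranucleotide) frequency profile might be computed for a genome sequence. It is an illustrative assumption, not the authors' method: the function names and the choice of k = 4 are hypothetical, and the principal component analysis itself is not reproduced.

```python
from collections import Counter
from typing import Dict


def gc_content(seq: str) -> float:
    """Percentage of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    total = sum(seq.count(base) for base in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / total


def kmer_frequencies(seq: str, k: int = 4) -> Dict[str, float]:
    """Relative frequency of each k-mer, skipping windows that contain
    ambiguous bases. k = 4 is an assumed choice for illustration."""
    seq = seq.upper()
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set("ACGT")
    )
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}
```

Frequency vectors of this kind, computed for each phage and candidate host genome, are the sort of input on which a principal component analysis could then be run.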

Figure 1

Phage groups and complete genomes of G1 and G4 actinophages. (a) All-vs-all comparison matrix of all proteins in all eight phage genomes. A color scale for the percentage of conserved protein hits is shown below. The four groups of phages, G1, G2, G3 and G4 are shown alongside the matrix. (b) Complete genomes of the shortest (G4) and (c) the longest actinophages (G1) are shown. All proteins for which function could be assigned are labeled and colored in blue. Hypothetical proteins are shown in gray. A length scale is shown at middle right. Genes discussed in the text are underlined.

The three phages of G2 gave several significant hits, using both nucleotide and translated nucleotide searches, to one contig from a single-cell-amplified genome identified by its 16S rRNA sequence as belonging to the acI-A7 lineage (acIA-AAA024-D14; Supplementary Figures S6 and S7). This contig represents an incomplete phage genome that is distantly related to, but still syntenic with, all G2 actinophages. Most protein sequence identities are in the range of 30–40% (similar to those among the G2 actinophages). These results indicate that G2-like phages actually infect freshwater Actinobacteria belonging to the acI-A lineage.

Abundance and global distribution

The abundance of these phages is comparable to, and sometimes higher than, that of actinobacterial genomes assembled from the same reservoir (for example, acMicro1, acAMD2 and acAcidi). All phages recruit the most reads from the Amadorio reservoir (Figure 2; Supplementary Figure S8) and recruit only at very low levels from other sites, suggesting a somewhat endemic character. In contrast, although some actinobacterial genomes recruit mostly from their site of origin (Figure 2), others, for example, acIA-AAA023-D18 obtained from Lake Damariscotta, can be found at high levels in multiple locations, for example, Yellowstone Lake and Amadorio (Supplementary Figure S9). However, comparisons at the protein level suggest that phages similar to the ones described here can be found globally (Supplementary Figure S10). Moreover, related phages appear to inhabit spatiotemporal regimes similar to those of their predicted hosts (for example, higher abundances in the epilimnion in Yellowstone Lake and in spring compared with summer in Lake Mendota) (Supplementary Figure S11; Allgaier and Grossart, 2006; Salcher et al., 2010). The restricted distribution of some of these phages is in stark contrast to marine phages that may be retrieved from geographically distant locations (Breitbart et al., 2004; Mizuno et al., 2013b). However, the oceans are inter-connected by a defined circulation, whereas freshwater habitats are disconnected and considerably heterogeneous in their local environmental conditions.

Figure 2

Fragment recruitment of freshwater Actinobacteria and actinophages in metagenomes and metaviromes. Abundance is expressed as RPKG (reads recruited per Kb of genome per Gb of metagenome). Only hits with >95% identity and >50 bp length were considered. Microbial genomes are grouped according to how they were obtained, single-cell genomics, metagenomic assembly or from pure culture. Genomes are also grouped by phylogeny, acI-A, acI-B, acAMD, acIV and Micrococcineae. The actinophage genomes are shown to the right, and the groups (G1–G4) are indicated within boxes. LF, large 5 μm fraction metagenome; SF, small 0.1 μm fraction metagenome.
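
The RPKG normalization and hit filter defined in the caption above can be written out as a short Python sketch. This is an illustration under stated assumptions rather than the authors' code: "Gb of metagenome" is taken to mean gigabases of sequence, hits are assumed to be available as (percent identity, alignment length) pairs, and all function names are hypothetical.

```python
from typing import Iterable, Tuple


def passes_filter(percent_identity: float, alignment_length: int,
                  min_identity: float = 95.0, min_length: int = 50) -> bool:
    """Keep only hits with >95% identity and >50 bp length, as in the caption."""
    return percent_identity > min_identity and alignment_length > min_length


def count_recruited(hits: Iterable[Tuple[float, int]]) -> int:
    """Count hits, given as (percent identity, alignment length) pairs,
    that pass the filter. The input format is an assumption."""
    return sum(1 for identity, length in hits if passes_filter(identity, length))


def rpkg(recruited_reads: int, genome_length_bp: int,
         metagenome_size_bp: int) -> float:
    """Reads recruited per Kb of genome per Gb of metagenome."""
    genome_kb = genome_length_bp / 1e3
    metagenome_gb = metagenome_size_bp / 1e9  # assumes Gb = gigabases
    return recruited_reads / genome_kb / metagenome_gb
```

For example, 1000 recruited reads against a 40 Kb genome in a 2 Gb metagenome correspond to 1000 / 40 / 2 = 12.5 RPKG. Normalizing by both genome length and metagenome size is what allows genomes of different sizes to be compared across datasets of different sequencing depth.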

A phage-encoded ADP-ribosyltransferase toxin

The G1 actinophage genome (uvFW-CGR-AMD-COM-C203) encodes a protein that shows all the hallmark characteristics and active site motifs of ADP-ribosyltransferases found in AB toxins, also called binary toxins (Figure 1; Supplementary Figures S12 and S13). In these toxins, a receptor-binding B-component facilitates entry of the A-component (the toxin) into the cytoplasm of eukaryotic cells, where it exerts its toxic effect by ADP-ribosylating its target protein. AB toxins are frequently encoded by free-living phages (Casas et al., 2006), or by prophages or plasmids in pathogenic bacteria, for example, Shiga toxin (O'Brien et al., 1984), cholera toxin (Waldor and Mekalanos, 1996) or anthrax toxin (Helgason et al., 2000). Phagocytic eukaryotic predators have been suggested as their natural targets (Lainhart et al., 2009). The G1 actinophage genome encodes only the toxic A-component of VIP2-like toxins, a special class of AB toxins, similar to Bacillus cereus VIP2 and Clostridium perfringens iota toxins, that inhibit polymerization of the major eukaryotic cytoskeletal protein actin, leading to cell death (Barth et al., 2004). It is likely that this phage toxin acts in a similar manner, but without the requirement for a receptor-binding B-subunit. Remarkably, it has been shown that Escherichia coli cells carrying phages that encode only the toxic component of the Shiga toxin (Stx) can still cause significant mortality when phagocytosed by Acanthamoeba cells (ca. 25%), with nearly all Acanthamoeba that internalize Stx-encoding bacteria being killed (Arnold and Koudelka, 2014). Given the high frequency of phage infections, it is plausible that abundant phages (like the G1 actinophage) infect significant numbers of actinobacterial cells, which may comprise more than half of the total microbial community (Glockner et al., 2000).
Given that a specific receptor is not needed, this could provide a degree of protection to the host population, on the premise that a proportion of infected cells will be rendered toxic to a broad range of eukaryotic predators. This altruistic strategy, executed in tandem by the host and its phage against eukaryotic predators, transforms the infected sub-population into ‘Trojan horses’. The phage becomes a ‘public good’ by helping to eliminate the predator. Even though the infected cell is killed by the phage, the host-population density would increase. Thus, the use of such phage-encoded toxins is expected to contribute towards increased fitness for both the phage and its host from a group (kin) selection perspective (Smith, 1976).