Identification of mycobacteriophage toxic genes reveals new features of mycobacterial physiology and morphology

Double-stranded DNA tailed bacteriophages typically code for 50–200 genes, of which 15–35 are involved in virion structure and assembly, DNA packaging, lysis, and DNA metabolism. However, vast numbers of other phage genes are small, are not required for lytic growth, and are of unknown function. The 1,885 sequenced mycobacteriophages encompass over 200,000 genes in 7,300 distinct protein ‘phamilies’, 77% of which are of unknown function. Gene toxicity provides potential insights into function, and here we screened 193 unrelated genes encoded by 13 different mycobacteriophages for their ability to impair the growth of Mycobacterium smegmatis. We identified 45 (23%) mycobacteriophage genes that are toxic when expressed. The impacts on M. smegmatis growth range from mild to severe, but many cause irreversible loss of viability. Expression of most of the severely toxic genes confers altered cellular morphologies, including filamentation, polar bulging, curving, and, surprisingly, loss of viability of one daughter cell at division, suggesting specific impairments of mycobacterial growth. Co-immunoprecipitation and mass spectrometry show that toxicity is frequently associated with interaction with host proteins and alteration or inactivation of their function. Mycobacteriophages thus present a massive reservoir of genes for identifying mycobacterial essential functions, identifying potential drug targets and for exploring mycobacteriophage physiology.

T7 Kil protein that interrupts FtsZ ring formation 25 . A variety of toxic genes have been reported using genomic screens, such as for Rhodococcus phage Yf1 and Staphylococcal phages 26,27 , and this has been proposed as a strategy for identifying novel drug targets 26 . Some mycobacteriophage-encoded genes have been shown to be toxic including mycobacteriophage L5 genes 77, 78, and 79 28,29 , the product of one of which (gp77) interacts directly with the host protein Msmeg_3532 30 , and another (gp79) which promotes filamentation 28 .
In general, toxicity resulting from phage lytic gene expression is unlikely to be directly beneficial to the phage as they need metabolically active cells to support viral replication; moreover, cells will die due to lysis at the conclusion of lytic growth regardless. However, there are a variety of mechanisms by which phage-encoded proteins could interact with host proteins or complexes such as to be beneficial to the phage, and which are manifested as toxicity when the protein is experimentally overexpressed. One example is provided by mycobacteriophage Fruitloop gp52, which interacts directly with the host DivIVA (Wag31) protein, interferes with its function, and impedes superinfection by a DivIVA-dependent phage, Rosebush 31 . DivIVA is an essential protein, and although Fruitloop gp52 is toxic, it is the exclusion of Rosebush and not its toxicity per se that is beneficial to phage Fruitloop 31 . Phage-encoded gene toxicity thus provides a surrogate indicator for elucidating gene function and identifying interactions with the host metabolic machinery.
Here we screened 193 mycobacteriophage genes to identify those that inhibit host growth. We observe that ~ 25% negatively impact host cell growth when expression is induced, although the effects range from mild to severe. Together, these confer a variety of changes in growth and cellular morphology of M. smegmatis, including filamentation, and branching, as well as differential effects on survival of daughter cells at division. Identification and characterization of mycobacteriophage-encoded toxic genes provides an effective approach for understanding mycobacterial physiology and identifying potential new drug targets.

Results
Identification of mycobacteriophage-encoded toxic and inhibitory genes. Mycobacterio-phages are genomically diverse and can be readily grouped into 'clusters' according to their overall sequence relationships [32][33][34] . Currently, there are 29 clusters (Clusters A, B, C, etc. to Cluster AC) and ten singleton phages (each with no close relative) 3 ; all but one of these clusters (Cluster C, myoviridae) have siphoviral morphotypes. For the mycobacteriophage siphoviruses, the virion structure and assembly genes are syntenically organized and transcribed rightwards in the left parts of the genomes 4 . Genes in the right parts of the genomes have nonstructural roles, are small relative to the structural genes, abundant, and most are of unknown function 4 .
To choose genes from the actinobacteriophage database to test for cytotoxicity (Table S1) we used the following criteria. First, we excluded virion structure and assembly genes, as well as the lysis and immunity genes. Second, we selected genes that are either of unknown function, or have non-specific functional assignments (e.g. DNA binding). Third, most of the genes are small (< 350 codons), and we specifically excluded larger nonstructural genes whose functions are readily predicted (e.g. DNA Polymerases). Fourth, we generally chose genes from different protein 'phamilies' (phams) to avoid sequence redundancy. Fifth, the genes are either predicted or have been shown to be expressed during lytic growth. Sixth, we screened genes for the predicted likelihood of crystallization using PPCpred 35 to facilitate downstream analysis. We screened genes from a diverse set of phages, but included larger numbers from phages LHTSCC, Fruitloop, and Wildcat (Table S1). Using these criteria, we selected 193 genes from 13 genomes from different clusters/subclusters for characterization of toxicity (Table S1).
Each of the 193 genes were PCR amplified-including the open reading frame (ORF) and 30 bp upstream of the predicted translation initiation site to include the native ribosome binding site-and inserted into plasmid vector pTNds 31 , an extrachromosomal E. coli-mycobacterium shuttle vector with a Tet-ON promoter. Recombinant plasmids were recovered and verified in E. coli, transformed into M. smegmatis mc 2 155, and grown on solid media lacking the anhydrotetracycline (ATc) inducer of the Tet-ON promoter. For one gene-LHTSCC 91-no M. smegmatis transformants were recovered, although the gene was successfully inserted into an integrationproficient vector (pTNdi 31 ) from which transformants were recovered and used for subsequent analyses. For each plasmid, five individual colonies were picked and streaked onto plates with and without 100 ng/ml ATc (300 ng/ ml ATc for LHTSCC 91) and incubated at 37 °C (Fig. 1A). Transformants with vector only, or with a plasmid in which mCherry is fused to Tet-ON, were used as controls.
For 148 of the 193 (76.7%) genes tested, we saw similar growth in the presence and absence of ATc and no toxicity was observed; the other 45 genes (23.3%) showed varying levels of reduced growth in the presence of ATc (Fig. 1A, Table 1, Table S1, Figs. S1, S2). We scored the levels of toxicity in the presence of ATc using a Toxicity Index (TI; 0-5) with 5 being the most toxic and 0 indicating no toxicity (Fig. 1A, Table 1, Table S1, Figs. S1, S2). For one gene (Fruitloop 94), toxicity was relatively mild, but bacterial growth on solid media containing ATc inducer has an altered morphology and appears smooth relative to the rough parent strain (Fig. S2). For some genes (e.g. Wildcat 145 and Wildcat 147) poor growth was seen in the absence of inducer, likely indicating leaky expression of highly toxic genes in the absence of induction. For Wildcat 147 this was sufficiently severe that further propagation in liquid culture resulted in overgrowth by non-toxic mutants, and Wildcat 147 was not studied further. Overall, almost one quarter of the genes tested are deleterious to M. smegmatis growth when expression is induced with ATc.
Approximately two-thirds of the 45 toxic genes have no predicted function, and the others have a diverse array of potential functions, including DNA binding, proteases, methyltransferases, lipoproteins, and other functions ( Table 1). The toxic/inhibitory gene products vary in length from 25 amino acids (LHTSCC 56) to 354 amino acids (Babsiella 50). At only 26 and 25 amino acids respectively, the assignments of Hammer 103 and LHTSCC 56 (both phages are in Cluster A) warrant further examination (Fig. 1B). We note that there are 95 genes related to Hammer 103 (grouped in the same phamily) in the Cluster A phage genomes, distributed across four subclusters, located downstream of the P left early lytic promoter, and expressed early in lytic growth 36,37 Scientific RepoRtS | (2020) 10:14670 | https://doi.org/10.1038/s41598-020-71588-5 www.nature.com/scientificreports/ (Fig. 1B). Homologues of Hammer 103 have been assigned to these genomes even though they are quite varied at the nucleotide sequence level (Fig. 1B). This comparative genomic analysis and the toxicity of these ultra-small genes supports their assignment in the genome annotations.
Impact of toxic gene expression on M. smegmatis growth. Twenty-one of the most-toxic genes (toxicity index = 5) were characterized for their impact on M. smegmatis growth when induced in liquid culture (Fig. 2). Protein expression was induced in each strain in early log-phase growth (OD 600 at 0.4) and cell densities Genes are shown as grey boxes above or below each genome corresponding to rightwards-and leftwardstranscription. Pairwise nucleotide sequence similarities are shown as spectrum-colored shading between genomes with violet corresponding to greatest similarity. Hammer 103 and its homologues are colored as red boxes, and indicated by the vertical arrow.  4 Mass spectrometry analysis of HA-tagged toxic proteins' co-immunoprecipitation (co-IP) targets. Details can be found in Table S2. The most prominent hits are listed here. 5 Features are determined through bioinformatic analyses. NKF = no known function. Hits to domains of unknown function (DUF) are shown. Loss of viability with toxic gene expression. Cellular toxicity could arise either from a temporary cessation of growth, or from irreversible loss of viability. We determined whether expression of the most toxic genes  www.nature.com/scientificreports/ viability is reduced by at least three orders of magnitude (Fig. 3). These include four genes (BAKA 17, LHTSCC 48, Wildcat 144, and Wildcat 166) that showed a marked reduction in OD 600 in the liquid culture-with the OD 600 reading at 24 h at or below the starting OD 600 -and four genes (Konstantine 66, Troll4 60, Troll4 72, and Troll4 85) that have milder impacts on growth in liquid culture (Fig. 2). Wildcat 166 expression appears to cause curved or crooked filaments ( Fig. 4; Fig. S3). One gene (Wildcat 165) does not obviously promote filamentation, but forms curved cells, perhaps by inhibiting cell wall growth unequally across the cell ( Fig. 4; Fig. S4). Expression of three genes (LHTSCC 46, LHTSCC 49, Wildcat 168) results in formation of bulbous ends at one pole of the cell ( Fig. 4; Fig. S4), a phenotype reminiscent of expression of Fruitloop gp52, which is known to interact directly with M. smegmatis Wag31 (DivIVA) 31 .

Impact of toxic gene expression on
To examine localization of the toxic proteins we constructed C-terminal eGFP fusions of each of the toxic proteins. First, we tested if the fusion maintains toxicity (Fig. S7); twelve were severely abrogated in their toxicity and were not analyzed further (i.e. the TI changed from 5 of the native protein to 0 or 1 for the fusion). The remaining ten fusions were examined by fluorescent microscopy 4 h after addition of ATc inducer (Fig. 5), and several notable patterns were observed. First, Wildcat gp168-eGFP appears to localize to the bulbous end of the cell (Fig. 5), and is reminiscent of the localization of Fruitloop gp52 31 . Second, six of the genes that promote filamentation (Konstantine 66, LHTSCC 48, Troll 60, Troll4 72, Wildcat 145, and Wildcat 163) form fluorescent puncta, which with Wildcat gp145-eGFP are localized predominantly at one pole of the filament (Fig. 5). The other five strains have several distinct puncta arranged throughout the filament. Rosebush gp44 does not promote filamentation per se but forms 2-4 puncta in each cell (Fig. 5).
One of the most intriguing morphological changes occurs by expression of Wildcat gp11 (Fig. 4). In bright field images, we observed many pairs of cells (2-20%; average of 10% in five images) in which one appears to be normal but the other has a ghost-like appearance (Fig. 4). In cells expressing mCherry (not as a fusion protein), only the more normal-appearing cell of each pair has normal fluorescence and the partner cell has little or no fluorescence (Fig. 6A). The different fates of M. smegmatis daughter cells at division has been described previously 38 , and we propose that as the cells divide, Wildcat gp11 causes death in just one of the two types of daughter cells. Consistent with this, we find that live/dead staining shows instances in which one cell of a pair stains positively with Syto9 and is alive, whereas the partner cell stains with propidium iodide and is dead (Fig. 6B); in some cell-pairs, one partner cell stains with propidium iodide, and the other does not stain at all, perhaps indicating complete loss of DNA (Fig. 6B). We note that for some cell pairs, the daughters stain differently but appear similar by bright field microscopy (Fig. 6).
The Wildcat gene 143 to 171 region is rich in toxic genes. The Wildcat genome has a set of 29 leftwards-transcribed genes at the extreme right end of its genome (Fig. 7). Because these have the characteristics of being small and all are of unknown function, we tested all of these for toxicity (Table S1). Twelve of these (41%) showed some degree of toxicity, two-thirds with a Toxicity Index of 5 (Table S1). Because toxic genes may be involved in influencing phage-host dynamics including exclusion-as exemplified by Fruitloop 52 31 -we predict that these are expressed early in lytic growth. To examine this in Wildcat, we performed RNAseq analysis at early and late lytic growth (30-   and the lysis cassette (49)(50)(51)(52) are only expressed at the 150 min time point, and the capsid (30) and major tail (35) subunits are among the most highly expressed genes (Fig. 7A). The most highly expressed lytic gene is the phage tmRNA gene, which has a 15-fold higher RNA signal than the virion structural genes.
Because of the prevalence of toxic genes in the 143-171 region, we bioinformatically examined this region further. The genes lack the typical organization of closely-packed ORFs in an operon with closely-spaced translation start and stop codons (Fig. 7B) 39 ; for example, in the 155 to 170 segment, 11 of the 15 intergenic distances are 80-100 bp long, and likely contain regulatory sequences. We have not been able to identify any single conserved sequence motif common to these regions, although there is a 60 bp region (Motif 1) shared (75% identity) in the four gene intervals 162-163, 168-169, 163-164, and 164-165 (Fig. S8), and a second 45 bp motif (Motif 2; 82% identity) in the two intervals 166-167 and 167-168 (Fig. S8). We also note that the 143-171 region is evidently mosaic in structure, with 12 of the 29 genes having homologues in a variety of phages in other clusters, including Gordonia, Arthrobacter, Microbacterium, and Streptomyces phages; typically only a single gene is shared with different flanking genes to the left and to the right. Finally, the RNAseq of this region shows a shark-tooth appearance, with low points between genes that correspond closely with the regions where there are small (~ 100 bp) intergenic gaps. Although we've not been able to identify putative promoter or terminator sequences, many of these genes may be independently expressed. www.nature.com/scientificreports/ www.nature.com/scientificreports/ The Wildcat 143-171 gene series is separated from the 1-23 genes-which are also leftwards transcribed (Fig. 7A)-by cos. During lytic growth when the cos-ends are ligated, transcription does not appear to traverse cos, and the region between gene 171 and cos is not transcribed (Fig. 7B). However, genes 1-23 are also expressed early in lytic growth (Fig. 7A), although the genes are not separated by intergenic non-coding regions (with the exceptions of 3-4 and 18-19 which are separated by ~ 50 bp gaps). Fourteen of the 23 genes were tested for

Identification of interacting proteins by co-immunoprecipitation.
The patterns of protein localizations, cellular morphologies, bacterial growth, and irreversible killing suggests that the toxicity of many of the toxic proteins results from specific interactions with host components, rather than non-specific effects. To explore this, we constructed HA-tagged derivatives of 13 toxic proteins, expressed them in M. smegmatis, and analyzed the immunoprecipitated proteins by SDS-PAGE (Fig. 8). Strong elution conditions were used to try and capture as many interacting proteins as possible 31 , although this yields a high background of protein recovery in control experiments in which the phage proteins lack the HA tag (Fig. 8). Nonetheless, several of the toxic proteins appear to co-precipitate with relatively prominent protein bands that are absent in the controls, notably with LHTSCC gp46, LHTSCC gp49, LHTSCC gp83, Wildcat gp13, Wildcat gp144, Wildcat gp145, and Wildcat gp165 (Fig. 8).  www.nature.com/scientificreports/ Co-immunoprecipitating proteins were identified by mass spectrometry, either by excising individual protein bands or by analysis of the whole protein eluate (Table S2). For several of the toxic proteins, strong predictions for interacting proteins can be made. First, for LHTSCC gp46, three prominent co-precipitating bands were observed with apparent sizes of ~ 27 kDa, ~ 48 kDa, and ~ 150 kDa (Fig. 8). Analysis of the 27 kDa showed that it is predominantly a PadR-family transcriptional regulator (Msmeg_6227), and these spectra were 100-fold more abundant in the HA-tagged sample than the non-tagged sample (Table S2). The most abundant peptides in the 48 kDa band correspond to EF-Tu, but are only represented 2.5-fold over the non-tagged control; it is a highly abundant protein in M. smegmatis and it may not reflect a specific interaction (Table S2). The 150 kDa band has β' subunit of RNA Polymerase and is especially enriched relative to the non-tagged control sample. The β subunit of RNA polymerase is also detected in the whole protein eluate of the HA-tagged sample with ~ five-fold more peptide spectra than the non-tagged sample. Together, these data suggest that LHTSCC gp46 interacts with the gene expression machinery, and possibly with Msmeg_6227 itself. Interestingly, analysis of the entire eluate precipitated with Wildcat gp11 also showed Msmeg_6227 peptides recovered from the HA tagged protein but not the non-tagged control (Table S2). Msmeg_6227 is at similar molecular weight to Wildcat gp11 and likely co-migrates in SDS-PAGE (Fig. 8). We note that the closest relative of Msmeg_6227 in M. tuberculosis is Rv3488 40 , which shares weak similarity (35% amino acid identity) in a central 85-residue segment of the protein. Rv3488 has been implemented in countering toxic metal stress 40 .
For Wildcat gp145, a co-immunoprecipitating band migrating at approximately 60 kDa was excised from the HA-tagged sample (and from an equivalent position in the non-tagged sample) and analyzed by mass spectrometry. Peptides were identified that correspond to many M. smegmatis proteins-most with predicted molecular weights between 55 and 65 kDa-and most with similar numbers of peptides in the two samples. However, thioredoxin disulfide reductase (Msmeg_1516) has by far the greatest number of spectra and likely reflects the prominent band observed by SDS-PAGE ( Fig. 8; Table S2). Msmeg_1516 is a homologue of Rv3913 (36% amino acid identity) that is implicated in stress tolerance 41 .
A large number of proteins co-precipitated with LHTSCC gp83-HA, and several excised bands predominantly contain spectra from ribosomal proteins (Table S2). LHTSCC gp83 may thus interact with ribosomes, but whether this is specific and related to its toxicity is unclear. Likewise, a large molecular weight band coprecipitating with LHTSCC gp49-HA was identified as the β' subunit of RNA Polymerase, but the specificity of the interaction is unclear. Finally, a prominent band co-precipitating with Wildcat gp13 was identified as Msmeg_1285 ( Fig. 8; Table S2). It may reflect a specific interaction, and Msmeg_1285 is a putative scaffolding protein that interacts with PonA1 a putative D,D-transpeptidase 42 . Msmeg_1285 is a homologue (59% amino acid identity) of M. tuberculosis Rv0613c, which is non-essential for in vitro growth of M. tuberculosis H37Rv and is reported to be membrane-associated 43 .

Discussion
Like most bacteriophages, mycobacteriophage genomes have large numbers of genes of unknown function. Determining their roles in phage life cycles is complicated by the observation that many are not required for lytic growth 9 . They are also often not conserved among groups of closely-related genomes, although they frequently contribute to mosaic relationships and have relatives in unrelated genomes 2 . Screening genes for toxicity when expressed in the host is a relatively simple way of observing particular phenotypes and identifying potential interacting host protein partners 26,31 . Here, screening of 193 diverse mycobacteriophage genes from 13 different genomes showed that about 23% are toxic when expressed in M. smegmatis. We note that the expression system used is tightly, but not completely repressed in the absence of inducer, and genes with severe toxicity are difficult to recover and propagate (Figs. S1, S2). It is also plausible that some genes may be expressed at sufficiently high levels when fully induced that they confer non-specific toxicity that is not informative. Nonetheless, a range of toxicity was observed, and the genes can generally be classified as having severe or mild effects on bacterial growth.
Non-specific toxicity is a general concern when screening phage genes for inhibitory effects. However, many of the genes tested here result in irreversible growth inhibition and changes in cellular morphology that suggest interference with cell division, or growth at the cell poles, all of which are likely to reflect specific effects. The most striking of the morphological effects is the apparent impairment by Wildcat gp11 of one cell of a daughter pair after division, reflecting a highly specific consequence. The co-immunoprecipitation studies agree with these interpretations and support that toxicity often results from protein interactions. Confirmation of these interactions will require reverse co-IP with tagged target proteins, and to-date this analysis has confirmed only the interactions between Fruitloop gp52 and Wag31 31 and Wildcat gp13 with Msmeg_1285 (our unpublished observations).
One of the most intriguing phenotypes observed resulted from expression of Wildcat gp11. Wildcat gp11 is a 25 kDa protein containing a methyltransferase motif (Pfam: Methyltranf_24 PF13578), is strongly toxic on solid media, and confers largely irreversible mortality when induced in liquid culture, even though there is only a modest impairment in OD 600 rise. Microscopy shows that pairs of cells-presumably reflecting daughter cells immediately after division-contain only one cell that stains as being alive. It is known that at division one daughter cell inherits a growing pole whereas the other has to establish a growing pole 38 and it seems plausible that Wildcat gp11 can distinguish between these two types of cells. Co-immunoprecipitation suggests that Wildcat gp11 interacts with the PadR-like transcriptional regulator, Msmeg_6227 (Fig. 8), which has been shown to be highly abundant in dormant M. smegmatis cells 44 . It is not clear whether Msmeg_6227 itself localizes differentially between daughter cells at division, or if it regulates expression of genes involved in these processes. Msmeg_6227 is not essential for viability of M. smegmatis 45  www.nature.com/scientificreports/ regulator properties, rather than simply inactivating it. Whether the methyltransferase activity of Wildcat gp11 is required for the toxic and morphological effects is not known. Interestingly, LHTSCC gp46-a 56-residue protein with no conserved domains and no known function-also interacts with Msmeg_6227 (Fig. 8). LHTSCC gp46 expression confers a different morphological change than Wildcat gp11, resulting in a prominent bulge at one pole of the cell, which we presume to be the growing pole (Fig. 4). LHTSCC gp46 might confer a different type of Msmeg_6227 dysregulation than Wildcat gp11, resulting in the different morphologies. How phage infection by either LHTSCC or Wildcat might benefit from dysregulation of Msmeg_6227 is unclear, but it is possible that it results in exclusion of phages that hypothetically might require Msmeg_6227 for efficient infection, mirroring the activity of Fruitloop gp52, which induced a similar morphology and also localizes to the pole 31 .
Curiously, Wildcat 165 expression promotes cell curvature. Wildcat gp165 is a 91-residue protein of unknown function that is very toxic, but mostly bacteriostatic when induced in liquid culture. We note that similarly curved cells were observed when the host DivIVA (Wag31) gene was C-terminally tagged with eGFP 46 . This is a phenotype distinct from the bulging poles observed when DivIVA is depleted 47 or when Fruitloop gp52 is expressed 31 . We have not been able to show a direct interaction between Wildcat gp165 and Wag31, but this may represent a second instance in which two unrelated phage proteins (Fruitloop gp52 and Wildcat gp165) target the same host protein (Wag31) but in different ways, with different morphological outcomes.
We note that phages such as Wildcat may have large groups of genes such as the 143-171 group that are expressed early and perhaps influence host-phage dynamics. There are many genes in this group that are toxic when expressed in M. smegmatis, and it is plausible that the non-toxic genes may also modulate host processes, but in ways that do not impair cell growth or survival when overexpressed. Toxic genes identified in screens such as those described here may thus be useful for characterizing the functions of genes that are closely linked and similarly expressed.
Finally, this report suggests that ~ 25% of non-virion mycobacteriophage genes have a toxic phenotype, and that a deeper investigation could provide a very large number of toxic genes, many with informative interactions with host proteins. The ~ 1,880 sequenced mycobacteriophage genomes contain ~ 200,000 genes constituting over 15,000 phamilies, of which 3,000 may have toxic phenotypes. Further characterization of these genes and their interacting protein partners will provide new insights into mycobacterial physiology, identify new potential drug targets, and elucidate key aspects of phage-bacterial dynamics.
Propagation of phages was done by infecting 500 µl of a saturated M. smegmatis culture with 10 µl phage lysate for 10 min at room temperature, adding 2.25 mls of Middlebrook Top Agar (MBTA; Middlebrook 7H9, 1 mM CaCl 2 ) pouring onto solid media, and incubating at 37 °C for one to two days for plaque formation.
Plasmid construction. The vector pTNds is a hygromycin (Hyg) resistant, Tet-ON Gateway Cloning destination plasmid, that replicates extrachromosomally in M. smegmatis mc 2 155 31 . The vector pTNdi is also a Tet-ON Gateway Cloning destination plasmid but is streptomycin (STR) resistant and integration proficient. Both vectors have been described previously 31 . Cloning of genes into pTNds or pTNdi was done according to Invitrogen's 'Gateway Technology with Clonase II' user manual but with reduced reaction volume. Due to the nature of the Gateway Cloning technique and the location of the Gateway Cloning Cassette, a ribosomal binding site (RBS) must be included with the gene to be expressed, hence the inclusion of 30 bp DNA upstream of each gene. Protein overexpression from the extrachromosomal plasmids was induced by adding 100 ng/ml anhydrotetracycline (ATc) to the growth media for extrachromosomal plasmids, and 300 ng/ml ATc for integrated plasmids.

Identification of toxic/inhibitory mycobacteriophage-encoded genes in M. smegmatis.
Approximately 50 ng of each plasmid was transformed into 100 µl M. smegmatis mc 2 155 electrocompetent cells with 2.5 kV, 1,000 Ω, and capacitance at 25 µF as described previously 48 . After recovery for 2-4 h in liquid medium, transformants were recovered on selective media. Individual transformants were picked and patched onto solid media lacking or containing 100 ng/ml ATc (300 ng/ml ATc for pTNdi-based vectors) and incubated at 37 °C for 2-3 days. As a gene expression control, the pTNds or pTNdi with mCherry was also transformed and patched. Three days later, the patches with mCherry on the induced plate were visibly pink.
Genes were analyzed for potential functions using a variety of bioinformatic strategies, including psiBLAST and HHpred 49,50 . Toxicity Index assignment. A Toxicity Index (TI) value of 1-5 was assigned for each construct based on growth of bacterial patches on solid media in the presence of inducer, as follows: 5, little or no bacterial growth, 4, some but very little growth, 3, growth on some patches, but somewhat variable for different patches, 2, evident although not good bacterial growth, 1, good bacterial growth but slightly smaller patches than in the absence of inducer. No difference in growth was assigned a TI of zero. Although the assignment of TI is based on the above www.nature.com/scientificreports/ characteristics, it is also somewhat objective and based on relative ranking of toxicity with other strains assayed similarly.
Live/dead staining. M. smegmatis with pTNds vector carrying Wildcat 11 was grown in liquid culture at 37 °C with shaking till OD 600 is ~ 0.5. The culture was then induced with 100 ng/ml ATc for six hours at 37 °C with shaking. The induced cells were treated with Live/Dead BacLight Bacterial Viability Kit for microscopy (ThermoFisher) following the recommended protocol in the kit.
Microscopy. Light microscopy was done using an Axiostar plus Transmitted-Light Microscopy (Zeiss; https ://www.micro -shop.zeiss .com/en/us/syste m/axiov ision +softw are-softw are+axiov ision -softw are/6014/) with 100X objective lenses, and images were acquired using AxioCam MRc5 camera and AxioVision software (version 4.6.3.0). To prepare a microscope sample slide, a 5-10 µl liquid culture was pipetted onto a slide. Then a coverslip was placed on top of the drop, followed by hard pressing the coverslip with a piece of Kimwipe to absorb excess liquid. The coverslip was then sealed with nail polish. Fluorescence microscopy was performed using the same microscope with the attachment of HBO 50 Microscope Illuminator (Zeiss) and filter 42002 (Chroma).
Co-immunoprecipitation. The co-IP protocol is similar to the one described previously 31 . In brief, a 100 ml liquid culture of M. smegmatis containing the desired plasmid was grown to OD 600 ~ 0.5 at 37 °C at 250 rpm. Protein expression was induced for 4 to 7 h, and cell pellets were isolated and resuspended in 500 µl lysis buffer (50 mM Tris-HCl, 1 mM EDTA, 0.5% Triton X-100, 1 mM PMSF) followed by sonication and centrifugation to collect the lysates; 4,000 µg to 8,000 µg total protein of each lysate was used for co-IP with Pierce HA tag IP/Co-IP kit with anti-HA agarose beads or anti-HA magnetic Beads (ThermoFisher), washed with lysis buffer and TBS (25 mM Tris-HCl, 150 mM NaCl, pH7.2-7.4), and 2X Non-reducing sample buffer was used for elution with β-mercaptoethanol added after elution. The eluate was analyzed on a 4-20% SDS-PAGE gradient gel and imaged using GelDoc XR + (Bio-Rad) with ImageLab (Bio-Rad, version 3.0, build 11, https :// www.bio-rad.com/en-us/produ ct/image -lab-softw are?ID=KRE6P 5E8Z). The eluates or excised candidate bands were trypsin digested, and identified by Biomedical Mass Spectrometry Center at the University of Pittsburgh with nano-reverse phase HPLC interfaced with an LTQ linear ion trap mass spectrometry along with SEQUEST search engine and Scaffold software for statistical validation, or identified by University of California at Davis Proteomics Core Facility. For Wildcat gp11, Pierce anti-HA magnetic beads were used (ThermoFisher). After the washing steps, protein-bound magnetic beads were not eluted and were directly digested on beads and analyzed by tandem Mass Spectrometry with X! Tandem search engine by the University of California at Davis Proteomics Core Facility.

RNA sequencing analysis.
Mycobacterial liquid cultures of OD 600 ~ 0.7 were infected with Wildcat phage lysate at a multiplicity of infection of three for 10 min before incubating at 37 °C. Thirty minutes and 150 min post-infection, 4 ml samples were collected for RNA isolation using matrix lysing tubes (MP Biomedicals) and RNeasy Mini Kit (Qiagen). rRNA was then removed using RiboZero rRNA removal kit (Gram-positive bacteria, Illumina). RNAseq was then performed and analyzed as described previously 11 . RNAseq data was visualized using Integrative Genomics Viewer (version 2.3.92) 51 combined with Phamerator-generated genome maps 52 .

Data availability
The data generated or analysed during this study are included in this published article (and its Supplementary Information files). Phage genome sequence information is available in GenBank and at https ://phage sdb.org. RNAseq data of Wildcat is deposited at Gene Expression Omnibus (https ://www.ncbi.nlm.nih.gov/geo/) with accession number GSE149160.