Introduction

Mycobacteria inhabit various environmental reservoirs such as ground and tap water, soil, animals and humans. They are divided into rapid and slow growing mycobacteria, RGM and SGM, respectively. Among SGM, Mycobacterium tuberculosis (Mtb) causes tuberculosis (TB) whereas the RGM Mycobacterium smegmatis MC2-155 (Msmeg) is frequently used as a mycobacterial model system. Like other bacteria, mycobacteria form biofilms and show changes in their colony morphotype (CM) and cell shape. They also appear to have a growth advantage in water that contains disinfecting agents. Some mycobacteria, such as Mycobacterium mucogenicum (Mmuc), are omnipresent in water; Mmuc can be isolated from sewage and hospital water systems and has been demonstrated to be associated with various infections1,2,3,4,5,6. The phylogenetically close RGM Mycobacterium neoaurum (Mneo) and Mycobacterium cosmeticum (Mcos) were originally isolated from soil and from a granulomatous lesion of a female patient, respectively3.

We recently provided a comparative genomic analysis of non-tuberculosis mycobacteria (NTM) belonging to the Mmuc- and Mneo-clades emphasizing tRNA and non-coding (nc) RNAs. In addition to Mmuc, the Mmuc-clade includes Mycobacterium phocaicum (Mpho), Mycobacterium aubagnense (Maub) and Mycobacterium llatzerense (Mlla) while Mneo and Mcos belong to the Mneo-clade7. These rapid growing NTM are associated with various infections and they show high tolerance against the first-line anti-tuberculosis drugs isoniazid, rifampin and pyrazinamide3,5,7,8,9,10,11,12 (and Refs therein) and can also show unusual patterns of antibiotic susceptibility13. Together this emphasizes the importance of this group of NTM and provided an incentive for a comparative genomic analysis focusing on virulence and selected regulatory genes. Moreover, information about transcription of genes in NTM other than Msmeg is limited (but see the recent Mneo transcriptome and proteome study14). Hence, we decided to focus on MmucT, for which the complete genome is available7, and to study mRNA levels of selected genes suggested to be involved in gene expression and virulence under different in vitro growth conditions.

Comparative genomic analysis encompassing seventeen genomes provides new insights into how some of the characteristics of these mycobacteria such as common and unique genes (and their functional classification), and horizontal gene transfer (HGT) might be manifested as phenotypic differences. Specifically, for virulence genes we provide data related to the distribution of sigma factor genes, serine threonine protein kinase (STPK) genes, type VII secretion systems (ESX genes) and mammalian cell entry genes (mce). Our data, where we analyzed the mRNA levels for these genes in MmucT by RNASeq, further suggested that their levels depend on growth conditions and these data are discussed particularly in relation to the SGM Mycobacterium marinum, a model system for Mtb15,16,17. On the basis of our genomic and RNASeq data we also discuss possible reasons as to why the Mmuc- and Mneo-clade members show resistance to the first-line anti-tuberculosis drugs rifampin and isoniazid. Finally, we discuss the formation of rough and smooth colony morphotypes among these mycobacteria.

Results

Members of the Mmuc- and Mneo-clades share a common ancestor and are phylogenetically close where the Mmuc-clade members constitute an earlier branch than the Mneo-clade (Fig. S1; Pettersson et al. unpublished)7,11. We first discuss the overall functional classification of common and unique genes in Mmuc- and Mneo-members focusing on the type strains MmucT, MphoT, and MaubT belonging to the Mmuc-clade, and MneoT and McosT members of the Mneo-clade (see Methods). We also include Mycobacterium sp. URHB0044, which is phylogenetically close to the Mmuc- and Mneo-clades but does not appear to belong to either of these two clades (Fig. S1). Then, we identify genes acquired through horizontal gene transfer, HGT, and their functional classification. Following this, our analysis focuses on virulence and regulatory genes with a specific emphasis on mammalian cell entry factors (mce), type VII secretion (ESX systems) factors, transcription sigma/anti-sigma factors and serine threonine protein kinases (STPK). Finally, we discuss transcription of selected genes in MmucT (for which the complete genome is available7) under different in vitro growth phases focusing on virulence genes. In this context, we compare the transcription patterns in MmucT and in the SGM pathogen M. marinum CCUG 20998 (a derivative carrying the gene coding for the red fluorescent protein, rfp, and referred to as Mmarrfp while MmarT lacks rfp)18.

Core and unique gene analysis and functional classification

Homologous and non-homologous chromosomal genes were predicted using PanOCT (see Methods), and 2770 genes, corresponding to ≈52% of all predicted MmucT genes, were also identified in MphoT, MaubT, MneoT and McosT (Fig. 1a). The predicted numbers of unique genes in MmucT, MphoT, MaubT, MneoT and McosT were 702, 736, 1161, 1017 and 1618, respectively; McosT has the highest number of unique genes, which correlates with its larger genome size. The numbers of genes unique to the Mmuc- and Mneo-clade members were predicted to be 776 and 699, respectively (Fig. 1a). However, when other Mmuc- and Mneo-clade members were included (Fig. S1; Table S1), the numbers dropped to 637 and 557, respectively.
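As an illustration of how counts such as those in the Venn diagram can be derived, the following minimal Python sketch splits ortholog clusters into core and genome-unique sets. It is not the PanOCT algorithm itself; it assumes that cluster membership has already been extracted into a dictionary, and the function name core_and_unique, the cluster identifiers and the toy data are hypothetical.

```python
from typing import Dict, Set, Tuple


def core_and_unique(
    clusters: Dict[str, Set[str]], genomes: Set[str]
) -> Tuple[Set[str], Dict[str, Set[str]]]:
    """Split ortholog clusters into core and genome-unique clusters.

    clusters maps a (hypothetical) cluster id to the set of genomes in
    which a member gene was found. A cluster is 'core' if it is present
    in every genome and 'unique' if it is present in exactly one.
    """
    core = {cid for cid, members in clusters.items() if genomes <= members}
    unique: Dict[str, Set[str]] = {g: set() for g in genomes}
    for cid, members in clusters.items():
        present = members & genomes
        if len(present) == 1:
            (only,) = present
            unique[only].add(cid)
    return core, unique


# Toy input (assumed, not taken from the study).
genomes = {"MmucT", "MphoT", "MaubT", "MneoT", "McosT"}
clusters = {
    "c1": {"MmucT", "MphoT", "MaubT", "MneoT", "McosT"},
    "c2": {"MmucT"},
    "c3": {"MmucT", "MphoT", "MaubT"},
    "c4": {"McosT"},
    "c5": {"MneoT", "McosT"},
}
core, unique = core_and_unique(clusters, genomes)
```

With real data, the size of core and of each unique[genome] set would correspond to the central and outermost fields of a five-way Venn diagram; clusters shared by only some genomes (such as c3 and c5 in the toy input) fall into the intermediate fields.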

Figure 1

Analysis of CDS and functional classifications. (a) Venn diagram showing the presence of common and unique genes for the five type strains MmucT, MphoT, MaubT, MneoT, and McosT. (b) Functional classification of 2770 core genes present in the five type strains MmucT, MphoT, MaubT, MneoT, and McosT. (c) Functional classification of unique genes identified in panel (a) from the five type strains MmucT, MphoT, MaubT, MneoT, and McosT. I, species as indicated. II, grey boxes represent the number of genes not functionally classified, and boxes marked with different colors represent the number of functionally classified genes in subsystems in the different species as indicated. III, grey boxes represent the number of genes classified as hypothetical genes, which are not functionally classified, while 104, 93, 89, 101 and 85 correspond to the number of hypothetical genes that are functionally classified. The scale 0 to 100 represents the distribution expressed as percentage, e.g. in II (MmucT column) of the total number of genes (2390 + 1764 = 4154) 42.5% were functionally classified in subsystems.

Functional classification of the predicted 2770 core CDSs in the five type strains (MmucT, MphoT, MaubT, MneoT and McosT) did not reveal any major species-specific difference [Fig. 1b; the data set for MmucT represents all five species and 2054 (of these 1206 were classified in only one category) of the 2770 CDSs could be classified; Fig. S2a]. Subsystem classification suggested that 18% belong to the “Amino Acids and Derivatives” category, 12% to the “Cofactors vitamins prosthetic groups pigments” category, 12% to the “Carbohydrates” category and 10% to the “Protein Metabolism” category while 2.6% belong to the “Virulence, Disease and Defense” category. Together, these categories encompass ≈55% of the 2770 core CDSs. Analysis of unique genes (Fig. 1c) in the same five species revealed that the proportion of CDSs in the “Carbohydrates” category was higher in McosT (28%) and in MaubT (23%) compared to MmucT (16%), MphoT (14%) and MneoT (15%). At the “clade” level (Fig. S2b), the “Carbohydrates” category was 28% (Mneo-clade) and 14% (Mmuc-clade). Comparing the Mmuc- and Mneo-clades at the subcategory level, the “Carbohydrates” subcategory “Monosaccharides” was higher in the latter. Also, the data predicted that CDSs in the subcategories “One carbon metabolism”, “CO2 fixation” and “Organic acids” were only detected in the Mneo-clade and the number of CDSs in these subcategories was high in both MneoT and McosT (Fig. S2c). For functional classifications of all 17 Mmuc- and Mneo-clade members, and of M. sp. URHB0044, MtbH37Rv, Msmeg and MmarT, see Fig. S2a.

Horizontal gene transfer - HGT

To predict horizontally transferred genes we used HGTector (see Methods); the number of predicted HGT genes ranged from 41 (Mneo ATCC25975, referred to as MneoATCC25975) to 88 (M. sp. URHB0044), see Table S1. For the five type strains, MmucT, MphoT, MaubT, MneoT and McosT, we predicted that of the six HGT genes common to these mycobacteria, five genes are derived from Proteobacteria (two from the β-, two from the γ-, one from the α-branch) and one from unclassified bacteria. Many HGT genes in the five type strains were predicted to originate from several other bacterial sources with the highest numbers from Rhizobiales, Burkholderiales and Pseudomonadales (Fig. 2; Table S2a–p). In MmucT and MphoT we also detected genes of possible eukaryotic origin; the inosine-5-monophosphate dehydrogenase gene (MMUC_02941, which probably originates from Tritrichomonas suis), pseudooxynicotine oxidase (MMUC_05484 and MPHO_05174) gene, FAD dependent oxidoreductase (MPHO_04106) gene and a gene encoding a hypothetical protein in MneoT (MNEO_01334). The eukaryotic origin for the three latter genes was not identified (Table S2a–p). Functional classification of HGT genes in 14 of the mycobacteria, including the type strains, revealed that the highest number of genes was found in the category “Amino Acids and Derivatives” with the notable exception of McosT and M. sp. URHB0044. For these two, roughly 40% of the HGT genes belong to the category “Carbohydrates” (Fig. 2b).

Figure 2

Identification of horizontally transferred genes and their functional classification. (a) Heat map plot showing putative horizontally transferred genes in the Mmuc- and Mneo-clade members (species indicated below the heat map plot) using HGTector (see Methods). The upper left shows the color code where the x-axis represents the number of horizontally transferred genes; the scale is in multiples of 4 (green, 1 to 4; blue, 5 to 8; orange, 9 to 12; purple, 13 to 16; yellow, 17 to 20; red, 21 to 24) genes originating from the different orders indicated to the right, e.g. Pelagibacterales. Grey indicates that no genes originate from the indicated order. (b) Functional classification of putative horizontally transferred genes in the Mmuc- and Mneo-clade members.

Virulence genes

The Mmuc- and Mneo-clade members carry virulence genes (Table S1) and 129 genes are common for MmucT, MtbH37Rv and MmarT (Fig. 3a). Compared to the non-pathogen Msmeg, roughly similar numbers of genes in Mmuc- and Mneo-clade members were predicted while MtbH37Rv and MmarT encode higher numbers. Among these, genes in the categories cell surface components, secretion systems and mce operons are abundant (Figs. 3b and S3; Table S3a). Specifically, we noted that the numbers of predicted mce homologs are slightly higher in Mmuc- and Mneo-clade members compared to MtbH37Rv while MtbH37Rv and MmarT have higher numbers in the categories “Cell Surface Components” and “Secretion Systems”. We also noted that Mmuc- and Mneo-clade members lack mce2 homologs (see below; Fig. 3c; Table S3b), which influence Mtb virulence19. In the category secretion systems, homologs of ESX-5 associated genes (Fig. 3d; Table S3c), known to have an impact on mycobacterial virulence20,21, are missing in the Mmuc- and Mneo-clade members (see below). With respect to the category regulation, we predicted the number of transcription sigma (σ) factor genes and serine-threonine protein kinase (STPK) genes, which have key roles in global regulation of gene expression and function of proteins and hence virulence22,23. Briefly, the numbers of σ-factor genes vary between 17 and 29 among members of the Mmuc- and Mneo-clades (M. sp. URHB0044 has 35; see below). This should be compared to MtbH37Rv and Mmar, which carry 13 and 17–18 (dependent on strain) σ-factor genes, respectively18,22. Interestingly, sigC, mainly present in SGM such as MtbH37Rv and Mmar, is present in the Mmuc-clade members but absent in the Mneo-clade (Fig. 3e; Table S3d). In this context, we note that Msmeg encodes 28 σ-factors (for further details, see below).

Figure 3

Virulence factors and analysis of selected genes related to virulence. (a) Venn diagram showing common and unique homologous virulence factor genes among MmucT, MtbH37Rv, and MmarT. (b) Classification of virulence factors genes (VFanalyzer, VFDB; see Methods) present in MmucT, MphoT, MaubT, MneoT, McosT, MtbH37Rv, MmarT, and Msmeg as indicated. (c) Heat map plot showing the number of genes corresponding to the different mce operons, marked as Mce1 to Mce8, among Mmuc- and Mneo-clade members, MtbH37Rv, MmarT and Msmeg. MCE_type represents mce related genes that could not be classified as Mce1-8. (d) Heat map plot showing the distribution of different ESX genes among Mmuc- and Mneo-clade members, MtbH37Rv, MmarT and Msmeg. (e) Heat map plot showing the distribution of sigma factor genes among Mmuc- and Mneo-clade members, MtbH37Rv, MmarT and Msmeg. (f) Heat map plot showing the distribution of STPK genes among Mmuc- and Mneo-clade members, MtbH37Rv, MmarT and Msmeg.

The number of genes in another group of regulatory genes, the STPK genes, was also predicted to be higher than in MtbH37Rv (Fig. 3f; Table S3e; see below), which might reflect that Mmuc- and Mneo-clade members are found in different ecological niches than Mtb. The higher numbers were mostly attributed to multiple copies of pknF and/or pknH.

Given that MmucT shows mucoid growth, we searched for the presence of genes reported to influence formation of rough and smooth colony morphotypes (CM), focusing on genes that constitute the GPL locus and are involved in the synthesis of glycopeptidolipids, which are present in the cell wall and as such have a role in virulence24. Interestingly, members of both the Mmuc- and Mneo-clades lack several GPL locus genes suggesting differences in the nature of the glycopeptidolipids in these mycobacteria compared to Mtb, Mycobacterium abscessus and Msmeg (Table S3f; see discussion). In this context, we note that all Mmuc-clade members lack ahpC, suggested to influence resistance against the antibiotic isoniazid25, while it is present in the Mneo-clade members. Also, sitA and sitB (genes related to iron uptake in e.g. Shigella flexneri26), and ureB and ureG (genes related to acid resistance and nitrogen metabolism27) were predicted to be present in Mneo-clade members but not in the species of the Mmuc-clade (Table S3a). For members of the Mmuc-clade we also detected the presence of nitrate reductase related genes (anaerobic respiration). Finally, mosR, a redox-dependent transcriptional repressor identified in Mtb and reported to influence virulence28, appears not to be present in any of these NTM (Table S3a).

Mammalian cell entry – mce – operons

MtbH37Rv has four mce operons, mce1-4, while ten mce gene clusters have been predicted in the environmental mycobacterium Mycobacterium phlei19,29. The Mmuc- and Mneo-clade members also carry mce genes (Fig. 3c; Table S3b). Briefly, mce1, 3 and 4 are present in all the genomes, while mce7 (and mce8) is missing in MphoT and MneoVKMAc-1815D. The mce5 operon appears to be missing in the Mneo-clade members, while it is present in members of the Mmuc-clade. As discussed above, none of the studied mycobacteria was predicted to carry the mce2 operon but some were predicted to have a few mce2 paralogs, e.g., two mce2 paralogs were predicted in MmucT (Table S3b). Also, we noticed the presence of two or three mce3 operons in several of the genomes as well as an extra mce4 operon in several of these mycobacteria (Fig. 3c; Table S3b). Taken together, as in MtbH37Rv, these NTM carry mce operons that relate to virulence, cell wall lipid composition (mce1), and lipid metabolism (where mce3 relates to lipid metabolism and mce4 to cholesterol metabolism29,30,31,32,33,34).

Type VII secretion – ESX-operons

We predicted the presence of ESX gene families in the different mycobacterial species using a BLASTp approach with the esx genes present in the MtbH37Rv genome as queries (see Methods). For comparison, we included MmarT and Msmeg. Following this outline, we predicted homologs to the ESX-1, ESX-3 and ESX-4 gene clusters for members of the Mmuc- and Mneo-clades, and M. sp. URHB0044, while ESX-2 and ESX-5 genes seem to be absent in these NTM (Fig. 3d; Table S3c). Moreover, we could not predict MtbH37Rv orthologs of the espACD operon, known to have a role in virulence20,21, in any of the Mmuc- and Mneo-clade member (or Msmeg) genomes. However, paralogs to these genes, except espC, were predicted in some of the genomes, e.g. MphoT (Fig. 3d; Table S3c). These might be homologs of espE, espF and espG but, using a bidirectional best hit analysis approach with MtbH37Rv as reference, this appears not to be the case (not shown). Moreover, several of the genes in the ESX-1 locus in McosT appear to be absent and, compared to MtbH37Rv, the ESX-1 genes espE, espF, espJ and espK are missing in all the Mmuc- and Mneo-clade members (Table S3c). Interestingly, pknJ (encoding an STPK, see above) is located in the ESX-1 region in members of the Mmuc-clade; however, its role with respect to secretion (if any) is not known.
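The bidirectional best hit criterion referred to in this section can be sketched as follows. This is a minimal illustration only, assuming that hits from two reciprocal protein searches are available as (query, subject, bit score) tuples; the function names, the identifiers and the scores in the toy input are hypothetical, and any thresholds used in the actual analysis (see Methods) are not reproduced here.

```python
from typing import Dict, Iterable, Set, Tuple

Hit = Tuple[str, str, float]  # (query id, subject id, bit score)


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Return the highest-scoring subject for every query."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(
    ref_vs_target: Iterable[Hit], target_vs_ref: Iterable[Hit]
) -> Set[Tuple[str, str]]:
    """Keep (reference, target) pairs that are each other's best hit."""
    forward = best_hits(ref_vs_target)
    reverse = best_hits(target_vs_ref)
    return {(r, t) for r, t in forward.items() if reverse.get(t) == r}


# Toy input (assumed identifiers and scores, for illustration only).
ref_vs_target = [
    ("ref_gene1", "tgt_0001", 250.0),
    ("ref_gene1", "tgt_0002", 90.0),
    ("ref_gene2", "tgt_0002", 120.0),
]
target_vs_ref = [
    ("tgt_0001", "ref_gene1", 245.0),
    ("tgt_0002", "ref_gene3", 130.0),
    ("tgt_0002", "ref_gene2", 110.0),
]
pairs = bidirectional_best_hits(ref_vs_target, target_vs_ref)
```

In the toy input, ref_gene2 has tgt_0002 as its best hit, but tgt_0002 scores higher against ref_gene3, so the pair is not retained. This is the situation in which a gene is regarded as a paralog rather than an ortholog of the reference gene.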

Sigma and anti-sigma factor genes

For MmucT, MphoT, MaubT, MneoT and McosT the predicted number of σ-factor genes ranged between 17 and 29, with MmucT and MneoT having the highest and lowest numbers, respectively (Fig. 3e; Table S3d). The numbers for the other Mmuc- and Mneo-clade members were similar while for M. sp. URHB0044 we predicted 35 σ-factor genes. So far, this is the highest number of σ-factor genes predicted in any mycobacterium. Compared to the other mycobacteria, its genome is larger (≈7.5 Mbp) and it does not belong to either the Mmuc- or the Mneo-clade7.

Sigma factors are divided into four groups: groups 1 and 2 include the housekeeping σ-factor SigA and SigB, respectively, group 3 includes SigF, and group 4 the ECF (extra cytoplasmic function) σ-factors35,36,37,38. Like other mycobacteria, Mmuc- and Mneo-clade members code for several ECF σ-factors in addition to SigA, SigB and SigF. Single copies were detected for sigA, sigB, sigC (when present), sigD and sigM irrespective of species (Fig. 3e; Table S3d). The majority of the σ-factor genes belong to group 4, the ECF22,37,38. For several of these, more than one gene was annotated, as exemplified by σ-factor E (sigE) with three copies (four including rpoE), and with two copies of sigH, sigK and sigL in MmucT. Mmuc-clade members were also predicted to have three copies of the alternative σ-factor sigF. In Table S3d, the σ-factors with more than one copy annotated are listed separately based on gene synteny (e.g., sigF genes were predicted at three different genomic locations; Fig. S4a,b). Moreover, in keeping with Mneo-clade members having fewer σ-factor genes, the numbers of extra copies of different alternative σ-factor genes are lower. As discussed above, sigC was predicted in members of the Mmuc-clade, including the type strain MmucT, while it is missing in members of the Mneo-clade (Fig. 3e; Table S3d; see discussion). This sigma factor has previously been reported to exist in slow growing pathogenic mycobacteria such as MtbH37Rv and Mmar but not in the RGM Msmeg22,37,38 (and Refs therein). Also, sigC is implicated to have a role in Mtb pathogenicity39,40,41,42. Whether this is the case also for Mmuc-clade members warrants further studies.

Gene synteny analysis suggested that sigD, sigE1, sigF3, sigH2, sigK1, sigK2, sigL1 and sigM are closely linked to genes encoding the corresponding putative anti-σ factor (see below). Thus, we consider these σ-factor genes as homologs (except for the two sigK genes, both of which are positioned close to their respective anti-σ factor genes) of the single σ-factor genes in other mycobacteria such as MtbH37Rv and Mmar18,37,38. Alignment of the different σ-factors indicated variations within the respective group (Fig. S4c). To understand the interrelation between the σ-factors we therefore generated a “Sigma factor” phylogenetic tree (based on amino acid sequence). In this tree, the σ-factors are distributed in distinct clusters (Fig. S4d). The SigF variants cluster together (close to SigA and SigB), indicating that these indeed belong to group 3 and have a common ancestor. For the ECF group and using the Mtb ECF σ-factors as reference the following are suggested to be phylogenetically close: i) SigH and SigE, ii) SigM and SigC1, iii) SigK (K1 and K2) and SigL, iv) SigI1 and SigI1-like, v) SigL1, SigE1 and SigW, vi) CnrH, RpoE, and SigG, and vii) SigJ, SigI and SigJ-like while SigD forms a separate cluster. Of note, SigI1 and SigI1-like cluster at a different location (close to SigL and SigL1) than SigI. Together these findings indicate the phylogenetic relationship and evolutionary history of the σ-factors in these mycobacteria.

The activities of several of the ECF σ-factors such as SigC, SigD, SigE, SigF, SigH, SigK, SigL and SigM are regulated by anti-σ factors (Fig. S4e). The anti-σ factor genes co-localize with the σ-factor genes with the exception of the putative rscA (anti-SigC)37,38,43, which is positioned elsewhere on the chromosome in e.g. Mtb44. However, in Mmuc-clade members it is located close to sigC with four or five genes in between (Fig. S4f; see discussion). Moreover, as in Mtb the anti-σ factor genes rsbW, rsdA, rseA, rskA (A1 and A2), rsmA, rslA, rshA are directly linked to the corresponding σ-factor gene in the Mmuc- and Mneo-clade members (Table S3g). An “anti-Sigma factor” phylogenetic tree (based on amino acid sequences with RsbW as the root) revealed that RsdA is closest to the ancestor followed by RskA1 and RskA2, RsmA, RslA, RshA and RseA. This is in rough agreement with the phylogeny for the corresponding σ-factors with the notable exceptions of RsdA and RseA. RsdA is closer to the two RskA, RscA, RslA, RsmA and RshA while SigD (using SigF as the root) branches out earlier and is positioned closer to the rooted σ-factor. The opposite is the case for RseA, which is positioned closer to the rooted anti-σ factor whereas SigE shares a common ancestor with SigH and is positioned closer to SigC, SigM, SigL and the two SigK (Fig. S4g,h; see discussion).

Serine threonine protein kinases – STPKs

We predicted a total of 35 different pkn genes encoding STPKs in all the members of the Mmuc- and Mneo-clades (Fig. 3f; Table S3e). Following the naming of STPK genes identified in MtbH37Rv23, these pkn genes were predicted to be orthologs of pknA-L, or of one referred to as pkn. The number of pkn genes in the individual species varied from 14 (MneoT; notably, only 13 genes were predicted using the MneoDSM44074 genome available at NCBI) to 19 genes (MmucLZSF01; which is a Mpho strain7), while M. sp. URHB0044 with its larger genome was predicted to carry 24 pkn genes. This should be compared to MtbH37Rv, MmarT and Msmeg, which harbor 11, 27 (21 with and six genes lacking the kinase domain) and 16 pkn genes, respectively (Fig. 3f; Table S3e). Genes encoding PknB, PknG and PknL were detected in all species, including MtbH37Rv, MmarT and Msmeg, indicating their importance in mycobacteria. PknA is another important STPK23, and we predicted pknA in all these mycobacteria, except McosT. However, we detected it in the publicly available McosDSM44829 genome. Therefore, its absence in our McosT genome is likely due to draft genome status. Interestingly, we predicted two copies of pknA and pknB in the Mneo-clade members. Two pknA genes were also found in MmucT and MphoT, while one pknA and two pknB were detected in MaubT (failure to detect the additional pknA copy in MaubT could again be due to draft genome status). Of the two MmucT pknA (pknA1 and pknA2; 37% identity and 86% query coverage), pknA1 is positioned downstream of pknB while pknA2 is localized elsewhere on the chromosome. An extra pknG gene was also predicted in McosT, while pknK was only detected in MphoT and MmucLZSF01. Multiple copies of pknF were predicted to be common in the Mmuc-clade members, while several pknH copies were found in Mneo-clade members.
On the basis of these data (Supplementary information Table S3e, pkn genes as indicated in columns C and F) we generated the “Pkn phylogenetic tree” illustrating the interrelation between the different STPKs in these mycobacteria and MtbH37Rv (Fig. S4i; see also ref. 45).

Analysis of transcription of selected virulence genes in MmucT under different growth conditions

To map transcription, we isolated RNA from MmucT cells growing at exponential phase and cells from stationary phase and subjected the RNA to RNASeq (see Methods). For comparison, we decided to use the SGM Mmarrfp for which we have access to similar transcriptome data18 (unpublished data). We focused on 129 virulence genes identified above (Fig. 3a,b) and the data are presented in Fig. 4.

Figure 4

Transcription of virulence genes in MmucT and Mmarrfp. Bar plot showing mRNA levels for 147 virulence factor genes present in exponentially growing and stationary MmucT (red) and Mmarrfp (turquoise) cells as indicated. A negative log2-value suggests that the corresponding mRNA is more abundant in exponentially growing cells while a positive log2-value suggests higher levels in stationary cells. The x-axis labels refer to the gene name and/or gene annotation number in MmucT. Statistical significance, see Methods; *p < 0.05; **p < 0.01; ***p < 0.001. For the genes where no gene names are indicated (top panel) the numbers refer to annotated gene numbers in MmucT since no gene names are available. For a detailed description see Table S3a.

For the majority of genes, the change (log2-fold) in mRNA levels comparing exponentially growing cells with stationary cells showed similar patterns (albeit the magnitudes of change differ) for MmucT and Mmarrfp with a few notable exceptions. In MmucT, the mmgC_4 transcript is higher in stationary phase while in Mmarrfp it is lower. This is also the case for mmaA4_1, aceA, papA1, fbpB_2, and dhbE mRNA. All these genes are related to building the cell wall. Considering σ-factor mRNAs (with roles in virulence, see also below), we noted that the levels for sigD, sigL_1 and sigM in MmucT are higher in stationary cells while in Mmarrfp the corresponding transcripts are either unchanged (sigL_1) or higher (sigD and sigM) in exponentially growing cells (Fig. 4; see also below)18. With respect to antibiotic resistance, transcripts of MmucT genes such as katG and lipF are higher in stationary phase and their levels are also higher compared to Mmarrfp. Moreover, mRNA levels for the well-studied esxA and esxB genes are higher in exponentially growing MmucT cells while for Mmarrfp the corresponding transcripts are more abundant in stationary phase (see below). Taken together, it appears that some genes related to virulence are differentially expressed comparing the RGM MmucT and SGM Mmarrfp, suggesting variation in the regulatory circuits controlling the expression of these genes and possibly genes under the control of SigD, SigL_1 and SigM.

Below we focus on sigma factor, ESX, mce and STPK mRNA levels in MmucT and compare with the levels detected in Mmarrfp. We calculated mRNA levels in two ways. Distribution refers to the abundance of mRNA for each individual gene relative to the summed transcripts of the related genes, e.g., the level of SigA mRNA relative to the sum of all σ-factor mRNA levels. Change refers to the change (log2-fold) in mRNA levels comparing exponentially growing cells and cells in stationary phase.
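The two measures defined above can be illustrated with a minimal Python sketch. It is not the RNASeq analysis pipeline used in this study (see Methods); the function names, the pseudocount and the toy expression values are assumptions made for illustration, and no statistical testing is included. The sign convention follows the figure legends: a negative log2-value means higher levels in exponentially growing cells.

```python
import math
from typing import Dict


def distribution(levels: Dict[str, float]) -> Dict[str, float]:
    """Share (in percent) of each transcript within its gene family.

    All transcripts of the family together constitute 100%.
    """
    total = sum(levels.values())
    return {gene: 100.0 * value / total for gene, value in levels.items()}


def log2_change(
    exponential: Dict[str, float],
    stationary: Dict[str, float],
    pseudocount: float = 1.0,
) -> Dict[str, float]:
    """log2(stationary / exponential) per gene.

    The pseudocount (an assumption of this sketch) avoids division by
    zero for transcripts that were not detected in one condition.
    """
    return {
        gene: math.log2(
            (stationary[gene] + pseudocount) / (exponential[gene] + pseudocount)
        )
        for gene in exponential
    }


# Toy normalized expression values (hypothetical, not measured data).
exp_levels = {"geneA": 50.0, "geneB": 30.0, "geneC": 20.0}
shares = distribution(exp_levels)

exp_counts = {"geneA": 7.0, "geneB": 15.0}
stat_counts = {"geneA": 31.0, "geneB": 3.0}
change = log2_change(exp_counts, stat_counts)
```

In the toy example, geneA is higher in stationary phase (positive value) and geneB is higher in exponential phase (negative value). Distribution describes the composition of a transcript family within one condition, whereas change compares the same transcript between conditions, so a transcript can increase in absolute level while its share of the family decreases.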

Variation in sigma factor transcripts

In exponential growth phase, SigA, SigD and SigH2 mRNAs are the most abundant and constitute roughly 60% of the MmucT σ-factor transcripts while the remaining transcripts are distributed among the other σ-factors (Fig. 5a). In stationary phase there is a notable change such that the fractions of SigB, SigE1 and SigL1 mRNAs increased while the fraction of SigA mRNA was lower (Fig. 5b). However, mRNA levels for the majority of σ-factors were higher in stationary cells relative to exponential cells with notable exceptions for sigC, sigE, sigF2, sigI1, sigI2, sigJ1, sigK2 and cnrH1 (Fig. 5c). Higher levels of SigB and SigE mRNAs in stationary phase are consistent with what we previously reported18 for the SGM Mmarrfp. Moreover, in contrast to the RGM MmucT the levels for the majority of the σ-factors were found to be higher in exponentially growing Mmarrfp cells (Fig. S5a–c)18. From this comparison, it appears that the regulation of the expression of the σ-factor genes differs between MmucT and Mmarrfp. Also, the data indicate that SigB and SigE (SigE1 in MmucT) represent σ-factors necessary for the expression of genes in stationary phase both in RGM and SGM.

Figure 5

Analysis of sigma factor mRNA levels in exponentially growing and stationary MmucT cells. (a,b) Distribution profiles for MmucT sigma factor mRNAs at different growth conditions expressed as percentage, panel (a) exponentially growing cells and panel (b) stationary cells. All sigma factor mRNAs together constitute 100%, for details see main text. (c) Change, expressed as log2-fold change, comparing mRNA levels in exponentially growing and stationary MmucT cells. A negative log2-value suggests that the corresponding mRNA is more abundant in exponentially growing cells while a positive value suggests higher levels in stationary cells. (d) Change in mRNA levels (log2-fold) for cognate sigma/anti-sigma pairs in exponentially growing and stationary MmucT cells, see also figure legend (c). Statistical significance, see Methods; *p < 0.05; **p < 0.01; ***p < 0.001.

Considering anti-σ factor mRNAs, we observed that the change in mRNA levels for rseA, rsdA, rshA, rskA1, rskA2 and rslA1 followed the change for the corresponding σ-factor mRNAs (Fig. 5d). For rsbW it was higher, which might be related to rsbW being positioned upstream of sigF3 while the other anti-σ factor genes are located downstream of the corresponding σ-factor gene (but note rscA). The relationship between the mRNAs for these sigma factors and anti-sigma factors was expected given that they likely belong to the same transcriptional unit. In contrast to the other “σ/anti-σ” pairs, the rsmA mRNA level is higher in exponentially growing cells while for sigM it is higher in stationary phase (Fig. 5d). This might indicate differences in the regulation of sigM vs rsmA and/or stability of the corresponding mRNA. We also note that the mRNA level for the putative anti-SigC follows the same trend as SigC (Fig. 5d).

Variation in the levels of ESX, STPK and Mce mRNAs under different growth conditions

Irrespective of growth phase, mRNAs originating from pknA1, pknB and pknF2 were the most abundant; together they constitute roughly 50% of all STPK mRNAs. However, comparing mRNA levels in exponential and stationary phase suggested that these as well as the other pkn mRNAs decreased in stationary phase with the exception of pknE1 mRNA, which increased, but the magnitude of change was less than one-fold (log2) (Fig. 6a; for distribution see Fig. S6a,b). Moreover, pknA1 is positioned upstream of pknB, and the finding that the pknB mRNA is approx. two-fold more abundant than the pknA1 mRNA (irrespective of growth phase; Fig. 6a) might indicate regulation at the transcriptional level or that the pknA1 mRNA is more prone to degradation. Compared to MmucT, MmarT carries 21 complete STPK genes (of 27, see above) and pknA and pknB mRNAs are also among the most abundant in exponential and stationary cells with pknB mRNA modestly higher (Fig. S6c,d). As for MmucT, we detected small changes with the notable exceptions of pknF2 and pknK2, which are higher during exponential growth and in stationary cells, respectively (Fig. 6b). Together this indicates similarities in STPK mRNA levels (i.e. the homologs) in these two phylogenetically distant mycobacteria.

Figure 6

Analysis of STPK, ESX and Mce mRNA levels in exponentially growing and stationary MmucT and Mmarrfp cells. (a,b) Change, expressed as log2-fold change, comparing STPK mRNA levels in exponentially growing and stationary MmucT (panel a) and Mmarrfp (panel b) cells. A negative log2-value suggests that the corresponding mRNA is more abundant in exponentially growing cells while a positive value suggests higher levels in stationary cells. (c,d) Change, expressed as log2-fold change (see a,b above), comparing ESX mRNA levels in exponentially growing and stationary MmucT (panel c) and Mmarrfp (panel d) cells. (e,f) Change, expressed as log2-fold change (see a,b above), comparing Mce mRNA levels in exponentially growing and stationary MmucT (panel e) and Mmarrfp (panel f) cells. For statistical significance, see Methods; *p < 0.05; **p < 0.01; ***p < 0.001.
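The sign convention used in the figure can be illustrated with a minimal Python sketch. It assumes that the log2-fold change is computed as log2(stationary/exponential) on normalized expression values; the pseudocount, the gene labels and the numbers are hypothetical and serve only to show how the sign is read.

```python
import math


def log2_fold_change(exponential, stationary, pseudocount=1.0):
    """Return log2(stationary / exponential) for normalized expression values.

    The pseudocount is an assumption of this sketch; it avoids division by
    zero for transcripts that were not detected in one of the conditions.
    """
    return math.log2((stationary + pseudocount) / (exponential + pseudocount))


# Hypothetical normalized read counts: (exponential, stationary)
examples = {"geneA": (799.0, 99.0), "geneB": (49.0, 199.0)}

for gene, (expo, stat) in examples.items():
    lfc = log2_fold_change(expo, stat)
    # Negative value: more abundant in exponential phase; positive: stationary.
    phase = "stationary" if lfc > 0 else "exponential"
    print(f"{gene}: log2FC = {lfc:+.2f} (higher in {phase} phase)")
```

With these toy numbers geneA has a negative value (higher in exponential phase) and geneB a positive value (higher in stationary phase).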

Among ESX clusters, transcripts originating from ESX-1 were the most abundant followed by ESX-3 transcripts in MmucT (Fig. 6c). The ESX-1 transcripts from esxA and esxB constituted roughly 35% of all ESX-1 transcripts. Moreover, some of the ESX-1 mRNA levels are higher in exponentially growing cells, including esxB and esxA, while the levels for others such as pe35 and ppe68 increase in stationary phase (Fig. 6c). By contrast, the ESX-3 mRNA levels are higher [between approx. 3.8- to 6.8-fold (log2)] in cells growing in exponential phase. The levels for ESX-4 transcripts are low but it appears that there is an increase for two ESX-4 genes, eccC4 and mycP4, in stationary phase. With respect to ESX-1, EccC and MycP (a mycosin protease essential for secretion) are integrated into the inner membrane46. For Mmarrfp, mRNA levels for the ESX-1 genes (except eccE1 and mycP1) and espACD were all higher in stationary phase while changes in the patterns for ESX-3 and ESX-4 mRNAs were similar to those observed in MmucT (cf. Fig. 6c,d; see discussion).

The distribution profiles in MmucT for mce operon transcripts showed high similarity comparing exponential and stationary cells, with the most abundant transcripts originating from the mce1 operon followed by mce4 and mce5, while the levels for mce3, mce7 and mce8 were low (Fig. S6g,h). Moreover, comparing mRNA levels isolated from exponential and stationary cells revealed an increase for mce3 (except for mce3E_1 and mce3D_1), mce5 and mce8. The increase was, however, ≤2.4-fold (log2; Fig. 6e). For the MmucT mce4 and mce7 transcripts, we detected lower levels for some of these mRNAs in stationary cells, and no change for mce1 transcripts (Fig. S6i). The pattern of change for the corresponding mce genes in Mmarrfp was similar to that in MmucT (except mce8 genes; cf. Fig. 6e,f). On the basis of these observations it appears that these mce operons in these two mycobacteria, in particular in MmucT, are differentially expressed dependent on growth phase.

Discussion

We recently reported that the genome sizes for Mmuc- and Mneo-clade members, including type strains, range from 5.4 to 6.4 Mbp and that phages, IS elements as well as horizontally transferred tRNA genes and phage-derived ncRNAs have likely contributed to the evolution of these mycobacteria7. Here we provide data suggesting that the number of unique genes ranges between 702 and 1618 in the type strains MmucT, MphoT, MaubT, MneoT and McosT, encompassing roughly 12% to 26% (dependent on mycobacteria) of the total annotated CDSs. Comparing Mmuc- and Mneo-clade members revealed that the former group carries 637 and the latter 557 unique genes (based on sixteen genomes; Fig. S1). Among the unique genes we noted differences such as a notable number of genes classified as “Phages, Prophages, Transposable elements, Plasmids” in MmucT, “DNA metabolism” in MphoT and “Carbohydrates” and “Respiration” in McosT. Also, Mneo-clade members carry a larger number of core genes in the category “Carbohydrates”, where McosT encodes the highest number of unique genes in this category (as well as in the “Amino Acids and Derivatives” category). Together this expands our previous study7 and provides further insight into the evolution of these mycobacterial species. As such, our findings reflect their capacity to thrive in different environments. In this context, Mmuc- and Mneo-clade members were predicted to carry several horizontally transferred genes (HGT) originating from other bacteria as well as genes of eukaryotic origin such as the inosine-5’-monophosphate dehydrogenase gene from Tritrichomonas suis, a parasite of pigs, cattle and cats47.

Type VII secretion systems

Of the CDSs in the type strains MmucT, MphoT, MaubT, MneoT and McosT we predicted that approx. 3.4 to 4.4% code for proteins related to virulence. This is in accordance with Msmeg, while of the 3906 CDSs in MtbH37Rv 8.7% (6% in MmarT) are classified as virulence genes. The difference is mainly because MtbH37Rv and MmarT both have larger numbers of genes in the categories “Cell surface components” and “Secretion system” while the number of mce genes is lower. Of the five type VII secretion systems (ESX 1-5) identified in MtbH37Rv and Mmar (which also has ESX-6 but no ESX-2), ESX-1, -3 and -5 have been reported to affect virulence20,46,48,49,50,51,52. The Mmuc- and Mneo-clade members lack ESX-5 genes but carry ESX-1 and ESX-3. With respect to ESX-1, espE, espF, espJ and espL are missing, where espE and espF have been suggested to influence biofilm formation53. It was recently reported that EspL is essential for MtbH37Rv virulence in stabilizing the levels of EspE, EspF and EspH54. Whether the absence of three of these proteins has an impact on virulence for Mmuc- and Mneo-clade members remains to be investigated. Comparison of ESX-1 gene transcripts in MmucT and Mmarrfp revealed that while the mRNA levels are higher in exponential phase for the majority of the genes in MmucT, including the major virulence factors esxA (ESAT6) and esxB (CFP10), they increase in Mmarrfp stationary cells (Fig. 6c,d). Together this might indicate differences in the regulation of these and other ESX-1 genes in the RGM MmucT and SGM Mmarrfp. Moreover, the espACD operon, suggested to be a pathogenicity-associated genomic island, is present in a number of SGM pathogens46. In Mtb, export of EsxA and EsxB is suggested to be co-dependent on EspA and EspC, where EspC is localized on the bacterial surface46,55. In Mmarrfp, the level of the espACD transcript is > five-fold (log2) higher in stationary phase. Although espACD is absent in Mmuc- and Mneo-clade members, espA and espD paralogs were predicted to be present in Mmuc-clade members and in three Mneo strains, while espC appears to be missing. For the two espA paralogs in MmucT, we detected modestly higher mRNA levels in exponentially growing cells while the levels for the three espD paralogs were similar (if anything slightly higher in exponential phase; not shown). Together this raises questions about the function of the espA and espD paralogs and the mechanism of secretion of ESAT6 and CFP10 in Mmuc- and Mneo-clade members. In contrast, ESX-3 gene transcripts were higher in exponentially growing MmucT and Mmarrfp cells. ESX-3 has a role in iron acquisition and virulence56,57 and hence, our findings reflect the demand for iron in growing mycobacterial cells.

Rough and smooth colony morphotypes

Isolates of various mycobacteria such as Mycobacterium abscessus, Mycobacterium salmoniphilum and Mmar can form smooth and rough colony morphotypes (CM) when grown on solid media24,52,58. We also note that Mycobacterium canettii with its unusual CM is referred to as the “smooth tubercle bacilli”59,60. However, smooth morphotypes for certain SGM, including M. canettii, are due to the production of lipooligosaccharides and differ from those of RGM61. Analysis of in particular M. abscessus, Msmeg and Mycobacterium avium strains reveals that the smooth and rough CM are related to genes involved in generating glycopeptidolipids (GPL locus), which are exposed on the cell surface. Mutations or deletion of genes within the GPL locus can result in transition from a smooth (S) to a rough (R) CM, where mycobacteria with rough CM appear to be more virulent24,52. MmucT is highly mucoid when grown on solid media3,5 and surveying the GPL locus shows that several of the genes are absent in these members of the Mmuc- and Mneo-clades. In keeping with this, MmucT forms R colonies when grown on 7H10 media. In contrast, MphoT, MaubT and McosT form S colonies while in MneoT cultures we detected both R and S colonies (Fig. S8). This indicates that factors other than genes in the GPL locus influence CM, and we cannot exclude that gene transcription has a role. Considering the mucoid growth of Mmuc strains, these findings suggest that it is not simply related to the GPL locus.

Sigma and anti-sigma factors

Like other mycobacteria, Mmuc- and Mneo-clade members encode several ECF σ-factors in addition to SigA, SigB and SigF18,37,38,58,62. Phylogeny suggested that these cluster into distinct groups. However, they do not cluster into groups in accordance with their annotation. Hence, this emphasizes the importance of generating phylogenetic trees in order to understand the phylogenetic relationship and evolutionary history of σ-factors and anti-σ factors. This relates also to other factors such as STPKs (pkn genes; Fig. S4i). More specifically, given that several ECF σ-factors are closely linked to an anti-σ factor gene [rsbW, rsdA, rskA (A1 and A2), rsmA, rslA, rshA], one important question is whether these anti-σ factors also interact with other σ-factors. Therefore, to study whether any of the known anti-σ factors also regulates the activity of another σ-factor, information about the phylogenetic relationship of both the σ- and anti-σ factors is important. For example, it was previously reported that the Escherichia coli Rsd anti-σ factor interacts with the housekeeping σ-factor, Sig70, and interferes with its activity63,64,65. We also note that the putative anti-σC factor (rscA) is not positioned close to sigC in Mtb37,43, indicating that anti-σ factor genes do not necessarily have to be part of the same transcriptional unit as the corresponding σ-factor gene. However, in MmarT and Mmuc-clade members the putative rscA is located close to sigC with one and four (or five) genes in between, respectively (this report; Pettersson et al., unpublished). Mneo-clade members lack sigC and rscA and relative to Mmuc-clade members they also lack nearby genes (Fig. S4f). That mycobacteria that are phylogenetically closer to the mycobacterial ancestor carry sigC and rscA (unpublished data) might indicate that the Mneo-clade members lost sigC and rscA after they diverged from the Mmuc-clade.

We recently reported variations of σ-factor mRNA levels in Mmarrfp that depended on growth conditions, with sigB and sigE mRNA levels dominating in stationary phase18. Although sigB mRNA dominates in MmucT stationary cells, sigD and sigL1 levels are higher than that of the sigE homolog, sigE1. Together, this is in accordance with the notion that SigB is involved in general stress response in mycobacteria and that this likely also applies to SigE18,38,65,66. In addition to the significant increase of the two sigL mRNAs in stationary phase, the levels for the ECFs sigW1, sigW2, cnrH1 and cnrH2 were higher. The sigW genes were annotated on the basis of sigW present in Bacillus subtilis, which has been suggested to be involved in mediating resistance to certain antibiotics, e.g. fosfomycin67. The cnrH genes were predicted as homologs of cnrH present in Cupriavidus metallidurans CH34, where it is part of a circuit regulating resistance to metals, in particular nickel68,69. Together this raises the possibility that stationary MmucT cells are prepared to “face” exposure to antibiotics such as fosfomycin and nisin as well as nickel. Moreover, the MmucT sigD mRNA level increased in stationary phase, which is in contrast to Mtb and Mmar where it is lower18,70,71. For Mtb, however, a higher level of SigD mRNA in the late stage of “exponential” growth has been reported72. Nevertheless, in Mtb SigD has been discussed to affect various stages during infection such as replication and cell division73,74 and to be involved in the control of ribosome-associated genes75. On the basis of these findings it is clear that further studies are warranted to get a deeper understanding of the role of SigD (as well as the role of other MmucT sigma factors) and of whether there is a difference between SGM and RGM. Interestingly, a functional similarity between SigW in B. subtilis and SigD in Mtb has been discussed75. In this context, we note that Mneo SigD is suggested to act as a negative regulator in phytosterol metabolism76.

Antibiotic resistance – rifampin and isoniazid

The RNA polymerase (RNAP) is the target for one of the first-line anti-TB drugs, rifampin, while another first-line drug, isoniazid, interferes with the building of the cell wall. Mutations in rpoB (RNAP β-subunit) and katG can result in resistance to rifampin (RifR) and isoniazid, respectively25,77,78 (https://tbdreamdb.ki.se/), and Mmuc- and Mneo-clade members are resistant to both these antibiotics5,79 (see introduction). For rifampin, we were unable to detect changes in any of the antibiotic resistance “hot spot” positions in rpoB that could explain their natural resistance. Two other genes, rbpA and arr, have been reported to influence resistance. RbpA corresponds to an essential RNAP binding protein in mycobacteria80,81,82 while Arr is a rifampin ADP-ribosyltransferase, which catalyzes rifampin ribosylation82 (and Refs therein). Mneo-clade members have two copies of rbpA while for the Mmuc-clade members only one was predicted (Table S4a). For arr, the Mmuc strains all have two copies (MUCO_DSM_01098 and MUCO_DSM_04701) while the other mycobacteria carry one copy, with the exception of the Mneo strains, which appear to lack arr (Table S4b). The two arr genes are expressed both in exponential and stationary phase MmucT cells. However, the mRNA level for MUCO_DSM_01098 is three times higher than that for MUCO_DSM_04701 in exponentially growing cells, whereas in stationary phase their mRNA levels are roughly equal (Fig. S7).

For isoniazid, we detected amino acid substitutions at several positions in katG and inhA where mutations have been reported to lead to resistance. For example, all Mmuc- and Mneo-clade members carry valine at position 139 in katG while isoniazid-sensitive Mtb has alanine at this position, and mutation to valine results in isoniazid resistance25,78.

Together these observations might give indications as to why Mmuc- and Mneo-clade members show resistance to rifampin and isoniazid.

Concluding remark

Our present findings together with our recent report where we analyzed the Mmuc- and Mneo-clade members focusing on tRNAs and non-coding RNAs7 provide insight into the biology of these two groups of rapid growing and opportunistic NTM pathogens. As such this knowledge will be useful to treat infections caused by these and other mycobacteria as well as to identify the species causing the infection. This is exemplified by the finding that an isolate classified as Mmuc (MmucLZSF01) in fact should be considered as a Mpho strain7.

Methods

Strains and genomes

For description of M. mucogenicum DSM44124 (MmucT), M. phocaicum DSM45104 (MphoT), M. aubagnense DSM45150 (MaubT), M. neoaurum DSM44074 (MneoT) and M. cosmeticum DSM44829 (McosT), the other Mmuc- and Mneo-clade members and Mycobacterium sp. URHB0044 genomes (in total 17 genomes) see Table S1. The genomes were previously deposited at NCBI under the Bioproject: PRJNA429429, see Behra et al.7.

Genome annotation, functional classification and core genes

Genome annotation and coding sequences (CDS) were predicted using the PROKKA software (version 1.0.9)83. For functional classification, the predicted (PROKKA and NCBI annotated) CDS were subjected to BLASTp against the RAST predicted CDS followed by mapping to the RAST subsystem database (http://rast.nmpdr.org/, last accessed May 5, 2015)84 using the BLAST approach85.

Core genes were identified as previously reported7.

Horizontal gene transfer – HGT

Putative horizontally transferred (HGT) genes were identified using the HGTector tool v1.986. The prediction of HGT genes is based on a combination of a BLAST search and the NCBI taxonomic hierarchical classification. For the BLAST search, we used the NCBI nr-database (Uppmax resource, Uppsala University, as of Sep 2015) and BLASTp (NCBI-BLAST version 2.2.30+) with the e-value set to 1e-100.

For the NCBI taxonomic classification we chose “self = Mycobacterium” (taxonomic_id 1763) and “close = Actinomycetales” (taxonomic_id 2037), where the group “distal” = all other organisms not in the “self” and “close” groups (as of NCBI taxonomy on Sep 2015; Ref of NCBI taxonomy)86. The common and unique HGT genes were obtained using the PanOCTv1.9 pipeline87 and BLASTp (NCBI-BLAST ver 2.2.30+) with a minimum percentage identity of 45% and query coverage of 70%. As a complement, we performed a Mann-Whitney-Wilcoxon test (in R ver 3.2.2, 2015-08-14 on platform x86_64-pc-linux-gnu) comparing the GC content of all protein CDS with the GC content of putative HGT genes.
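The GC content comparison can be illustrated with a minimal Python sketch. It is not the R procedure used in this study; it only shows how the GC fraction of each CDS can be computed and how the Mann-Whitney U statistic is obtained by counting pairs. The function names and the toy sequences are hypothetical, and the p-value would in practice be taken from a statistics package (e.g. wilcox.test in R).

```python
def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = sum(1 for base in seq if base in "GC")
    acgt = sum(1 for base in seq if base in "ACGT")
    return gc / acgt if acgt else 0.0


def mann_whitney_u(xs, ys):
    """U statistic for xs: number of (x, y) pairs with x > y; ties count 0.5."""
    u = 0.0
    for x in xs:
        for y in ys:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


# Hypothetical toy sequences standing in for all CDS and putative HGT genes
all_cds = ["GCGCGCGC", "GCGCGCAT", "GGCCGGAT"]
hgt_cds = ["ATATGCGC", "ATATATGC"]

gc_all = [gc_content(s) for s in all_cds]
gc_hgt = [gc_content(s) for s in hgt_cds]

u_all = mann_whitney_u(gc_all, gc_hgt)
print(f"U = {u_all} of a maximum {len(gc_all) * len(gc_hgt)}")
```

A U value close to its maximum (or to zero) indicates that the GC content of one group is consistently higher than that of the other.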

Prediction of virulence factor genes (VFDB)

Virulence factor genes were predicted using the tool VFanalyzer, a webserver available at the Virulence Factor Data Base, VFDB88,89. We initially used protein CDS as input to the VFanalyzer and as references we included MtbH37Rv, MsmegMC2-155, Mmar M strain, M. ulcerans AGY99, and M. avium paratuberculosis K10, available from the VFDB database. The single XLS data files were combined and the resulting data file was used for presentation of the data using the R interface (ggplot2 package)90.

For the detailed analysis of sigma factor, STPK, ESX, Mce and GPL locus genes we used MtbH37Rv, Msmeg (and M. abscessus, M. avium) and the VFDB XLS reference dataset (see above) as references. For identification of orthologous genes we used PanOCT and pairwise analysis using the reciprocal BLASTp hit, followed by verifying the corresponding protein using the SMARTdb database for protein architecture91.
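The reciprocal BLASTp hit criterion can be sketched in Python as follows. This is a minimal illustration, assuming that hits are available as (query, subject, bit score) tuples and that the best hit is the one with the highest bit score; the function names and gene identifiers are hypothetical and do not reproduce the PanOCT procedure.

```python
def best_hits(hits):
    """Map each query to its best-scoring subject.

    hits: iterable of (query, subject, bitscore) tuples.
    """
    best = {}
    for query, subject, bitscore in hits:
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical BLASTp results in both directions
a_vs_b = [
    ("mmuc_0001", "mtb_0101", 250.0),
    ("mmuc_0001", "mtb_0102", 90.0),
    ("mmuc_0002", "mtb_0102", 180.0),
]
b_vs_a = [
    ("mtb_0101", "mmuc_0001", 245.0),
    ("mtb_0102", "mmuc_0001", 95.0),
]

orthologs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(orthologs)
```

In this toy example only mmuc_0001 and mtb_0101 are retained, because mtb_0102 has its best hit to mmuc_0001 rather than to mmuc_0002.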

Phylogenetic analysis

Phylogenetic trees for sigma factor, anti-sigma factor and STPK proteins were generated based on the alignment of the respective amino acid sequences using the MAFFT (version 7.147b) software92. The MAFFT-aligned sequences were then used to compute trees with the FastTree tool with 1000 cycles of bootstrapping and otherwise default settings: the default run infers approximately-maximum-likelihood phylogenetic trees from alignments of protein sequences using the Jones-Taylor-Thornton + CAT model of amino acid evolution93. Figures were generated using the iTOL tool webserver94.

The rooting of sigma factor phylogeny was set at the node level of SigA, SigB and SigF. For the cognate sigma/anti-sigma factor phylogeny SigF and RsbW were set as roots while for the STPKs we used the PknG protein.

RNA extraction, RNA sequencing and analysis

RNASeq analysis was performed as described in detail elsewhere18. Briefly, MmucT and Mmarrfp (biological duplicates) were grown in 7H9 media at 37 °C and 30 °C, respectively, and total RNA was extracted from exponentially growing and stationary phase cells. The RNA was extracted using Trizol and a bead beater, DNase treated and submitted for RNA sequencing at the SNP&SEQ Technology Platform at Uppsala University (HiSeq 2000 Illumina platform).

For MmucT, the RNASeq reads were mapped to the complete reference genome by building the index using bowtie2 v2-2-495 followed by alignment with TopHat v2.0.1396. From the aligned BAM files the read counts were generated using HTSeq v0.9.197, and normalization and differential expression analysis were performed using the DESeq2 package, which gives p and adjusted p (padj) values, i.e. statistical significance98. With respect to Mmarrfp the RNASeq data was generated as described by Pettersson et al.18.
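The normalization step can be illustrated with a minimal Python sketch of the median-of-ratios idea that underlies size factor estimation in DESeq2. This is not the package's implementation; the function name, the gene labels and the counts are hypothetical, and genes with a zero count in any sample are simply skipped.

```python
import math
from statistics import median


def size_factors(counts):
    """Estimate one size factor per sample by the median-of-ratios idea.

    counts: dict mapping gene -> list of raw read counts (one per sample).
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for gene_counts in counts.values():
        if any(c <= 0 for c in gene_counts):
            continue  # the geometric mean is undefined with zero counts
        log_geo_mean = sum(math.log(c) for c in gene_counts) / n_samples
        for i, c in enumerate(gene_counts):
            log_ratios[i].append(math.log(c) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]


# Hypothetical raw counts for two samples; sample 2 was sequenced twice as deep
raw = {"g1": [100, 200], "g2": [50, 100], "g3": [10, 20], "g4": [0, 5]}

factors = size_factors(raw)
normalized = {g: [c / f for c, f in zip(cs, factors)] for g, cs in raw.items()}
print([round(f, 3) for f in factors])
```

Dividing the raw counts by the size factors places the samples on a common scale, so that a gene with the same relative abundance in both samples receives the same normalized count.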

Ethics statement

All methods were carried out in accordance with relevant guidelines and regulations.