Introduction

Investigations of non-tuberculous mycobacteria in Iran recently led to the characterization of two new species of rapidly growing scotochromogenic mycobacteria: Mycobacterium iranicum, isolated in 2009 from the bronchoalveolar lavage of a 60-year-old female patient suffering from chronic pulmonary disease1, and Mycobacterium celeriflavum, isolated in 2010 from the sputum of a 44-year-old male suffering from chronic obstructive pulmonary disease2. We recently had the opportunity to investigate four clinical isolates from Iran and showed that they represent one additional new species of non-tuberculous Mycobacterium, which we named Mycobacterium ahvazicum. Strain AFP-003T was isolated in 2009 from the sputum and bronchoalveolar lavage specimens of a 68-year-old Iranian female suffering from chronic pulmonary disease. Phenotypic and genetic investigations based on 16S rRNA and rpoB gene sequencing indicated that strain AFP-003 was probably representative of a new species of non-tuberculous Mycobacterium in Iran. Following the isolation of strain AFP-003, three other strains exhibiting the same phenotypic and genetic characteristics were isolated in Iran: strain AFP-004 was isolated in 2009 from a biopsy of diseased soft tissues in a 49-year-old HIV-infected patient, strain MH1 was isolated in 2013 from sputum in a 60-year-old male patient and strain RW4 was isolated in 2015 from a bronchoalveolar lavage specimen in a 19-year-old patient suffering from asthma. Strain AFP-003 was then fully characterized and designated as the type strain, AFP-003T.

Results

AFP-003T yielded smooth, yellow, scotochromogenic colonies after 3–4 weeks of incubation on Löwenstein-Jensen medium at temperatures between 33 °C and 42 °C, with optimal growth at 37 °C; it did not grow on Löwenstein-Jensen medium containing 5% NaCl. Electron microscopy of colonies showed rod-shaped bacilli measuring 1.53 ± 0.32 µm in length and 0.64 ± 0.07 µm in width (Fig. 1). AFP-003T exhibited a heat-stable (68 °C) catalase but was negative for semi-quantitative catalase, and was negative for urease activity, iron uptake, tellurite reduction, arylsulfatase activity after three days, niacin production, nitrate reduction, Tween hydrolysis and growth on MacConkey agar without crystal violet. These conventional phenotypic tests were not sufficient to differentiate AFP-003T from Mycobacterium lentiflavum (Table 1). However, the reproducible matrix-assisted laser desorption ionization-time of flight-mass spectrometry (MALDI-TOF-MS) profile of AFP-003T did not match any of the profiles entered in the Bruker database (version December, 2015, including M. lentiflavum), suggesting that AFP-003T was not identifiable as M. lentiflavum and could indeed be representative of a hitherto undescribed species of Mycobacterium.

Figure 1

Transmission electron microscopy of Mycobacterium ahvazicum strain AFP-003T. The scale bar represents 200 nm.

Table 1 Phenotypic characteristics of M. ahvazicum strain AFP-003T and related slowly growing mycobacteria species.

AFP-003T was then shown to be susceptible in vitro to ciprofloxacin, clarithromycin and rifampicin (Table 2). Furthermore, the Biolog® Phenotype MicroArray test showed that AFP-003T grew under 14 other inhibitory chemical conditions including minocycline, lincomycin, guanidine HCl, Niaproof Anionic Surfactant, vancomycin, tetrazolium violet, tetrazolium blue, nalidixic acid, lithium chloride, potassium tellurite, aztreonam, sodium butyrate and sodium bromate; and was able to metabolize eight carbon sources including α-D-glucose, glucuronamide, methyl pyruvate, α-keto-glutaric acid, α-keto-butyric acid, acetoacetic acid, propionic acid and acetic acid (Table 3). The 16S rRNA gene sequence (GenBank accession: LT797535) showed its highest similarities, of 98.1%, 97.8%, 97.5% and 97.4%, with M. lentiflavum ATCC 51985, Mycobacterium simiae, Mycobacterium triplex and Mycobacterium sherrisii, respectively. Partial rpoB gene sequencing was previously shown to be a useful marker to delineate new Mycobacterium species3, and we sequenced a 619-bp rpoB gene fragment in strain AFP-003T (GenBank accession: FR695853). This sequence showed its highest similarities, of 96.43%, 95.55% and 94.95%, with Mycobacterium florentinum DSM 44852, Mycobacterium stomatepiae DSM 45059 and Mycobacterium genavense FI-06288, respectively. These values, all below the 97% cut-off value previously proposed to delineate different species among Mycobacterium3, reinforced the suggestion that AFP-003T was representative of a new species belonging to the M. simiae complex, the largest complex in the genus Mycobacterium, currently comprising 18 species4,5 (Fig. 2).
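The comparison of the rpoB similarities with the 97% cut-off can be sketched in a few lines of Python. This is only an illustrative sketch: the similarity values and the cut-off are those reported above, whereas the helper names (`percent_identity`, `below_cutoff`) and the convention of ignoring gapped columns are assumptions and do not reproduce the tool actually used for the comparison.

```python
# Cut-off (percent) proposed to delineate Mycobacterium species by rpoB, as cited above.
RPOB_SPECIES_CUTOFF = 97.0

# Highest rpoB similarities (percent) reported above for strain AFP-003T.
rpob_similarity = {
    "Mycobacterium florentinum DSM 44852": 96.43,
    "Mycobacterium stomatepiae DSM 45059": 95.55,
    "Mycobacterium genavense FI-06288": 94.95,
}


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumption: gapped columns are ignored; other conventions exist.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a != "-" and b != "-"
    ]
    if not pairs:
        raise ValueError("no comparable positions")
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


def below_cutoff(similarities: dict, cutoff: float = RPOB_SPECIES_CUTOFF) -> bool:
    """True if every reported similarity is strictly below the cut-off."""
    return all(value < cutoff for value in similarities.values())


all_below = below_cutoff(rpob_similarity)
```

With the three values reported above, `all_below` evaluates to true, which is the condition that the text interprets as support for a distinct species.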

Table 2 Minimum inhibitory concentration of selected antibiotics against two M. ahvazicum strains.
Table 3 Phenotype Microarray Biolog, Gen III Microplate profile of M. ahvazicum strain AFP-003T.
Figure 2

Phylogenetic tree based on the 16S rRNA gene sequence indicating the phylogenetic position of M. ahvazicum strain AFP-003T relative to other species of the M. simiae complex and other mycobacteria species, including Mycobacterium tuberculosis as an outgroup. Sequences were aligned using ClustalW implemented in MEGA733. The analysis involved 34 nucleotide sequences. All positions containing gaps and missing data were eliminated. There were a total of 1,233 positions in the final dataset. Phylogenetic inferences were obtained using the maximum likelihood method based on the Tamura and Nei model (bootstrapped 1000 times). Bootstrap values >50% are given at nodes. Bar, 0.005 substitutions per nucleotide position.

We therefore decided to sequence the genome of strain AFP-003T. Genome sequencing yielded four scaffolds indicative of one 6,121,237-bp chromosome (66.24% GC content) without evidence of any extra-chromosomal replicon (Fig. 3). The genome of AFP-003T is smaller than that of Mycobacterium parascrofulaceum and M. triplex (6.564 Mb and 6.383 Mb, respectively) but larger than that of Mycobacterium interjectum, Mycobacterium genavense, M. sherrisii and M. simiae (5.848 Mb, 4.936 Mb, 5.687 Mb and 5.783 Mb, respectively); its GC content is lower than that of M. parascrofulaceum, M. interjectum, M. genavense, M. sherrisii and M. triplex (68.45%, 67.91%, 66.92%, 66.92% and 66.6%, respectively) but higher than that of M. simiae (66.17%). The AFP-003T genome encodes 5,704 proteins and 52 RNAs, including 49 tRNAs and one complete rRNA operon, in agreement with its classification as a slowly growing mycobacterium. A total of 4,869 genes (85.36%) were assigned a putative function (by COGs or by NR BLAST), whereas 88 genes (1.54%) were identified as ORFans. The remaining genes were annotated as hypothetical proteins without COG assignment (835 genes, 14.64%). A total of 2,617 proteins were found to be associated with the mobilome, including 194 phage proteins. Further genome analysis predicted two incomplete prophage regions of 23.3 kb and 12.6 kb (Fig. 4). A total of 1,225 proteins were found to be associated with virulence, 95 proteins were associated with toxin/antitoxin systems and 11 genes encoded bacteriocins, while no gene was associated with the resistome. We identified a large number of genes assigned to COG functional categories for transport and metabolism of lipids (10.6%), secondary metabolites biosynthesis, transport and catabolism (6.8%), amino acid transport and metabolism (4.03%) and energy production and conversion (5.3%) (Table 4).
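The annotation percentages quoted above can be re-derived from the reported gene counts. The following Python sketch is a simple consistency check; the counts are taken from the text, while the variable and function names are illustrative assumptions.

```python
# Gene counts reported above for the AFP-003T genome.
total_proteins = 5704
with_putative_function = 4869   # assigned by COGs or by NR BLAST
hypothetical_no_cog = 835       # hypothetical proteins without COG assignment
orfans = 88


def share(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, rounded to two decimals."""
    return round(100.0 * part / whole, 2)


# Genes with a putative function and hypothetical proteins partition the proteome.
partition_is_complete = (with_putative_function + hypothetical_no_cog) == total_proteins

shares = {
    "putative function": share(with_putative_function, total_proteins),
    "hypothetical, no COG": share(hypothetical_no_cog, total_proteins),
    "ORFans": share(orfans, total_proteins),
}
```

The two main classes sum to the 5,704 predicted proteins, and the computed shares agree with the 85.36%, 14.64% and 1.54% given in the text.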

Figure 3

Graphical circular map of the chromosome of M. ahvazicum strain AFP-003T. From outside to the center: genes on the forward strand colored by COG categories (only genes assigned to COG), genes on the reverse strand colored by COG categories (only genes assigned to COG), RNA genes (tRNAs green, rRNAs red), GC content and GC skew.

Figure 4

Genomic organization of two incomplete prophage regions in the genome of M. ahvazicum strain AFP-003T.

Table 4 Number of genes in the M. ahvazicum strain AFP-003T genome associated with the 25 general COG functional categories. The total % is based on the total number of protein coding genes in the annotated genome.

The genome of AFP-003T has the genetic potential to produce secondary metabolites, with 39 genes found to be associated with polyketide synthases and non-ribosomal peptide synthases. The M. ahvazicum genome exhibits an average nucleotide identity of 86% with M. genavense, 82% with M. simiae, 81% with M. interjectum, 72% with M. triplex, 69% with M. parascrofulaceum and 68% with M. sherrisii (Tables 5, 6). In silico DNA-DNA hybridization analysis yielded 36.45% ± 3.46% with M. triplex, 32.55% ± 3.46% with M. genavense, 26% ± 3.39% with M. sherrisii, 25.8% ± 3.39% with M. simiae, 24.7% ± 3.39% with M. interjectum and 24.2% ± 3.39% with M. parascrofulaceum. Ori-Finder6 was used to predict the origin of replication in the genome of strain AFP-003T. We found three OriC regions separated by the dnaA gene and located in scaffold 1 (218, 312 and 391 bp) (Supplementary File 1). The three predicted OriC regions showed no sequence homology with those of the DoriC database7. Contigs have been deposited (EBI accession number: FXEG02000000). The annotated genome is available at https://www.ebi.ac.uk/ena/data/view/PRJEB20293.
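The in silico DDH estimates above can be read against a species boundary as in the following Python sketch. The estimates and their margins are those reported in the text; the 70% boundary is the conventional DDH threshold for species delineation and is an assumption here, since the text does not state it, and the names used are illustrative.

```python
# In silico DDH estimates reported above for AFP-003T: (estimate %, margin %).
ddh_estimates = {
    "M. triplex": (36.45, 3.46),
    "M. genavense": (32.55, 3.46),
    "M. sherrisii": (26.0, 3.39),
    "M. simiae": (25.8, 3.39),
    "M. interjectum": (24.7, 3.39),
    "M. parascrofulaceum": (24.2, 3.39),
}

# Assumption: conventional 70% DDH boundary for species delineation,
# not stated in the text itself.
DDH_SPECIES_THRESHOLD = 70.0


def same_species_possible(estimate: float, margin: float,
                          threshold: float = DDH_SPECIES_THRESHOLD) -> bool:
    """True if the upper bound of the estimate reaches the threshold."""
    return estimate + margin >= threshold


# Reference genome with the highest DDH estimate.
closest = max(ddh_estimates, key=lambda name: ddh_estimates[name][0])

# Does any comparison leave room for conspecificity under the assumed threshold?
any_conspecific = any(
    same_species_possible(estimate, margin)
    for estimate, margin in ddh_estimates.values()
)
```

Under this assumed threshold, even the highest estimate (with M. triplex) remains far below the boundary once its margin is added, which is consistent with the interpretation that AFP-003T represents a separate species.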

Table 5 Numbers of ortholog genes between genomes (upper right), average percentage similarity of nucleotides corresponding to orthologs between genomes (lower left) and number of ORFs per genome (bold); in selected M. simiae complex species including M. ahvazicum strain AFP-003T.
Table 6 Comparison of M. ahvazicum AFP-003T with related mycobacteria species using GGDC, formula 2 (DDH estimates based on identities/HSP length).

To better describe AFP-003T, the mycolic acids were identified. The mass spectrometry analysis of Mycobacterium tuberculosis H37Rv strain (used as a positive control) showed the previously described mycolic acid pattern8,9, including α- (C74-84), methoxy- (C80-90) and keto- (C80-89) forms. Strain AFP-003T showed two known mycolic acid subclasses, α- (C71-74) and α′- (C64-68) forms, representing 15% of relative intensity and defining an original mycolic acid profile (Table 7, Fig. 5).

Table 7 Identified mycolic acids for strains AFP-003T and Mycobacterium tuberculosis H37Rv (control).
Figure 5

ESI-MS spectra of the [M − H]− mycolic acid ions. (A) Mycobacterium tuberculosis H37Rv (control), (B) Mycobacterium ahvazicum AFP-003T.

The unique phenotypic, genetic and genomic characteristics of strain AFP-003T all support the conclusion that it is representative of a hitherto undescribed species in the genus Mycobacterium. We named this new species Mycobacterium ahvazicum sp. nov., derived from the name Ahvaz, the city in the southwest of Iran where strain AFP-003T (=JCM 18430) was discovered; strain AFP-003T is the type strain of M. ahvazicum. The data reported here indicate that M. ahvazicum is another new species belonging to the large M. simiae complex, in which 18 new species have been reported over the last fifty years. Interestingly, seven of these species were isolated from sputum4,5,10,11,12,13,14, five from cervical lymph nodes15,16,17,18,19, one from blood20, one from rhesus macaques21, two from fishes22,23, one from water24 and one from an unknown human clinical source25 (Table 8).

Table 8 Synopsis of the M. simiae complex species characterized since 1965.

The discovery of M. ahvazicum is one more example illustrating that searching for mycobacteria in previously under-explored territories can reveal new species, as previously illustrated by our recent report of Mycobacterium massilipolynesiensis in one remote island of the French Polynesian territories26.

Methods

Phenotypic characterization

Biochemical tests were carried out using standard methods27 and the minimal inhibitory concentration (MIC) of the major antimycobacterial agents was determined using the broth microdilution method28.

Biolog Phenotype microarray

The ability of AFP-003T to metabolize 71 different carbon substrates and to resist 23 inhibitory chemicals was tested using Gen III Microplates Biolog® Phenotype MicroArray (Biolog Inc)29. AFP-003T was cultured at 37 °C on Middlebrook 7H10 agar medium supplemented with 10% (v/v) oleic acid/albumin/dextrose/catalase (OADC) (Becton Dickinson, Sparks, MD, USA) for 2 weeks. Colonies were gently taken off the agar plate culture with a wet swab and then rubbed against the wall of a dry glass tube. The cells were then suspended in IF-B (Biolog inoculating fluid recommended for strongly reducing and capsule-producing bacteria, including mycobacteria) and adjusted to 90% transmittance using a turbidimeter (Biolog Inc). Two plates (duplicate) were then inoculated and incubated in the OmniLog PM System (Biolog Inc.) at 37 °C for three days. The results were obtained as area under the curve (AUC) by Biolog’s parametric software.
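As an illustration of what an area-under-the-curve summary of a kinetic well reading involves, a minimal Python sketch using the trapezoidal rule is given below. This is not Biolog’s parametric software, whose exact computation is not described here; the function name and the choice of the trapezoidal rule are assumptions made for illustration only.

```python
from typing import Sequence


def trapezoid_auc(times: Sequence[float], signal: Sequence[float]) -> float:
    """Area under a kinetic curve by the trapezoidal rule.

    `times` must be strictly increasing and of the same length as `signal`.
    Assumption: a simple trapezoidal integration, not the vendor's algorithm.
    """
    if len(times) != len(signal):
        raise ValueError("times and signal must have the same length")
    if len(times) < 2:
        raise ValueError("at least two time points are required")
    area = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        if dt <= 0:
            raise ValueError("times must be strictly increasing")
        # Area of the trapezoid between two consecutive readings.
        area += dt * (signal[i] + signal[i - 1]) / 2.0
    return area
```

In such a scheme, duplicate plates would each give one AUC per well, and the two values could then be compared or averaged per substrate.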

Transmission Electron Microscopy

The size of the microorganisms was determined by transmission electron microscopy (Morgani 268D; Philips, Eindhoven, The Netherlands) after negative staining at an operating voltage of 60 kV.

Extraction and analysis of mycolic acids

AFP-003T and Mycobacterium tuberculosis H37Rv (used as a positive control) were cultured on Middlebrook 7H10 agar medium supplemented with 10% OADC for three weeks. Mycolic acids were prepared as detailed previously with modifications8,30. At least six inoculation loops were collected from a culture plate and transferred into 2 mL of 9 M potassium hydroxide. Mycolic acids were hydrolyzed at 100 °C for 2 hours. Free mycolic acids were then extracted with 2 mL of chloroform at low pH by adding 3 mL of 6 N hydrochloric acid. The organic phase was collected and dried at 40 °C under a stream of nitrogen. Free mycolic acids were then dissolved in 100 µL of a methanol-chloroform mixture (50:50, v/v) and subjected to electrospray-mass spectrometry analysis after a 2,000-fold dilution in methanol. Samples were analyzed in the Sensitivity Negative ionization mode using a Vion IMS QTof high resolution mass spectrometer (Waters, Guyancourt, France). Samples were infused at 10 µL/min after a fluidics wash with a chloroform/methanol solution (50:50) and monitored from 500 to 2000 m/z for 2 minutes. Ionization parameters were set as follows: capillary voltage 2.5 kV, cone voltage 50 V, source and desolvation temperatures 120/650 °C. Mass calibration was adjusted automatically during analysis using a Leucine Enkephalin solution at 50 pg/µL (554.2620 m/z). Mass spectra between 900 and 1400 m/z were used for subsequent data interpretation. Mycolic acids were described according to previously detailed structures31.

MALDI-TOF-MS

Using a sterile 200 µL tip, a small portion of a colony was picked from Middlebrook 7H10 solid medium and applied directly onto a ground-steel MALDI target plate. Then, one µL of a matrix solution (saturated α-cyano-4-hydroxycinnamic acid in 50% acetonitrile and 2.5% trifluoroacetic acid) (Bruker Daltonics) was used to overlay the sample. After drying for 5 minutes at room temperature, the plate was loaded into the Microflex LT (Bruker Daltonics) mass spectrometer. Spectra were recorded following the parameters previously described32. All signals with resolution ≥400 were automatically acquired using AutoXecute acquisition control in flexControl software version 3.0 and the identifications were obtained by MALDI Biotyper software version 3.0 with the Mycobacteria Library v2.0 database (version December, 2015).

Phylogenetic analysis

Phylogenetic and molecular evolutionary analyses based on the 16S rRNA gene sequence were inferred using the maximum likelihood method implemented on MEGA733, with the complete deletion option, based on the Tamura-Nei model for nucleotide sequences. Initial trees for the heuristic search were obtained automatically by applying the neighbor-joining and BIONJ algorithms to a matrix of pairwise distances estimated using the maximum composite likelihood (MCL) approach. Statistical support for internal branches of the trees was evaluated by bootstrapping with 1000 iterations.
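The complete deletion option mentioned above removes every alignment column in which at least one sequence has a gap or missing data. A minimal Python sketch of this filtering step follows; it is not MEGA7 code, and the set of symbols treated as gap or missing data (`-` and `?`) as well as the function name are assumptions.

```python
from typing import Dict

# Assumption: "-" marks alignment gaps and "?" marks missing data.
GAP_OR_MISSING = frozenset("-?")


def complete_deletion(alignment: Dict[str, str]) -> Dict[str, str]:
    """Drop every column that contains a gap or missing-data symbol in any sequence."""
    if not alignment:
        raise ValueError("alignment is empty")
    sequences = list(alignment.values())
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same aligned length")
    # Indices of columns that are fully informative across all sequences.
    kept = [
        i for i in range(length)
        if all(seq[i] not in GAP_OR_MISSING for seq in sequences)
    ]
    return {name: "".join(seq[i] for i in kept) for name, seq in alignment.items()}
```

Applied to the 34 aligned 16S rRNA gene sequences, a filter of this kind is what leaves the 1,233 positions reported for the final dataset.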

Genome sequencing

Total DNA of strain AFP-003T was extracted in two steps. A mechanical treatment was first performed with acid-washed glass beads (G4649-500g, Sigma) using a FastPrep BIO 101 instrument (Qbiogene, Strasbourg, France) at maximum speed (6.5 m/sec) for 90 s. Then, after a 2-hour lysozyme incubation at 37 °C, DNA was extracted on the EZ1 biorobot (Qiagen) with the EZ1 DNA tissue kit. The elution volume was 50 µL. gDNA was quantified by a Qubit assay with the high sensitivity kit (Life technologies, Carlsbad, CA, USA) at 32.5 ng/µL. Genomic DNA was sequenced using MiSeq technology (Illumina Inc, San Diego, CA, USA) with two applications: paired-end and mate-pair. Both strategies were barcoded in order to be mixed, respectively, with 11 other genomic projects prepared according to the Nextera XT DNA sample prep kit (Illumina) and with 11 other projects prepared according to the Nextera Mate Pair sample prep kit (Illumina). To prepare the paired-end library, 1 ng of gDNA was fragmented and amplified by limited PCR (12 cycles), introducing dual-index barcodes and sequencing adapters. After purification on AMPure XP beads (Beckman Coulter Inc, Fullerton, CA, USA), the libraries were normalized and pooled for sequencing on the MiSeq. Automated cluster generation and paired-end sequencing with dual-indexed 2 × 250-bp reads were performed in a 9-hour run. Total information of 9.0 Gb was obtained from a 1,019 k/mm2 cluster density, with 90.2% of clusters passing quality control filters (17,374,744 passed filter reads). Within this run, the index representation for AFP-003T was determined to be 8.20%. The 1,424,260 paired-end reads were trimmed and filtered according to the read qualities. The mate-pair library was prepared with 1.5 µg of genomic DNA using the Nextera mate pair Illumina guide. The genomic DNA sample was simultaneously fragmented and tagged with a mate-pair junction adapter. 
The profile of the fragmentation was validated on an Agilent 2100 BioAnalyzer (Agilent Technologies Inc, Santa Clara, CA, USA) with a DNA 7500 labchip. The optimal size of the obtained fragments was 5.043 kb. No size selection was performed and 544 ng of tagmented fragments were circularized. The circularized DNA was mechanically sheared to small fragments, with optima on a bimodal curve at 421 and 881 bp, on the Covaris device S2 in T6 tubes (Covaris, Woburn, MA, USA). The library profile was visualized on a High Sensitivity Bioanalyzer LabChip (Agilent Technologies Inc, Santa Clara, CA, USA) and the final library concentration was measured at 16.97 nmol/L. The libraries were normalized at 2 nM, pooled with 11 other projects, denatured and diluted at 15 pM. Automated cluster generation and a 2 × 250-bp sequencing run were performed in a 39-hour run. This library was loaded on two different flow cells. For each run, global information of 5.3 and 7.2 Gb was obtained, respectively, from a 559 and 765 K/mm2 cluster density, with 96.3 and 94.7% of clusters passing quality control filters (10,450,000 and 14,162,000 passed filter clusters for each sequencing run). Within these runs, the index representation for AFP-003T was determined to be 8.51 and 7.62%, corresponding to 888,760 and 1,079,096 paired-end reads, respectively. The three runs led to a total of 3,392,116 paired-end reads, which were filtered according to the read qualities. The reads were assembled using the SPAdes software (http://bioinf.spbau.ru/spades)34. Contigs obtained were combined by use of SSPACE35 assisted by manual finishing and GapFiller36. Open reading frames (ORFs) were predicted using Prodigal37 with default parameters. The predicted ORFs were excluded if they spanned a sequencing gap region (containing N). The predicted bacterial protein sequences were searched against the GenBank database and the Clusters of Orthologous Groups (COGs) database using BLASTP (E value 1e-03, coverage 0.7 and 30% identity). 
If no hit was found, the sequence was searched against the NR database using BLASTP with an E value of 1e-03, coverage 0.7 and 30% identity. The tRNAs and rRNAs were predicted using the tRNAScan-SE and RNAmmer tools, respectively38,39. SignalP and TMHMM were used to predict the signal peptides and the number of transmembrane helices, respectively40,41. For each selected genome, the complete genome sequence, proteome sequence and ORFeome sequence were retrieved from the FTP site of the National Center for Biotechnology Information (NCBI). All proteomes were analyzed using proteinOrtho42. An annotation of the entire proteome was performed to define the distribution of functional classes of predicted genes per cluster of orthologous groups of proteins (using the same method as for the genome annotation). The origin of replication was predicted using OriFinder5,6 (http://tubic.tju.edu.cn/Ori-Finder/) and homology with other OriC regions was searched using the BLAST algorithm in the DoriC database7 (http://tubic.tju.edu.cn/doric/). The M. ahvazicum strain AFP-003T genome was further incorporated into in silico DNA-DNA hybridization (DDH)43 with reference genomes selected based on 16S rRNA gene proximity, and DDH values were estimated using the GGDC version 2.0 online tool44. For AFP-003T genome comparison, we used the following species: M. parascrofulaceum, M. triplex, M. interjectum, M. genavense, M. sherrisii and M. simiae.
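The two-step annotation logic described above (COG search first, NR search as a fallback, both with the same BLASTP thresholds) can be sketched as follows in Python. The thresholds are those stated in the text; the `Hit` record, the function names, the choice of the lowest E value as the best hit and the inclusive handling of the thresholds are assumptions, and the sketch does not parse real BLAST output.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

# BLASTP thresholds stated in the text.
MAX_EVALUE = 1e-3
MIN_COVERAGE = 0.7    # fraction of the query covered
MIN_IDENTITY = 30.0   # percent


@dataclass(frozen=True)
class Hit:
    """Hypothetical, simplified record for one BLASTP hit."""
    subject: str
    evalue: float
    coverage: float
    identity: float


def passes(hit: Hit) -> bool:
    """Apply the three thresholds (assumed inclusive)."""
    return (hit.evalue <= MAX_EVALUE
            and hit.coverage >= MIN_COVERAGE
            and hit.identity >= MIN_IDENTITY)


def best_hit(hits: Iterable[Hit]) -> Optional[Hit]:
    """Lowest-E-value hit among those passing the thresholds, if any."""
    kept = [hit for hit in hits if passes(hit)]
    return min(kept, key=lambda hit: hit.evalue) if kept else None


def annotate(cog_hits: Iterable[Hit], nr_hits: Iterable[Hit]) -> Tuple[str, Optional[str]]:
    """COG assignment first; fall back to NR only when no COG hit passes."""
    hit = best_hit(cog_hits)
    if hit is not None:
        return ("COG", hit.subject)
    hit = best_hit(nr_hits)
    if hit is not None:
        return ("NR", hit.subject)
    return ("unassigned", None)
```

A protein with no passing hit in either database is returned as unassigned here; how such proteins are further split into hypothetical proteins and ORFans is not covered by this sketch.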