“Mycobacterium massilipolynesiensis” sp. nov., a rapidly-growing mycobacterium of medical interest related to Mycobacterium phlei

In French Polynesia, respiratory tract clinical isolate M26, displayed unusual phenotype and contradictory phylogenetic affiliations, suggesting a hitherto unidentified rapidly-growing Mycobacterium species. The phenotype of strain M26 was further characterized and its genome sequenced. Strain M26 genome consists in a 5,732,017-bp circular chromosome with a G + C% of 67.54%, comprising 5,500 protein-coding genes and 52 RNA genes (including two copies of the 16 S rRNA gene). One region coding for a putative prophage was also predicted. An intriguing characteristic of strain M26’s genome is the large number of genes encoding polyketide synthases and nonribosomal peptide synthases. Phylogenomic analysis showed that strain M26’s genome is closest to the Mycobacterium phlei genome with a 76.6% average nucleotide identity. Comparative genomics of 33 Mycobacterium genomes yielded 361 genes unique to M26 strain which functional annotation revealed 84.21% of unknown function and 3.88% encoding lipid transport and metabolism; while 48.87% of genes absent in M26 strain have unknown function, 9.5% are implicated in transcription and 19% are implicated in transport and metabolism. Strain M26’s unique phenotypic and genomic characteristics indicate it is representative of a new species named “Mycobacterium massilipolynesiensis”. Looking for mycobacteria in remote areas allows for the discovery of new Mycobacterium species.

Non-tuberculous mycobacteria (NTM) are isolated from human and environmental samples all over the world 1 , including tropical areas and remote countries and territories 2 . Our recent investigation of mycobacteria in French Polynesia, a remote French territory composed of 117 islands in the South Pacific 3 , indicated that rapidly-growing mycobacteria formed the most prevalent group of NTM in human respiratory tract specimens 3 . Among such organisms, sequencing the 16 S rRNA, rpoB and hsp65 genes suggested that strain M26, isolated from sputum collected in August 2011 from an 86-year-old French Polynesian man living in Taravao (17 44' S, 149 18' W) was representative of a new species. Strain M26 displayed 91.47% rpoB gene sequence similarity with Mycobacterium rhodesiae CIP 106806 T (GenBank: EU109302), 98.93% 16 S rRNA gene sequence similarity with Mycobacterium smegmatis ATCC 19420 T (GenBank: NR115233) and 95.59% hsp65 gene sequence similarity with Mycobacterium conceptionense CIP 108544 T (GenBank: AY859678) 3 . Since strain M26 was regarded as a potentially undescribed species of the genus Mycobacterium, it was further characterized.
For phenotype, we observed that strain M26 exhibited yellow-orange, circular, convex, smooth colonies on Middlebrook 7H10 medium. Exposing colonies to light during incubation did not change their color. Growth occurred from 28-42 °C with optimum growth at 30-37 °C, where colonies are detected at 48-hour subculture. Growth was also observed on trypticase soy agar and egg-based Coletsos medium, and Middlebrook 7H9 broth.
Strain M26 exhibited Gram-positive, red Ziehl-Neelsen stained cells, which were non-motile and measured 0.54 ± 0.1 μ m wide and 3.32 ± 1.46 μ m on an average (Fig. 1). The catalase, niacin and Tween 80 hydrolysis tests were negative at 37 °C. Culture was positive when adding nitrobenzoic acid or up to 2% salt in Middlebrook 7H10 solid medium. Strain M26 hydrolyzed the complex polysaccharides esculin and gelatin. Further enzyme activities included pyrazinamidase, alkaline phosphatase, alpha-glucosidase, beta-glucosidase, urease but it was negative for nitrate reductase, pyrolidonyl arylamidase, beta-glucuronidase, beta-galactosidase, and N-acetyl beta-glucosaminidase. This phenotypic profile was unique, and specifically M26 strain phenotypically differed from M. smegmatis and M. conceptionense, the two closest organisms based on 16 S rRNA and hsp65 gene sequence similarity (Fig. 2). In particular, results of catalase, growth at 5% NaCl, Tween 80 hydrolysis, nitrate reductase, gelatin hydrolysis distinguish strain M26 from M. smegmatis 4 . Also, the absence of scotochromogenic colonies, negative catalase activity and negative Tween hydrolysis at 37 °C all distinguished strain M26 from M. rhodesiae, the closest species by rpob sequencing 4 (Fig. 2). Accordingly, M26 strain exhibited a reproducible matrix-assisted laser desorption ionization-time of flight-mass spectrometry (MALDI-TOF-MS) profile that did not match any of the profiles entered in the Bruker database (version December, 2015). Furthermore, strain M26 was susceptible to rifabutin (minimal inhibitory concentration (MIC) < 0.015 mg/L), rifampicin (MIC, 0.06 mg/L), ethambutol (MIC, 1 mg/L), clarithromycin (MIC, 2 mg/L), azithromycin (MIC, 2 mg/L), linezolid  Further, we found that the genome of strain M26 comprised one 5,732,017-bp long chromosome without plasmid and a 67.54% GC content (Fig. 3). The genome of strain M26, comprising 12 contigs assembled into nine scaffolds, was ordered using Mauve software and Mycobacterium gilvum genome as nearest sequenced complete genome reference. The ordered scaffolds were then assembled into one complete sequence. Analyzing the replication origin in strain M26 genome using Ori-Finder 5,6 predicted two OriC regions splited by the dnaA gene and located at scaffold00008 (508 and 368 bp) (Supplementary File 1). The 368 bp predicted OriC region showed homology sequence with Mycobacterium rhodosiae NBB3 and Mycobacterium avium 104 in DoriC database 7 (Supplementary File 1). The genome of strain M26 encodes 5,500 protein-coding genes and 52 RNAs including two complete rRNA operons and 46 tRNA. A total of 4,086 genes (74.29%) were assigned as putative function (by COGs or by NR blast), whereas 166 genes (3.02%) were identified as ORFans. The remaining genes were annotated as hypothetical proteins (989 genes, 18%). A total of 2,876 proteins were found to be associated with mobilome. Eleven genes encode for bacteriocins and 100 proteins are associated with toxin/antitoxin system. We detected no gene encoding for proteins potentially involved in the resistome in the genome of strain M26. One protein was found to belong to the FxsA bacterial family of cytoplasmic membrane proteins. The molecular function of FxsA is unknown, but its over-expression in Escherichia coli has been shown to lessen the exclusion of phage T7 in those cells with an F plasmid. Analyzing the mobilome of strain M26 predicted one incomplete 23-kbp prophage region related to the Herpesviridae family, which lost the structural and replication protein required for a phage (Fig. 3, Fig. 4). We detected no gene for clustered regularly interspaced short palindromic repeats (CRISPR). We identified a large number of genes assigned to COG functional categories for transport and metabolism of lipids (8.9%), secondary metabolites biosynthesis, transport and catabolism (7.3%), amino acid transport and metabolism (6.3%) and energy production and conversion (6%) in the strain M26 genome  ( Table 1). This genomic content probably reflects the high lipid content in strain M26, as is known for the genus Mycobacterium. Mycobacterial lipids play important roles in virulence and drug susceptibility 8 . The genome of M26 also has the potential to produce secondary metabolites, with 73 genes found to be associated with polyketide synthases (PKSs) and non-ribosomal peptide synthetase (NRPSs). Some of these genes are orthologous to some Mycobacterium tuberculosis genes such as Pks 13 9 . NRPSs encode for peptide synthetase orthologous to those of Myxococcus stipitatus and siderophore biosynthesis protein, similar to those of Streptomyces davawensis and Streptomyces griseus. Siderophores are implicated in the acquisition of ferric iron in most microorganisms.
Phylogenomic analysis showed that the strain M26 genome is closest to Mycobacterium phlei genome, with average nucleotide identity of 76.6% (Fig. 5). The strain M26 genome is smaller than those of M. smegmatis str. mc2 155 and M. rhodesiae NBB3 (6.99 Mb and 6.42 Mb, respectively); its GC% content is higher than those of M. smegmatis str. mc2 155 and M. rhodesiae NBB3 (67.4% and 65.49%, respectively). We further analyzed 32 Mycobacterium species genomes in addition to the M26 strain genome to construct the Mycobacterium pangenome (Table 2). These 33 Mycobacterium species yielded a pangenome of 181,387 genes. The core genome (the set of genes shared by all the studied species) is composed of 1,474 orthologous genes (45.29%) and the accessory genome (the set of genes present in some but not all the species) is composed of 8,406 orthologous genes and 8,963 unique genes (genes unique to individual species) (54.71%). M26 strain encodes 361 unique genes (0.19%). A total of 14,356 genes encoded in other Mycobacterium species, are absent in the M26 genome (7.54%) (Fig. 6A). Functional annotation of M26 strain unique genes revealed that 84.21% have unknown function and 3.88% are implicated in lipid transport and metabolism; while 48.87% of genes absent in the M26 strain genome have unknown function, 9.5% are implicated in transcription and 19% are implicated in transport and metabolism (Fig. 6B). M. phlei is an environmental organism seldom responsible for opportunistic infection [10][11][12][13] . In a few patients, M. phlei was isolated from the respiratory tract, as was the case for strain M26 14,15 . In the situation where a clinical non-tuberculous clinical strain M26 exhibited an uncertain phylogenetic position, whole genome sequencing analysis unambiguously confirmed that it was a representative of a previously undescribed new species most closely related to M. phlei, with a 76.6% Average Nucleotide Identity (ANI).
The unique phenotypic and genetic characteristics of strain M26 support the fact that it is representative of a hitherto undescribed species in the genus Mycobacterium. Strain M26 T (= CSUR P1385 T = DSM 46850 T ) is the type strain of "Mycobacterium massilipolynesiensis" sp. nov. [masilipɔlinezjɛ nsis], whose name is composed of Massilia, the Latin name for Marseilles, France, and Polynesia, referring to French Polynesia, where it was isolated. It is proposed to be a novel species belonging to the Mycobacterium genus, based upon rpoB discrimination within the nontuberculous rapidly-growing mycobacteria group and its genome features. A remarkable genomic Mycobacterium massilipolynesiensis feature is the large number of genes assigned to the transport and metabolism of lipids and secondary metabolite biosynthesis, transport and catabolism. Older suggestion that a new bacterial species be reported on the basis of several isolates 16,17 would have prevented to draw attention of microbiologists to this new species. Alternatively, its free availability in public culture collection will allow microbiologists to report on additional isolates in order to precise the potential of M. massilipolynesiensis as an opportunistic pathogen; on the wave of culturomics hundreds of new bacterial species with only one cultured representative 18,19 .

Methods
To further characterize strain M26, growth was observed after sub-culture on Middlebrook 7H10 solid agar (Becton Dickinson, Le Pont de Claix, France) at 28-42 °C for two weeks. The development of colonies was inspected daily by the naked eye. Optical microscopy was performed using a Leica DM 2500 microscope (Leica, Wetzlar, Germany) after Gram staining and Ziehl-Neelsen staining and pictures were taken using a Nikon Digital Sight DS-U1 Camera System (Nikon, Tokyo, Japan). Transmission electron microscopy (Morgani 268D; Philips, Eindhoven, The Netherlands) was used to determine the size of the microorganisms after negative staining. Carbon source utilization and enzyme activities were determined by inoculating a API ® Coryne strip (bioMérieux) as indicated by the manufacturer. For Tween 80 hydrolysis, two drops of Tween 80 reagent (Sigma-Aldrich, L'Isle-d' Abeau, France) and neutral red as indicator were mixed with 1 mL of sterile distilled water into a screw-cap tube. The tube was inoculated with a loopful of the microorganism and incubated at 37 °C in the dark with the cap tight. The tube was visually inspected daily up for to 10 days of incubation. Niacin production was detected using BBL ™ Taxo ™ TB Niacin test strips (Becton Dickinson) as described by the manufacturer. Salt tolerance of strain M26 was tested by supplementing the Middlebrook 7H10 solid medium with 0-5% NaCl. Matrix-assisted laser-desorption/ionization time-of-flight mass spectrometry (MALDI-TOF-MS) protein analysis was carried out as previously described 20 . Drug susceptibility was tested using the broth microdilution method in Middlebrook 7H9 medium (Becton Dickinson) as previously described 21 .
Genome sequencing and assembly. DNA was extracted from strain M26 and cultured for seven days     specific beads according to the Nextera XT protocol (Illumina). Normalized libraries were pooled for sequencing on the MiSeq. The pooled single strand library was loaded onto the reagent cartridge and then onto the instrument along with the flow cell. Automated cluster generation and paired end sequencing with dual index reads were performed in a single 39-hour run in 2 × 251-bp. Total information of 5.9 Gb was obtained from a 643 K/mm 2 cluster density with cluster passing quality control filters of 91.8% (12,365,000 clusters). Within this run, the index representation for M26 was 6.43%. The 729,975 paired-end reads were trimmed and filtered according to the read qualities. Two separated mate-pair libraries were prepared with 1.36 and 1.5 μ g of genomic DNA using the Nextera mate pair Illumina guide. The genomic DNA sample was simultaneously fragmented and tagged with a mate pair junction adapter. The fragmentation profiles were validated on an Agilent 2100 BioAnalyzer (Agilent Technologies Inc, Santa Clara, CA, USA) with a DNA 7500 labchip. The size of the DNA fragments ranged from 1 to 11 kb with an optimal size at 4.58 and 5.27 kb, respectively. No size selection was performed and 503 and 600 ng respectively of tagmented fragments were circularized. The circularized DNA was mechanically sheared to small fragments with an optimal at 871 bp on the Covaris device S2 in microtubes (Covaris, Woburn, MA, USA). The libraries profiles were quantified and visualized on a High Sensitivity Bioanalyzer LabChip (Agilent Technologies Inc). The libraries were normalized at 2 nM, pooled, denatured and diluted at 15 pM for sequencing. One M26 library was loaded in two different 2 × 251-bp runs (at 506 K/mm 2 cluster density with cluster passing quality control filters of 97%, the index representation was determined at 5.71%, at 852 K/mm 2 cluster density with cluster passing quality control filters of 94.53%, the index representation was 4.04%). The second library was loaded and the run performed at 587 K/mm 2 cluster density with cluster passing quality control filters of 96.3% the index representation was 7.44%). These three runs in mate-pair strategy generated a total of 2,004,521 pass filter paired-reads for M26. Illumina reads were trimmed using Trimmomatic 22 , then assembled using Spades software 23,24 . Contigs were combined together by SSpace 25 and Opera software v1.2 26 , helped by GapFiller V1.10 27 to reduce the set. Some manual refinements using CLC Genomics v7 software (CLC bio, Aarhus, Denmark) and homemade tools made in Python improved the genome. Finally, the draft genome of strain M26 consists into nine scaffolds. The replication origin in strain M26 genome was predicted using Ori-Finder 5,6 (http://tubic.tju.edu. cn/Ori-Finder/) and homology with other OriC regions was predicted using blast algorithm in DoriC database 7 (http://tubic.tju.edu.cn/doric/). The 16 S rRNA gene sequence of strain M26 was deposited in GenBank under accession number LN909503 and its complete genome under accession number FAXD00000000.
Genome annotation. Non-coding genes and miscellaneous features were predicted using RNAmmer 28 , ARAGORN 29 , Rfam 30 , PFAM 31 , and Infernal 32 . Coding DNA sequences (CDSs) were predicted using Prodigal 33 and functional annotation was achieved using BLAST+ 34 and HMMER3 35 against the UniProtKB database 36 . To refine the search of mobile genetic elements in the genome of strain M26, PHAST software predicted prophage 37 completed by the Actinobacteriophage Database at phageDB.org and the online plasmid search tool: https:// plasmid.med.harvard.edu/PLASMID/Home.xhtml. Phylogenomic tree was constructed using an identity matrix based on Mauve program genome alignment 38 . Average Nucleotide Identity values were calculated between strain M26 and sequenced Mycobacterium species genomes using BLASTN and a home-made software, following the algorithm described by Goris et al. 39 . The genomes of 32 Mycobacterium species were downloaded from Genbank ( Table 2). The ORFeome of these species, in addition to M26 genome, were annotated using Prodigal 33 in order to normalize ORFs predictions. The Pangenomic analyses were computed using OrthoMCL software with > 60% amino acid identiy in all-against-all BLASTP algorithm 40 .