Colibacter massiliensis gen. nov. sp. nov., a novel Gram-stain-positive anaerobic diplococcal bacterium, isolated from the human left colon

The gut microbiota is considered to play a key role in human health. As a consequence, deciphering its microbial diversity is essential. A polyphasic taxonogenomic strategy combining phenotypic and genomic analyses was used to characterize a new bacterium, strain Marseille-P2911. This strain was isolated from a left colon sample of a 60-year-old man who underwent a colonoscopy for an etiological investigation of iron-deficiency anemia in Marseille, France. On the basis of 16S rRNA sequence comparison, the closest phylogenetic neighbor was Anaeroglobus geminatus (94.59% 16S rRNA gene sequence similarity) within the family Veillonellaceae. Cells were anaerobic, Gram-stain-positive, non-spore-forming, catalase- and oxidase-negative cocci grouped in pairs. The bacterium was able to grow at 37 °C after 2 days of incubation. Strain Marseille-P2911 exhibited a genome size of 1,715,864 bp with a 50.2% G + C content, and digital DNA-DNA hybridization (dDDH) and OrthoANI values with A. geminatus of only 19.1 ± 4.5% and 74.42%, respectively. Because the latter value is lower than the threshold for genus delineation (80.5%), we propose the creation of the new genus Colibacter gen. nov., with strain Marseille-P2911T (=DSM 103304 = CSUR P2911) being the type strain of the new species Colibacter massiliensis gen. nov., sp. nov.

With 10^11 to 10^12 cells per gram 1, the commensal microbiota that resides in the colon has become a focus of scientific interest. Although this complex flora plays a role in homeostasis, it has been demonstrated to participate in triggering intestinal and extra-intestinal diseases in sensitive people 2. The microbial diversity of the colon microbiota may result from a co-evolution with its host 3 but may be affected by environmental conditions 4. In order to identify all bacteria (including uncultured and fastidious ones) present in the colon, we use the culturomics strategy, which is based on diversified culture conditions (temperature, media, and atmosphere) and rapid bacterial identification using matrix-assisted laser desorption ionization-time of flight mass spectrometry (MALDI-TOF MS) [5][6][7]. For putative new taxa, we use the taxonogenomic method, which combines phenotypic characteristics and whole genome sequencing analysis to describe new bacterial species [8][9][10]. In 2016, we isolated the new bacterial strain Marseille-P2911 (=CSUR P2911 = DSM 103304) from a left colon sample of a 60-year-old patient who underwent a colonoscopy for the etiological investigation of iron-deficiency anemia in Marseille, France 11. This bacterium was analysed by MALDI-TOF MS using a Microflex spectrometer 12 (Bruker Daltonics, Bremen, Germany). The strain was predicted to be affiliated with members of the family Veillonellaceae but distinct from species with a validly published name. In the present study, we aimed at comparing strain Marseille-P2911 to its closely related phylogenetic neighbors, and at proposing the creation of the new genus and species Colibacter massiliensis gen. nov., sp. nov.
1 Aix Marseille Univ, Institut de Recherche pour le Développement (IRD), Service de Santé des Armées, AP-HM, UMR Vecteurs Infections Tropicales et Méditerranéennes (VITROME), Marseille, France.

Results
Strain identification and classification. Strain Marseille-P2911 was isolated from the left colon liquid sample of a 60-year-old man who underwent a colonoscopy for the etiological investigation of iron-deficiency anemia. The patient provided signed informed consent, and the study was approved by the ethics committee of the Institut Fédératif de Recherche IFR48 under number 2016-010. Strain Marseille-P2911 could not be identified by our systematic MALDI-TOF MS screening because its score was lower than 1.8, suggesting that the corresponding species was not in the database (Fig. S1). Moreover, strain Marseille-P2911 exhibited 94.59% 16S rRNA gene sequence similarity with Anaeroglobus geminatus strain AIP 313.00 (GenBank accession no. AF338413), the phylogenetically closest species with standing in nomenclature (Fig. 1). As this value is lower than the 95% threshold proposed by Stackebrandt and Ebers for defining a new genus, strain Marseille-P2911 was considered a representative of a putatively new genus within the family Veillonellaceae in the phylum Firmicutes.

Phenotypic characteristics. Growth was observed on 5% sheep blood-enriched Columbia agar (BioMérieux) at 37 °C after 2 days of incubation. Colonies of strain Marseille-P2911 showed neither pigmentation nor haemolysis. They were circular, transparent, and 0.1 mm in diameter. Bacterial cells were Gram-stain-positive, non-motile diplococci with a diameter of 0.4 to 0.6 µm, as determined by transmission electron microscopy (Fig. 2). Strain Marseille-P2911 grew only under anaerobic conditions. The sporulation test (20 minutes at 80 °C) was negative. In addition, this bacterium had neither oxidase nor catalase activity.

Genome sequencing information and genome properties. The genome of strain Marseille-P2911 was 1,715,864 bp long with a 50.2% G + C content. It was assembled into 2 scaffolds.
Of the 1,655 predicted genes, 1,567 were protein-coding genes and 62 were RNAs (one complete rRNA operon, three additional 5S rRNAs and 49 tRNA genes). A total of 1,350 genes (81.57%) were assigned a putative function (by COGs) and 305 genes (18.43%) were annotated as hypothetical proteins. The genome properties and distribution of genes into COGs functional categories are detailed in Table S2 and Fig. 3. Genes putatively gained by lateral gene transfer (LGT) were classified according to the bacterial families of origin (Fig. 4). Although we cannot rule out the possibility that some of the transfers may be from as yet unidentified taxa, most genes putatively acquired by LGT were obtained from members of the Veillonellaceae (90.6%) and Selenomonadaceae (1.6%) families.

Comparison with closely related bacterial strains. The genomic features of strain Marseille-P2911 and of the compared species are summarized in Table S3. The G + C content of strain Marseille-P2911 (50.2 mol%) was equal to that of M. massiliensis, and greater than those of all other compared species (Table S3) except Megasphaera elsdenii (52.8%). The gene content of strain Marseille-P2911 (1,655) was similar to that of D. micraerophilus but smaller than those of the other compared genomes (Table S3). The distribution of genes into COG categories was similar in all nine compared genomes (Fig. 5). The dDDH values obtained between strain Marseille-P2911 and the compared species (Table S4) are lower than the 70% threshold used for delineating prokaryotic species, thus confirming that this strain represents a new species. Finally, strain Marseille-P2911 exhibited average nucleotide identity (ANI) values ranging from 63.37% with D. micraerophilus to 74.42% with A. geminatus. Because an ANI value lower than 80.5% suggests that two strains belong to distinct genera, we considered that strain Marseille-P2911 was representative of a new genus (Fig. 6). Consequently, based on the presented phenotypic and genomic data, we propose the creation of the new genus Colibacter gen. nov., with strain Marseille-P2911 T being the type strain of the novel species Colibacter massiliensis gen. nov., sp. nov.
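
The threshold reasoning used above can be summarized in a short sketch. It is only an illustration of the decision logic, not the authors' pipeline: the function and class names are hypothetical, and the cut-offs (95% 16S rRNA similarity, 70% dDDH, 80.5% ANI) and the values for the comparison with A. geminatus are the ones quoted in the text.

```python
from dataclasses import dataclass

# Cut-offs quoted in the text; treated here as adjustable parameters.
GENUS_16S_CUTOFF = 95.0     # % 16S rRNA gene similarity (Stackebrandt and Ebers)
SPECIES_DDDH_CUTOFF = 70.0  # % digital DNA-DNA hybridization
GENUS_ANI_CUTOFF = 80.5     # % OrthoANI


@dataclass
class NoveltyCall:
    """Hypothetical container for the three threshold comparisons."""
    new_genus_by_16s: bool
    new_species_by_dddh: bool
    new_genus_by_ani: bool


def assess_novelty(sim_16s: float, dddh: float, ani: float) -> NoveltyCall:
    """Compare each similarity value (in %) with its cut-off."""
    return NoveltyCall(
        new_genus_by_16s=sim_16s < GENUS_16S_CUTOFF,
        new_species_by_dddh=dddh < SPECIES_DDDH_CUTOFF,
        new_genus_by_ani=ani < GENUS_ANI_CUTOFF,
    )


# Values reported for strain Marseille-P2911 versus A. geminatus.
call = assess_novelty(sim_16s=94.59, dddh=19.1, ani=74.42)
print(call)
```

All three comparisons fall below their cut-offs for the closest neighbor, which is the basis for proposing both a new species and a new genus.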

Discussion
The gastrointestinal tract harbors a complex microbiota whose dynamic composition is important for health 14. Here we aimed at describing a new bacterial species to enrich the knowledge on the human microbiome, using the culturomics and taxonogenomic strategies 7,10,15.
The phylogenetic and phenotypic analysis of the new strain Marseille-P2911 revealed several distinct traits when compared to other members of the family Veillonellaceae 16, suggesting that it could be classified in a new species of a new genus. The family Veillonellaceae currently comprises six genera of Gram-negative bacteria, including Veillonella (twelve species), Megasphaera and Dialister (five species each), Allisonella, Anaeroglobus and Negativicoccus (one species each) 17.
Like strain Marseille-P2911, many members of the family Veillonellaceae have been detected in humans. For example, Allisonella histaminiformans was previously isolated from the vagina and A. geminatus was isolated from the gastrointestinal tract, stool and skin 17. Moreover, these two species were demonstrated to be pathogenic in several diseases such as community-acquired pneumonia and advanced caries 17.
The genomic content of strain Marseille-P2911 (dDDH, OrthoANI, and AGIOS values) supported its status as a new species. We observed a significant similarity to genes from the family Veillonellaceae (90.6%) (Fig. 4). A small proportion of hypothetical lateral gene transfer (9.4%) from other bacterial families was observed, notably several toxin/antitoxin system-related genes putatively acquired from the families Eubacteriaceae, Lachnospiraceae, Lactobacillaceae, Selenomonadaceae and Streptococcaceae. Consequently, we formally propose the creation of the new genus and species Colibacter massiliensis gen. nov., sp. nov., within the family Veillonellaceae. The type strain, Marseille-P2911 T, was deposited in the DSMZ and CSUR collections under accession numbers DSM 103304 and CSUR P2911, respectively. The 16S rRNA and genome sequences are available in GenBank under accession numbers LT576403 and FMIY00000000, respectively. The Digital Protologue TaxoNumber (http://imedea.uibcsic.es/dprotologue/index.php) of C. massiliensis gen. nov., sp. nov. is GA00103.

Materials and Methods
Strain isolation and phenotypic tests. The left colon liquid sample of a 60-year-old man, who underwent a colonoscopy for an aetiological investigation of iron-deficiency anemia, was initially collected at La Timone Hospital in Marseille, France. From this sample, strain Marseille-P2911 was isolated after 3 days of preincubation in an anaerobic blood culture bottle (VersaTREK REDOX 2, Thermo Scientific, Villebon sur Yvette, France) supplemented with 5 mL of 0.2 µm-filtered rumen. This enriched liquid medium was then inoculated on 5% sheep blood-enriched Columbia agar (BioMérieux, Marcy l'Etoile, France), followed by incubation at 37 °C in an anaerobic atmosphere (AnaeroGEN Compact, Oxoid, Thermo Scientific, Dardilly, France). MALDI-TOF mass spectrometry (MS) protein analysis was carried out using a Microflex spectrometer 18 (Bruker Daltonics, Bremen, Germany). Strain Marseille-P2911 spectra were imported into the MALDI BioTyper software (version 2.0, Bruker) and analysed by standard pattern matching (with default parameter settings). Interpretation of the scores was performed as previously described 19.
Moreover, the 16S rRNA gene was sequenced using the fD1-rP2 primer pair as previously described 20, on a 3130-XL sequencer (Applied Biosystems, Saint Aubin, France). A phylogenetic tree was inferred using the Maximum Likelihood method and the Kimura 2-parameter model within the MEGA 7 software 21. Several growth temperatures (20, 28, 37, 45 and 55 °C) were tested on 5% sheep blood-enriched Columbia agar medium (BioMérieux, Marcy l'Etoile, France). Growth of strain Marseille-P2911 was also tested under anaerobic, aerobic and microaerophilic (CampyGEN, Oxoid) atmospheres. API ZYM, API 20E and API 50CH strips (BioMérieux) were used to evaluate the biochemical properties of the strain according to the manufacturer's instructions. All API experiments were performed under anaerobic conditions. Using API 50CH, API 20E and API ZYM strips, strain Marseille-P2911 was incubated for 48, 24 and 4 hours, respectively. The standard disc method was applied for antimicrobial susceptibility testing according to the French Microbiology Society 13. Finally, cellular fatty acid methyl ester (FAME) analysis was performed by GC/MS. Two samples were prepared with approximately 18 mg of bacterial biomass per tube harvested from several culture plates. Briefly, fatty acid methyl esters were separated using an Elite 5-MS column and monitored by mass spectrometry (Clarus 500 - SQ 8S, Perkin Elmer, Courtaboeuf, France) 22. GC/MS analyses were carried out as described before 23. Spectral database search was performed using MS Search 2.0 operated with the Standard Reference Database 1A (NIST, Gaithersburg, USA) and the FAMEs mass spectral database (Wiley, Chichester, UK). For transmission electron microscopy, formvar-coated grids were deposited onto a 40 µL drop of bacterial suspension before incubation at 37 °C for 30 minutes.
Then, the grids were incubated on 1% ammonium molybdate for 10 seconds, dried on blotting paper and finally observed using a Tecnai G20 transmission electron microscope (FEI, Limeil-Brevannes, France) at an operating voltage of 60 kV. All methods were performed in accordance with the relevant guidelines and regulations.

Extraction and genome sequencing. After a pretreatment by lysozyme incubation at 37 °C for 2 hours, DNA of strain Marseille-P2911 was extracted using an EZ1 biorobot (Qiagen) with the EZ1 DNA Tissue kit. The elution volume was 50 µL. Genomic DNA (gDNA) was quantified by a Qubit assay with the high sensitivity kit (Life Technologies, Carlsbad, CA, USA) at 92.6 ng/µL. Genomic DNA was sequenced on a MiSeq sequencer (Illumina Inc, San Diego, CA, USA) with the Mate Pair strategy. The gDNA was barcoded with the Nextera Mate Pair sample prep kit (Illumina) so that it could be mixed with 11 other projects. The Mate Pair library was prepared with 1.5 µg of genomic DNA using the Nextera Mate Pair Illumina guide. The gDNA sample was simultaneously fragmented and tagged with a Mate Pair junction adapter. The fragmentation pattern was validated on an Agilent 2100 BioAnalyzer (Agilent Technologies Inc, Santa Clara, CA, USA) with a DNA 7500 labchip. DNA fragments ranged in size from 1.5 kb up to 11 kb with an optimal size at 8.4 kb. No size selection was performed and 600 ng of tagmented fragments were circularized. The circularized DNA was mechanically sheared to small fragments with an optimal size of 706 bp on the Covaris S2 device in microtubes (Covaris, Woburn, MA, USA). The library profile was visualized on a High Sensitivity Bioanalyzer LabChip (Agilent Technologies Inc, Santa Clara, CA, USA) and the final library concentration was measured at 13.256 nmol/L.
The libraries were normalized at 2 nM and, after a denaturation step and dilution at 15 pM, loaded onto the reagent cartridge and then onto the instrument along with the flow cell. Automated cluster generation and the sequencing run were performed in a single 39-hour run in a 2 × 251-bp format. Total information of 8.3 Gb was obtained from a 910 K/mm² cluster density, with 92.8% of clusters passing quality control filters (16,316,000 passing filter paired reads). Within this run, the index representation for strain Marseille-P2911 was determined to be 8.4%. The 688,244 paired reads were quality-checked using FastQC, trimmed using Trimmomatic version 0.36.6 24 and assembled into two scaffolds using the SPAdes version 3.5.0 software 25. The option "careful" was used in order to reduce the number of mismatches and short indels. Default parameters were applied for k values, i.e., k-mer values of 127, 99, 77, 55, 33, and 21. SSPACE 26 and GapFiller 27 were used to combine contigs, using default parameters 28,29.

Genome annotation and genome comparison. The genome was annotated as previously described 19.
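
As a rough check on the library and run figures reported in the sequencing section, the following sketch works through two pieces of arithmetic: the stock volume needed to normalize a 13.256 nM library to 2 nM (C1·V1 = C2·V2), and the raw sequencing depth implied by the read count. It rests on assumptions not stated in the paper: the 10 µL final volume is hypothetical, the 688,244 "paired reads" are taken to be read pairs of 2 × 251 bp, and trimming is ignored. Function names are illustrative.

```python
def dilution_volume_ul(stock_nm: float, target_nm: float, final_volume_ul: float) -> float:
    """Volume of stock library to dilute, from C1 * V1 = C2 * V2."""
    if target_nm > stock_nm:
        raise ValueError("target concentration exceeds stock concentration")
    return target_nm * final_volume_ul / stock_nm


def raw_depth(read_pairs: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Raw (untrimmed) sequencing depth, counting both reads of each pair."""
    return 2 * read_pairs * read_length_bp / genome_size_bp


# 13.256 nM library normalized to 2 nM in a hypothetical 10 µL final volume.
stock_ul = dilution_volume_ul(stock_nm=13.256, target_nm=2.0, final_volume_ul=10.0)

# Read pairs and genome size as reported; pair interpretation is an assumption.
depth = raw_depth(read_pairs=688_244, read_length_bp=251, genome_size_bp=1_715_864)

print(f"stock volume: {stock_ul:.2f} µL, raw depth: {depth:.0f}x")
```

Under these assumptions the raw depth is on the order of 200-fold; the depth after trimming would be somewhat lower.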
In addition, we used the Genome-to-Genome Distance Calculator (GGDC) web server (http://ggdc.dsmz.de) to estimate the overall similarity among the compared genomes, with digital DDH (dDDH) replacing wet-lab DNA-DNA hybridization (DDH) 30,31. Average nucleotide identity was also estimated using the OrthoANI 32 and MAGI 33 software. Antibiotic resistance genes (ARG) were searched using the ARG-ANNOT database and the Bio-Edit interface 34. Assembled sequences were searched against the ARG-ANNOT database under moderately stringent conditions (e-value of 10^-5) for the in silico ARG prediction. These putative ARGs were further confirmed through a BLAST search against the non-redundant (nr) database in GenBank.
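
The e-value cut-off described above can be illustrated with a minimal filter. This is a sketch, not the authors' procedure (they used the Bio-Edit interface): it assumes hits are available in the standard 12-column BLAST tabular layout (query, subject, % identity, alignment length, mismatches, gap opens, query start/end, subject start/end, e-value, bit score), and the example rows and identifiers are invented.

```python
import csv
import io


def filter_hits(tabular_text: str, max_evalue: float = 1e-5):
    """Keep hits whose e-value is at or below max_evalue.

    Expects 12-column BLAST tabular text; the e-value is column 11.
    Returns (query, subject, percent_identity, evalue) tuples.
    """
    kept = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        evalue = float(row[10])
        if evalue <= max_evalue:
            kept.append((row[0], row[1], float(row[2]), evalue))
    return kept


# Invented example rows: one strong hit and one weak hit.
example = (
    "contig_1\tARG_A\t98.5\t300\t4\t0\t1\t300\t1\t300\t1e-150\t550\n"
    "contig_2\tARG_B\t35.0\t80\t52\t0\t1\t80\t1\t80\t0.02\t30.1\n"
)
hits = filter_hits(example)
print(hits)
```

Only the first row passes the 10^-5 threshold; such candidates would then be confirmed against the nr database as described.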
The presence of pathogenesis-related proteins was investigated using PathogenFinder 1.1 35. Finally, predicted protein sequences of strain Marseille-P2911 were used as queries to search the NCBI GenBank non-redundant protein sequence database. These results were formatted to generate a network of protein sequences using the Cytoscape tool 36.
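
One way such search results could be formatted for network visualization is Cytoscape's simple interaction format (SIF), in which each line reads "source, interaction type, target". The paper does not say which import format was used, so the sketch below is an assumption; the protein identifiers, the interaction label and the function name are all hypothetical.

```python
def hits_to_sif(best_hits: dict, interaction: str = "best_hit") -> str:
    """Render query -> subject pairs as tab-delimited SIF lines."""
    lines = [
        f"{query}\t{interaction}\t{subject}"
        for query, subject in sorted(best_hits.items())
    ]
    return "\n".join(lines) + "\n"


# Invented identifiers standing in for query proteins and their best hits.
toy_hits = {
    "P2911_0001": "Veillonellaceae_protein_X",
    "P2911_0002": "Selenomonadaceae_protein_Y",
}
sif_text = hits_to_sif(toy_hits)
print(sif_text, end="")
```

The resulting text can be saved with a .sif extension and imported as a network, with nodes then colored by the taxonomic family of each hit.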