(Meta)genomic insights into the pathogenome of Cellulosimicrobium cellulans

Sharma, Anukriti; Gilbert, Jack A.; Lal, Rup

doi:10.1038/srep25527

Download PDF

Article
Open access
Published: 06 May 2016

(Meta)genomic insights into the pathogenome of Cellulosimicrobium cellulans

Anukriti Sharma¹,
Jack A. Gilbert^2,3,4 &
Rup Lal¹

Scientific Reports volume 6, Article number: 25527 (2016) Cite this article

3288 Accesses
18 Citations
7 Altmetric
Metrics details

Subjects

Abstract

Despite having serious clinical manifestations, Cellulosimicrobium cellulans remain under-reported with only three genome sequences available at the time of writing. Genome sequences of C. cellulans LMG16121, C. cellulans J36 and Cellulosimicrobium sp. strain MM were used to determine distribution of pathogenicity islands (PAIs) across C. cellulans, which revealed 49 potential marker genes with known association to human infections, e.g. Fic and VbhA toxin-antitoxin system. Oligonucleotide composition-based analysis of orthologous proteins (n = 791) across three genomes revealed significant negative correlation (P < 0.05) between frequency of optimal codons (F_opt) and gene G+C content, highlighting the G+C-biased gene conversion (gBGC) effect across Cellulosimicrobium strains. Bayesian molecular-clock analysis performed on three virulent PAI proteins (Fic; D-alanyl-D-alanine-carboxypeptidase; transposase) dated the divergence event at 300 million years ago from the most common recent ancestor. Synteny-based annotation of hypothetical proteins highlighted gene transfers from non-pathogenic bacteria as a key factor in the evolution of PAIs. Additonally, deciphering the metagenomic islands using strain MM’s genome with environmental data from the site of isolation (hot-spring biofilm) revealed (an)aerobic respiration as population segregation factor across the in situ cohorts. Using reference genomes and metagenomic data, our results highlight the emergence and evolution of PAIs in the genus Cellulosimicrobium.

Comparative genomics of a novel clade shed light on the evolution of the genus Erysipelothrix and characterise an emerging species

Article Open access 09 February 2021

Ana Laura Grazziotin, Newton M. Vidal, … Belchiolina Beatriz Fonseca

Genome-wide Analysis of Four Enterobacter cloacae complex type strains: Insights into Virulence and Niche Adaptation

Article Open access 18 May 2020

Areeqa Mustafa, Muhammad Ibrahim, … Zhu Bo

Comparative in silico genome analysis of Clostridium perfringens unravels stable phylogroups with different genome characteristics and pathogenic potential

Article Open access 24 March 2021

Mostafa Y. Abdel-Glil, Prasad Thomas, … Christian Seyboldt

Introduction

With only three published species, C. cellulans¹, C. funkie², C. terreum³ and 31 16S rRNA gene sequences, the genus Cellulosimicrobium remains underrepresented in present NCBI reference databases. First proposed by Schumann et al.¹, this genus has stayed taxonomically confounding with multiple reclassifications from the genera Cellulomonas, Oerskovia, Brevibacterium and Arthrobacter¹. The ecological distribution of Cellulosimicrobium strains has been by and large limited to the mesophilic environments such as soil, marine sponges and clinical materials. However, only two instances of isolation from extreme environments have been reported, including hot springs⁴ and Antarctic snow⁵. Currently, there are four sequenced genomes for the genus Cellulosimicrobium, including Cellulosimicrobium sp. strain MM⁴, C. cellulans LMG 16121 (NZ_CAOI00000000.1), C. cellulans J36 (NZ_JAGJ00000000.1) and C. funkei U11 (NZ_JNBQ00000000.1), which were isolated from biofilms (surface temperature > 57 °C) at the Manikaran hot springs (surface water temperature > 95 °C), aluminium hydroxide gel antacid, compost and agricultural soil, respectively. The genus Cellulosimicrobium is associated with human infections such as meningitis, endocarditis, bacteremia, soft tissue infection, endophthalmitis, septic arthritis and prosthetic joint infections⁶. Here we perform a detailed genome wide investigation using two available C. cellulans genomes i.e. LMG 16121 and J36 along with Cellulosimicrobium sp. strain MM (for its > 97% whole genome proximity to C. cellulans strains) to determine evolutionary processes that have shaped pathogenicity across the species C. cellulans.

Previous genomic studies of emerging pathogenic bacteria have revealed that pathogenicity islands (PAIs) contribute significantly towards organismal evolution by expression of infection-related factors⁷. Using the three draft genome sequences and a metagenomically-derived minimal genome, we have deciphered the pathogenic gene complements of species C. cellulans, which is attributed to 80% incidences of the total human infections for this genus⁶. Highlighting the close proximity of the functionally coupled ORFs, Fic and VbhA, the results provide the evidence for the ‘selfish operon’ theory whereby their juxtaposition on a PAI results from a probable single horizontal gene transfer event⁸. PAIs ORF annotation revealed 49 virulence-encoding genes and suggested that horizontal transfer from non-pathogenic bacteria plays a signficiant role in the evolution of PAIs. Finally, these analyses provide a platform for using these 49 credible virulence markers to diagnose the presence of Cellulosimicrobium pathogens.

Results and Discussion

Phylogenomic analysis

Very few genomes or 16S rRNA sequences exist for Cellulosimicrobium, therefore, phylogenetic reconstruction for the strains was performed using the Family Promicromonosporacae, which significantly increased the number of genomes and 16S rRNA sequences available. 16S rRNA gene (n = 31) based tree topology revealed genus-specific clustering for Promicromonospora (n = 11), Isoptericola (n = 6), Myceligenerans (n = 3), Xylanibacterium (n = 1), Xylanimonas (n = 1), Xylanimicrobium (n = 1) and Cellulosimicrobium (n = 7). Cellulosimicrobium sp. strain MM was clustered with C. cellulans LMG 16121 (Fig. 1a). Pairwise average nucleotide (ANI) values were also calculated generating a score of 98.23% (>95%) between Cellulosimicrobium needs to be italicized. sp. MM and C. cellulans LMG 16121, which suggested that the genome was a sub-species⁹. ANI values for C. cellulans J36, C. funkei U11, I. variabilis 225, P. kroppenstedtii DSM19349, P. sukumoe 327MFSha31 and X. cellulosilytica DSM 15894 with respect to strain MM, were 88.24%, 85.29%, 81.17%, 80.28%, 79.71% and 79.88%, respectively, distinctly demonstrating species/genus level delineation (see Supplementary Table S2). Further, DNA-DNA hybridization (DDH) values were also determined in order to resolve strain MM at species level. % DDH values were 75.6, 55.3, 56, 20.9, 15.7, 16, 17.2 for C. cellulans LMG16121, C. cellulans J36, C. funkei U11, I. variabilis 225, P. kroppenstedtii DSM19349, P. sukumoe 327MFSha31 and X. cellulosilytica DSM 15894, respectively (see Supplementary Table S3). Both ANI and DDH values assigned Cellulosimicrobium sp. strain MM under C. cellulans species with values greater than species delination cut-off i.e. 95% and 70%, respectively for each analysis⁹. Interestingly, C. cellulans J36 demonstrated values less than the species delineation cut-off for both ANI and DDH which here indicates that this strain needs further confirmation using biochemical and physiological tests, to be put under C. cellulans species. Overall, the 16S rRNA analysis identified that C. funkei and C. cellulans form a clade, with C. terreum as an outlier to this clade (Fig. 1a). Multiple strains of C. cellulans were scattered into two sub-clades with C. funkei possibly because of the low number of Cellulosimicrobium strains with an available 16S rRNA sequence (n = 6; Fig. 1a). Whole genome based (n = 8) phylogenetic reconstruction using both 31 single copy genes¹⁰ (Fig. 1b) and 400 conserved bacterial marker genes¹¹ (see Supplementary Fig. S1) revealed similar tree topology whereby strain MM was grouped with C. cellulans LMG16121.

Comparative functional potential of Cellulosimicrobium cellulans strains

Metabolic pathway reconstruction for the three Cellulosimicrobium strains based on KAAS¹², revealed a conserved set of central pathways like glycolysis/gluconeogenesis, TCA cycle, β -alanine metabolism, inositol phosphate metabolism, propanoate metabolism and two-component system (TCS). Cellulosimicrobium sp. strain MM exhibited the pathways for fatty acid metabolism, synthesis and degradation of ketone bodies and D-alanine metabolism, which were not present in the other two neighbors (Fig. 1c). D-alanine is proposed to be involved in biofilm production, adhesion and pathogenesis. In addition, synthesis and degradation of ketone bodies in strain MM appear to play a pivotal role in quorum sensing, which is also associated with biofilm formation¹³. LMG16121 uniquely encoded a Type-II secretion system (Fig. 1c) in contrast to the other strains. Out of 102 pathways reconstructed across all the strains, taurine and hypotaurine metabolism, which are involved in membrane stabilization, glycolysis and glycogenesis, were unique to C. cellulans J36 (Fig. 1c)¹⁴. Further, hierarchical clustering of the three genomes on the bases of top 50 enriched metabolic pathways, revealed closeness of strain MM and LMG16121, as also shown using phylogenetic analysis (Fig. 1b,c). In total, 791 orthologous genes were identified across the three genomes (see Supplementary Table S4), with the majority (n = 598) assigned to translation, metabolism and structure maintenance¹⁵.

Horizontal gene transfer (HGTs) candidates were determined across three Cellulosimicrobium strains. Strain LMG had the greatest number of potentially transferred genes (n = 367), followed by strain MM (n = 348) and J36 (n = 280) (see Supplementary Table S5). It is possible that the high number of horizontal transfer events in strains LMG16121 and MM is indicative of the extreme environment from which they were isolated, i.e. antacid and arsenic contaminated hot spring microbial mat, respectively. Another explaination for occurrence of frequent HGT events overall in all three genomes can be presence of mobile genetic elements and PAIs across C. cellulans genomes (as discussed in the section below), as mobile genetic elements facilitate HGTs¹⁶. Strain MM encoded 2-oxo-acid dehydrogenase, histidine kinase, FNR transcriptional regulator, FAD dependent oxidoreductase, Clp subunits and hemin transport proteins on HGT loci. Whereas LMG16121 HGT loci included cobyrinic acid ac-diamide synthase, TetR family transcription regulator, chitin binding protein, luxR, copper oxidase, bleomycin resistance protein and CheY proteins. The HGT loci for J36 revealed integrase, IS3/IS911 family transposase, daunorubicin resistance protein and CLG chitinase B (see Supplementary Table S5). Additionally, hierarchical clustering was performed on annotated HGT candidates across three genomes i.e. strain MM, LMG16121 and J36 along with heatmap showing relative abundance of HGT genes. Strain MM was clustered with J36 based on the annotation of the HGT genes which is interesting here as strain MM coordinated with LMG16121 in terms of frequency of HGT events (see Supplementary Fig. S2).

Evidence for G+C biased gene conversions across the genus Cellulosimicrobium

Pairwise correlation between F_opt and %G+ C across the Cellulosimicrobium orthologous gene complement (n = 791, see Supplementary Table S4) revealed a weak, but statistically significant (R² < − 0.4, P-value ≤ 3.5e-15) negative correlation (Fig. 2a), which could be interpreted as a result of insufficient codon usage choices due to the 74%G+ C in this genus¹⁷. The weak-negative correlation between gene-based G+ C content and F_opt can be explained by both high genomic G+ C content (average %G+ C = 74) and G+ C biased gene conversion (gBGC) effect (Fig. 2a)^17,18,19. Habitat specific variations were evident (P-value < 0.05) in the genome wide pairwise analysis of codon usage across Cellulosimicrobium ecotypes. Strain MM (a hot spring ecotype) showed a significantly different codon usage profile when compared with the mesophilic LMG16121 (P-value = 5.373e-06) and J36 (P-value = 4.722e-07) (see Supplementary Fig. S3) highlighting the differential impact of local environmental functional constraints.

An average dN/dS of ≤ 0.8 for the 791 orthologous gene pairs revealed that the C. cellulans core genome was evolving under purifying selection (Fig. 2b), which is to be expected as these are all essential genes and the genome is G+ C rich¹⁸. A negative pairwise correlation was observed between F_opt and dN/dS values for all combinations (Fig. 2c), suggesting an association between the selection of protein sequences and the optimization of codon frequencies.

Anaerobic respiration leads to Cellulosimicrobium population segregation for hot spring ecotypes

Metagenomic reads from microbial mat at Manikaran hot spring were recruited on the genome of strain MM, whereby the regions of the MM genome that were underrepresented in the metagenome (Metagenomic Islands; MGIs) highlighted the accessory genome and environment-specific genetic repertoire. MGIs maintained 33 ORFs encoding for multiple tRNA synthetases, such as lysyl, aspartyl, isoleucyl, cysteinyl, etc. (Fig. 3a, see Supplementary Tabe S5). tRNAs were also annotated within MGIs, which supports their horizontal gene transfer potential²⁰. Besides the greater abundance of ABC transporters and DNA associated proteins, genes encoding quorum sensing, including oxygen sensor proteins, were enriched in the MGIs (see Supplementary Table S6 Fig. 3a). The oxygen sensing machinery included proteins such as FAD linked oxidoreductase (n = 16), histidine kinase (n = 18), NADH-ubiquinone and quinone oxidoreductase (n = 12), ubiquinone and menaquinone biosynthesis (n = 4) and luciferase family oxidoreductase (n = 4) (see Supplementary Table S6). Additionally, pyridine nucleotide-disulfide oxidoreductase dimerization protein (n = 4), LuxR (n = 13), arsenic resistance protein transcriptional regulator (ArsR) (n = 10), glycerol dehydrogenase like oxidoreductase (n = 6), succinate dehydrogenase (n = 11) and fumarate hydratase (n = 5) were also annotated on the MGIs (see Supplementary Table S6). The Manikaran hot spring microbial mats are characterized by both oxic and anoxic micro-niches²¹, hence Cellulosimicrobium hot spring ecotypes may use oxygen sensing for niche adaptation. Strain MM maintained the genetic potential for arsenic mediated respiration (Ars operon, 27.11% identity) and detoxification (Arr operon, 27.95%) with respect to E. coli K12. The Ars operon (arsA, arsB, arsC, arsD, arsR) can support respiration across oxic and anoxic conditions, whereas, Arr is only aerobic. Hence, we conclude that respiration might be a splitting factor for the Cellulosimicrobium hot spring ecotypes, given that these microbial mats possess oxic and anoxic micro-niches and occurrence of an(aerobic) respiration related genes on the accessory genome of strain MM²¹.

Identification and characterization of PAIs across Cellulosimicrobium cellulans ecotypes

Putative PAIs were identified across the 3 genomes by analyzing variations in %G+ C, codon usage patterns (Fig. 3b) and ‘true’ PAIs were assigned using gene content annotation, e.g. tRNA and virulence genes (Fig. 3b,c, Table 1). VirulentPred²² supplemented with MP3²³ predicted 80 virulent ORFs for the PAIs (MM = 19; LMG16121 = 36; J36 = 25; Fig. 3, see Supplementary Figs S4 and S5). Among these virulent proteins, 61% (49/80) are well known to cause human infections and have been associated with other human pathogens such as Mycobacterium tuberculosis²⁴, Staphylococcus aureus²⁵ and Pseudomonas aeruginosa²⁶ (Table S1).

Table 1 General features of PAIs determined across three Cellulosimicrobium genomes.

Full size table

The whole genome virulence profile showed that 32% (942/3082) of the total protein sequences of Cellulosimicrobium sp. strain MM were pathogenic with a threshold score above 0.2 (using MP3). Similarly, LMG16121 had 31% (998/3217) and J36 had 28% (784/2770) pathogenic proteins. It has already been reported that pathogens including, Mycobacterium tuberculosis H37Rv, Pseudomonas aeruginosa B136–33, Vibrio cholera IEC224 and Neisseria menigitides 053422, maintain 30.28, 26.2, 18 and 16.3%, respectively. This suggests that Cellulosimicrobium carries a high pathogenic potential. A total of 25–28% of the Cellulosimicrobium pathogenic genes could be annotated using the KEGG GENES database (see Supplementary Table S7), with 155 genes shared across all three genomes (see Supplementary Table S8). These core pathogenic genes included UDP-N-acetylmuramate dehydrogenase (murB), chitinase, penicillin amidase, ABC transporters (n = 41), multidrug resistance proteins (emrB, n = 4) and drug exporter proteins (n = 3) (see Supplementary Table S8). These common pathogenic genes were also assigned COG classes, whereby 20% were unknown function, 18% were involved in carbohydrate metabolism and transport, 9% in defense mechanisms, 5% in cell wall/membrane/envelope biogenesis and 2.5% in cell motility (see Supplementary Fig. S6). Cell motility especially was variably distributed across the 3 strains, being nearly absent in J36 (see Supplementary Fig. S6). Interestingly, all the above pathways have recently been proposed as drug targets against Brucella melitensis 16M²⁷ and therefore may present possible drug targets for treating Cellulosimicrobium infections.

The PAI gene content specific to each strain is outlined in the Supplemental information (see Supplementary Text S1) and Fig. 1 (also see Supplementary Figs S4 and S5). Strain MM had some interesting examples, including locus MM_CPAI1 which maintained mobile genetic elements along side multidrug efflux proteins and inositol synthesis and MM_CPAI2 which maintained genes encoding for anti-toxin VbhA and Fic (filamentation induced by cyclic AMP) proteins (Fig. 3c). Fic are effector proteins which work in a complex with VbhT (toxin) and VbhA (anti-toxin) system²⁸. MM_CPAI2 was marked by the presence of juxtaposing Fic and VbhA proteins, speculating that conjugative systems are transferred together via HGT on the pathogenicity or genomic islands loci following the ‘selfish operon model’ (SOM) (Fig. 3c)⁸. As an alternative to SOM, occurrence of these functionally coupled ORFs can also be justified by the co-regulation model whereby genes are clustered in an operon by mere rearrangements followed by selection for co-regulation⁸. The presence of fluoroquinolone resistance and DEAD-box helicase, pemK, comEC coding regions supports the annotation of these regions as essential for pathogenicity. Similarly, LMG16121 PAIs included genes encoding for UDP-glucose pyrophosphorylase, phage-associated proteins, laminarinase, rhamnolipids and β -glucosidase-related glycosidases (see Supplementary Fig. S4). Finally, J36 PAIs encoded for an abundance of mobile genetic elements, hemin ABC transporter protein, tetracyclin/bleomycin/doxorubicin/methyl viologen resistance proteins and dihydrofolate reductase (see Supplementary Fig. S5).

Molecular clock analysis of pathogenicity island proteins

A Bayesian approach was used to calculate the evolutionary protein clock for three PAI proteins, namely Fic, D-alanyl-D-alanine carboxypeptidase and transposase (for selection criterion please see “Methods”)²⁹. Molecular clock analysis for the Fic protein (dN/dS = 1.128) across the multiple bacterial lineages encoding this protein on PAIs (including Cellulosimicrobium), predicted the most recent common ancestor (MRCA) to have occurred 27 million years ago (mya) (Fig. 4a). The maximum clade credibility tree revealed Clavibacter michiganensis subsp. michiganensis NCPPB382 (r = 0.009 to 4.874) and Rhodococcus equi (r = 0.011 to 5.533) with highest substitution rates (r at 95% Highest Posterior Density (HPD) interval) in comparison to rest of the strains (shown by branch thickness and black color, Fig. 4a). While Clavibacter michiganensis is well established to be evolving at higher rates using evolutionary dating methodology³⁰, Rhodococcus equi is also known as an emerging pathogen³¹. The topology of the tree placed Cellulosimicrobium strains MM and J36 together and exhibited significant homology (Posterior Probability = 0.97) to that of Mycobacterium tuberculosis (Fig. 4a), in which Fic protein has been established to be involved in pathogenicity³².

D-alanyl-D-alanine carboxypeptidase found on a PAI in strain LMG16121, has a widespread phylogenetic distribution and a significant role in pathogenesis³³. The summary tree showed relatively high substitution rates (95% HPD interval) across all the nodes (Fig. 4b). LMG16121 (r = 0.001 to 2.9596) was grouped with Streptomyces turgidiscabies (r = 0.004 to 2.8708) and Clavibacter michiganensis (r = 0.0001 to 2.6795) (Posterior probability = 0.89). When observed closely (branch thickness), these 3 strains were characterized with relatively lower substitution rates as compared to Enterococcus faecium E980 (r = 0.005 to 3.4292), Escherichia coli 536 (r = 0.0003 to 3.3744) and Staphylococcus aureus (r = 0.005 to 3.4292) (Fig. 4b). The MRCA for D-alanyl-D-alanine carboxypeptidase was calculated at ~40 mya. A similar analysis was also performed across transposase protein sequences from multiple bacterial lineages, including all Cellulosimicrobium strains for its frequent presence on PAIs (Fig. 4c). Consistent with previous reports of molecular clock studies of transposable elements³⁴, the MRCA for transposase (dN/dS = 1.06) across the Cellulosimicrobium dataset was calculated at ~250 mya, indicating a relative age >100 mya. As expected the majority (n = 11) of the Cellulosimicrobium transposases were grouped together, highlighting the species specific nature of this gene family (Fig. 4).

HGT drives the evolution of PAIs via hypothetical proteins from avirulent isolates

The hypothetical proteins occurring on the PAIs were annotated using ACLAME³⁵ which were functionally assigned broadly to mobilization proteins, phage-related proteins, recombinase and DNA-related proteins (Table 2). All the annotated phage proteins were mapped to prophages or viral peptides in hosts such as Rhodobacter sphaeroides, Paracoccus denitrificans PD1222 and Burkholderia pseudomallei. However, one specific “phage lambda-related host specificity protein J” found on J36_PAI2 was found with origin in plasmid pMT1 of Yersinia pestis biovar Microtus str. 91001 (Table 2). Strikingly, only 37% (17/46) of the hypothetical proteins annotated across the 13 PAIs in the Cellulosimicrobium pan-genome were predicted to have originated from pathogenic bacteria (Table 2). The remaining 63% were predicted to have originated from non-pathogenic bacterial hosts, which highlights that HGT interactions between pathogens and other bacteria can play a significiant role in the evolution of PAIs. High number of hypothetical proteins has also been found on PAIs of other pathogenic bacteria such as Pseudomonas aeruginosa PAO1¹⁶. PAIs themselves are mobile genetic elements and are well known to have been acquired during speciation of pathogens from non-pathogenic or environmental ancestors. Hence, PAIs harbor both virulent and avirulent ORFs and thus origin of hypothetical proteins can be mapped to avirulent bacterial isolates as well.

Table 2 Annotation of hypothetical proteins deciphered on PAIs across three Cellulosimicrobium genomes using ACLAME database.

Full size table

Metagenomic recruitment of PAIs across Manikaran hot springs

We mapped environmental metagenomic reads from Manikaran hot-spring microbial mats to the 13 PAIs. Only 38% of the PAIs showed significant recruitment, with the majority in strain MM, which was isolated from these microbial mats (Fig. 5). Metagenomic reads mapped to the PAIs were enriched for ORFs encoding for integrase, transposase, secY, inositol-3-phosphate synthase and Clp subunits (Fig. 5). Inositol-3-phosphate synthase is known to be laterally transferred from Archaea to thermophillic bacteria³⁶ and also plays a role in Mycobacterium tuberculosis pathogenicity³⁷. The mapping of metagenomic reads to the secY protein (Fig. 5), suggests that the hot spring community is experiencing an elevated stress response (with respect to both temperature i.e. 57 °C and arsenic) across this environment²¹. Clp subunits (ClpX and ClpP), which are virulence markers, were also enriched in the microbial mats (Fig. 5)³⁸. This suggest that the microbial community is under considerable environmental stress and maintains significant virulence potential.

Conclusions

C. cellulans has been associated with human pathogenicity, which is likely acquired and conferred through mobile genetic elements including pathogenicity islands. Using 3 reference genomes, C. cellulans LMG16121, C. cellulans J36 and Cellulosimicrobium sp. strain MM, we annotated 13 PAIs, encoding 49 potential virulence factors well-established to cause human infections. However, 32% (63/200) of the annotated PAI ORFs encoded for unknown proteins, of which 63% mapped to non-pathogenic bacteria, supporting the role of HGT in the evolution of PAIs¹⁶. Characterized with a high G+ C content (average 74%), genus Cellulosimicrobium was predicted to experience high selection pressure, with an average dN/dS value of 0.8 (< 1) based on 791 orthologous genes. This study provides first insights into the evolution of PAIs across the genus Cellulosimicrobium and reveals 49 virulence genes such as, Fic, VbhA toxin/antitoxin system, FtsK, ClpX, etc. which can be used as diagnostic markers for pathogenic Cellulosimicrobium strains.

Methods

Phylogenomic analysis

Phylogenomic analysis was performed in order to assign phylogenetic status to the uncharacterized (at species level) Cellulosimicrobium strain MM. Given a limited number of genomes sequenced for the genus Cellulosimicrobium (n = 4), we used dataset from the complete family Promicromonosporacae (n = 8). 16S rRNA gene sequence, 31 single copy genes¹⁰ and 400 conserved marker protein sequence based methods¹¹ were used to perform the phylogenetic analysis for the family. The 16S rRNA gene sequences (n = 31) were retrieved from the NCBI database for all seven genera included in family Promicromonosporacae namely: Cellulosimicrobium, Isoptericola, Myceligenerans, Promicromonospora, Xylanibacterium, Xylanimicrobium and Xylanimonas including Cellulomonas aerilata 5420S-23 as the outgroup. Similarly, phylogenomic reconstruction was performed based on amino acid sequences of 400 conserved bacterial markers and 31 single copy genes belonging to eight genome sequences of the family Promicromonosporacae from NCBI database viz. Cellulosimicrobium cellulans J36, Cellulosimicrobium cellulans LMG16121, Cellulosimicrobium sp. strain MM, Cellulosimicrobium funkei U11, Isoptericola variabilis strain 225, Promicromonospora kroppenstedtii DSM 19349 and Promicromonospora sukumoe 327MFSha31 and Xylanimonas cellulosilytica DSM 15894 (http://www.ncbi.nlm.nih.gov/genome/browse/representative/). The concatenated sequences were aligned using CLUSTALW³⁹ and a maximum likelihood⁴⁰ tree was constructed at bootstrap value of 1000, implemented in software MEGA 6.0⁴¹. Further, in order to support 16S rRNA and marker gene based phylogenetic analysis, ANI and DDH values⁹ were also calculated for the 8 whole genomes from Promicromonosporacae. ANI was calculated at minimum alignment length cut-off of 700 bp, minimum identity cut-off of 70% and window size of 1000 bp.

Functional annotations and metagenomic recruitment of Cellulosimicrobium strains

To determine the functional variability across three Cellulosimicrobium strains, i.e. MM, LMG16121 and J36, comparative genomic analysis was performed based on proteins and metabolic pathways. ORFs were predicted from the three genome sequences using FragGeneScan 1.18⁴² followed by gene finding using KAAS (KEGG Automatic Annotation Server) by BLASTp against the KEGG (Kyoto Encyclopedia of Genes and Genomes) GENES database at E-value of 1e-5 and identity cut-off of 70%. Metabolic pathways were reconstructed and filtered using MinPath (Minimal set of Pathways)⁴³ for all the Cellulosimicrobium genomes, i.e. strains MM, LMG16121 and J36. Further, one-way hirerachical clustering was performed on the top 50 variables (i.e. metabolic pathways) and were plotted at relative abundance of 0.8% and standard deviation cut-off of 0.4%. Protein orthologues were determined using pairwise reciprocal smallest distance (RSD) algorithm initially, followed by retrieving the common set in all three genomes⁴⁴ at E-value of 1e-15 and divergence cut-off of 0.5. Whole genome alignments were created in Mauve 2.4.0⁴⁵ and visualized in Circos⁴⁶ at 10 Kb minimum cut-off. In order to elucidate the effect of arsenic contamination across microbial mats, arsenic related gene clusters were reconstructed from strain MM’s genome using PSI BLAST and identity cut-off of 25%.

HGT events were determined for three Cellulosimicrobium genomes based on codon usage deviation using Hidden Markov models (HMM) implemented in program SIGI-HMM⁴⁷. Hierarchical clustering was performed on the annotated HGT loci using Euclidean distance matrix across three Cellulosimicrobium genomes. Putative alien (pA) genes were hence determined on the stretch of genomic islands (GIs) using Viterbi algorithm based on codon usage variations of HGTs from the rest of the genome⁴⁷. Paired-end metagenomic reads (n = 78,891,278) from the biofilms at Manikaran hot springs²¹ were mapped over the genome of strain MM using GASSST (Global Alignment Short Sequence Search Tool)⁴⁸ at sequence similarity cut-off of 85% to allow for Cellulosimicrobium strain MM specific recruitments⁴⁹. Abundance-weighted average coverage analysis using Nonpareil⁵⁰ revealed that the metagenome dataset had a genome coverage of 93% against strain MM. Therefore, we believe that our sequencing depth was good enough to represent the in situ microbial diversity.MGIs were annotated as continuous stretches of gaps in the metagenome recruitment plot by subjecting them to ORF prediction and further BLASTp against the NCBI nr database at E-value of 1e-5, minimum bit-score cut-off of 100 and identity cut-off of 70%⁴⁹.

Determination of pathogenicity islands (PAIs)

Three genomes representing C. cellulans i.e. strains LMG16121, MM and J36 were used for deciphering PAIs following a segregative approach discriminating islands from the core genome on the basis of %G+ C content, dinucleotide frequency and codon usage implemented in program PAI-DA^20,51 with window size of 5 Kb per genome. Both %codon usage bias and %G+ C content were plotted against the individual genome to identify skewed “island-like” regions. Simultaneously, PAI-DB v2.0⁵² was used to validate occurrence of these PAI like regions against databases for both pathogenicity islands (PAIs) and resistance islands (REIs) at E-value of 0.01. Given that highly expressed genes (HEGs) such as genes encoding for ribosomal subunits, transcription and terminator genes and repair genes might also exhibit a skewed codon usage and G+ C content²⁰, we used sequence composition information (manual curation) to avoid the false negative predictions. Further, tRNA scan was used to check whether PAIs are flanked by tRNAs to validate the PAIs⁵³.

Assigning functions to PAIs

Protein coding sequences were obtained for the PAIs of all three genomes by FragGeneScan 1.18⁴². Functions were assigned to the ORFs using BLASTp (E-value = 1e-5) against non-redundant protein database, KEGG⁵⁴ and Gene Ontology (GO) (Gene Ontology Consortium, 2008) database. Protein sequences were also searched against the Pfam library of hidden Markov models (HMMs) using HMMER⁵⁵ for family level prediction. Further, the virulent content of the PAIs was determined using multiple databases such as Virulent Factor Database (VFDB)⁵⁶, VirulentPred²² and MP3²³. MP3 was used at a threshold value of 0.2 and minimum protein length of 30.The hypothetical proteins which were abundant on PAIs were then checked for their origin using the ACLAME database³⁵.

Molecular clock analysis of PAI gene content

After functional assignment of ORFs predicted on PAIs of the three genomes, we selected three proteins, i.e. Fic, D-alanyl-D-alanine carboxypeptidase and transposase from strain MM, LMG16121 and J36, respectively, to infer the divergence time of most recent common ancestor (MRCA) among multiple bacterial lineages harboring these genes on PAIs. In case where multiple strains of one bacterial species were found to be carrying the gene of interest on PAIs, only single strain was taken into account for tree construction. Fic protein from strain MM was selected for its significant association with pathogenesis⁵⁷ as well as its frequent occurrence on PAIs in other pathogenic bacterial lineages (n = 9). PAIs from strain LMG16121 were characterized by the repeated presence (n = 9) of ORFs encoding for proteins involved in cell-wall biogenesis out of which D-alanyl-D-alanine carboxypeptidase was chosen for its association with PAIs of other pathogenic bacteria (n = 5). Transposase was also selected for this analysis, for its repeated presence on PAIs of strain J36 (n = 7) as well as forming a significant part of PAIs in other bacterial genera (n = 19). The protein coding sequences for each of these genes (n = 3) was retrieved from Cellulosimicrobium genomes (n = 3) as well as from other bacteria (NCBI) reported for the presence of these genes on PAIs followed by multiple sequence alignment using CLUSTALW. The alignment was then used in BEAST version 1.8.2²⁹ to perform Bayesian molecular clock analysis using the following parameters: Clock = Random Local Clock, Substitution model = WAG (for amino acids), Site heterogeneity = Gamma, Tree Prior = Coalescent Constant Size, Length of Monte Carlo Markov Chain (MCMC) = 1000000 and Burnin = 100. Random Local Clock and Gamma distribution (relaxed model) was used to account for maximum heterogeneity in terms of substitutions given inter-genera, diverse nature of the dataset. TreeAnnotator (http://beast.bio.ed.ac.uk/TreeAnnotator) was further used to summarize the information from multiple sample trees generated from BEAST into a single target tree, i.e. “Maximum clade credibility” tree with values for the rate of substitution at 95% Highest Posterior Density (HPD) interval, posterior probability, length and height of 95% HPD interval of the node ages. The final annotated tree was then visualized using FigTree version 1.4.0 (http://tree.bio.ed.ac.uk/software/figtree/).

Statistical analysis

For Cellulosimicrobium genomes undertaken in this study, pairwise correlation was computed between gene centric optimal codon frequencies F_opt (a measure of codon usage bias) and %G+ C content using Pearson Product-Moment Correlation Coefficient (R²) at 95% confidence level based on 791 common orthologues found between three genomes¹⁷. F_opt values were calculated using CodonW (version 1.4.4, http://codonw.sourceforge.net) for the 791 orthologous proteins from each of three Cellulosimicrobium genomes. Further, pairwise comparisons between the codon usage bias (measured as F_opt values) were performed between Cellulosimicrobium sp. MM, C. cellulans LMG16121 and C. cellulans J36, using Wilcoxon-Mann-Whitney test with continuity correction to elucidate the significance of variable codon bias patterns across different genomes inhabiting different environments⁵⁸. To further estimate coupling between selection on codon usage and selection of amino acids, Pearson correlation was computed between F_opt and the dN/dS for orthologous gene pairs (n = 791) between all three genome pairs. For this the mean F_opt value of two orthologous genes was taken as the F_opt value for that gene pair. dN/dS values for each orthologous gene pair was calculated by pairwise aligning protein sequences by CLUSTALW followed by codon to codon alignment of corresponding nucleotide sequences using PAL2NAL⁵⁹. Further substitution rates were estimated using yn00 module implemented in PAML⁶⁰. All the above statistical analyses and scatter plotting were performed in R (R Core Team, http://www.R-project.org/).

Accession numbers

Sequence data were obtained for the Cellulosimicrobium genomes from NCBI Genome database: Cellulosimicrobium sp. strain MM [GenBank:NZ_JPQW00000000.1], Cellulosimicrobium cellulans LMG16121 [NZ_CAOI00000000.1], Cellulosimicrobium cellulans J36 [NZ_JAGJ00000000.1], Cellulosimicrobium funkei U11 [NZ_JNBQ00000000.1], Isoptericola variabilis 225 [NC_015588.1], Promicromonospora kroppenstedtii DSM 19349 [NZ_AZXR00000000.1], Promicromonospora sukumoe 327MFSha31 [NZ_ARQM00000000.1] and Xylanimonas cellulosilytica DSM 15894 [NC_013530.1]. 16S rRNA gene sequence data was used from the family Promicromonosporacae under accession numbers: Promicromonospora citrea [X83808.1], Promicromonospora endophytica EUM 273 [GU434253.2], Promicromonospora enterophila [X83807.1], Promicromonospora flava [AM992980.1], Promicromonospora sp. UTMC 792 [JN038073.1], Promicromonospora sukumoe [AB023375.1], Promicromonospora thailandica [AB560974.1], Promicromonospora kroppenstedtii RS16 [AM709608.1], Promicromonospora aerolata [AJ487303.1], Promicromonospora umidemergens [FN293378.1], Promicromonospora vindobonesis [AJ487302.1], Promicromonospora xylanilytica strain YIM61515 [FJ214352.1], Cellulosimicrobium cellulans [X83809.1], Cellulosimicrobium funkei strain W6122 [AY501364.1], Cellulosimicrobium terreum strain DS-61 [EF076760.1], Isoptericola chiayiensis strain 06182M-1 [FJ469988.1], Isoptericola dokdonensis strain DS-3 [DQ387860.1], Isoptericola halotolerans strain YIM 70177 [AY789835.1], Isoptericola hypogeus [AJ854061.1], Isoptericola jiangsuensis strain CLG [EU852101.1], Isoptericola nanjingensis strain H17 [HQ222356.1], Myceligenerans crystallogenes [FR733716.1], Myceligenerans sp. XJEEM 11063 [EU910872.1], Myceligeneris xiligouensis strain XLG9A10.2 [AY354285.1], Xylanibacterium ulmi [AY273185.2], Xylanimicrobium pachnodae VPCX2 [AF105422.1], Xylanimonas cellulosilytica DSM 15894 [CP001821.1] and Cellulomonas aerilata strain 5420S-23 [EU560979.1]. Metagenome sequence (NGS) data were obtained from DDBJ/EMBL/GenBank under the accession number of PRJEB4614 (http://www.ebi.ac.uk/ena/data/view/PRJEB4614).

Additional Information

How to cite this article: Sharma, A. et al. (Meta)genomic insights into the pathogenome of Cellulosimicrobium cellulans. Sci. Rep. 6, 25527; doi: 10.1038/srep25527 (2016).

References

Schumann, P., Weiss, N. & Stackebrandt, E. Reclassification of Cellulomonas cellulans (Stackebrandt and Keddie 1986) as Cellulosimicrobium cellulans gen. nov., comb. nov. Int. J. Syst. Evol. Microbiol. 51, 1007–1010 (2001).
CAS PubMed Google Scholar
Brown, J. M. et al. Characterization of clinical isolates previously identified as Oerskovia turbata: proposal of Cellulosimicrobium funkei sp. nov. and emended description of the genus Cellulosimicrobium. Int. J. Syst. Evol. Microbiol. 56, 801–804 (2006).
CAS PubMed Google Scholar
Yoon, J.-H. et al. Identification of Saccharomonospora strains by the use of genomic DNA fragments and rRNA gene probes. Int. J. Syst. Bacteriol. 46, 502–505 (1996).
CAS Google Scholar
Sharma, A., Hira, P., Shakarad, M. & Lal, R. Draft genome sequence of Cellulosimicrobium sp. strain MM, isolated from arrsenic-rich microbial mats of a Himalayan hot spring. Genome Announc. 2, e01020–14; doi: 10.1128/genomeA.01020-14 (2014).
Article PubMed PubMed Central Google Scholar
Antony, R., Krishnan, K. P., Thomas, S., Abraham, W. P. & Thamban, M. Phenotypic and molecular identification of Cellulosimicrobium cellulans isolated from Antarctic snow. Antonie Van Leeuwenhoek 96, 627–634 (2009).
CAS PubMed Google Scholar
Petkar, H. et al. Cellulosimicrobium funkei: First report of infection in a nonimmunocompromised patient and useful phenotypic tests for differentiation from Cellulosimicrobium cellulans and Cellulosimicrobium terreum. J. Clin. Microbiol. 49, 1175–1178 (2011).
PubMed PubMed Central Google Scholar
Dobrindt, U., Hochhut, B., Hentschel, U. & Hacker, J. Genomic islands in pathogenic and environmental microorganisms. Nat. Rev. Microbiol. 2, 414–424 (2004).
CAS PubMed Google Scholar
Price, M. N., Huang, K. H., Arkin, A. P. & Alm, E. J. Operon formation is driven by co-regulation and not by horizontal gene transfer. Genome Res. 15, 809–819 (2005).
CAS PubMed PubMed Central Google Scholar
Konstantinidis, K. T. & Tiedje, J. M. Genomic insights that advance the species definition for prokaryotes. Proc. Natl. Acad. Sci. USA 102, 2567–2572 (2005).
ADS CAS PubMed PubMed Central Google Scholar
Wu, M. & Scott, A. J. Phylogenomic analysis of bacterial and archaeal sequences with AMPHORA2. Bioinformatics 28, 1033–1034 (2012).
CAS PubMed Google Scholar
Segata, N., Börnigen, D., Morgan, X. C. & Huttenhower, C. PhyloPhlAn is a new method for improved phylogenetic and taxonomic placement of microbes. Nat. Commun. 4, 2304; doi: 10.1038/ncomms3304 (2013).
Article ADS CAS PubMed Google Scholar
Moriya, Y., Itoh, M., Okuda, S., Yoshizawa, A. C. & Kanehisa, M. KAAS: an automatic genome annotation and pathway reconstruction server. Nucleic Acids Res. 35, W182–185 (2007).
PubMed PubMed Central Google Scholar
Kim, S. et al. RNAseq-based transcriptome analysis of Burkholderia glumae quorum sensing. Plant Pathol. J. 29, 249–259 (2013).
PubMed PubMed Central Google Scholar
Mosier, A. C. et al. Metabolites associated with adaptation of microorganisms to an acidophilic, metal-rich environment identified by stable-isotope-enabled metabolomics. MBio 4, e00484–00412; doi: 10.1128/mBio.00484-12 (2013).
Article PubMed PubMed Central Google Scholar
Jelsbak, L. et al. Identification of metabolic pathways essential for fitness of Salmonella Typhimurium in vivo. PLos One 9, e101869; doi: 10.1371/journal.pone.0101869 (2014).
Article ADS CAS PubMed PubMed Central Google Scholar
Schmidt, H. & Hensel, M. Pathogenicity islands in bacterial pathogenesis. Clin. Microbiol. Rev. 17, 14–56 (2004).
CAS PubMed PubMed Central Google Scholar
Ran, W., Kristensen, D. M. & Koonin, E. V. Coupling between protein level selection and codon usage optimization in the evolution of bacteria and archaea. MBio 5, e00956–14; doi: 10.1128/mBio.00956-14 (2014).
Article PubMed PubMed Central Google Scholar
Bohlin, J., Brynildsrud, O., Vesth, T., Skjerve, E. & Ussery, D. W. Amino acid usage is asymmetrically biased in AT- and GC-rich microbial genomes. PLos One 8, e69878; doi: 10.1371/journal.pone.0069878 (2013).
Article ADS CAS PubMed PubMed Central Google Scholar
Lassalle, F. et al. GC-Content evolution in bacterial genomes: the biased gene conversion hypothesis expands. PLos Genet. 11, e1004941; doi: 10.1371/journal.pgen.1004941 (2015).
Article CAS PubMed PubMed Central Google Scholar
Che, D., Hasan, M. S. & Chen, B. Identifying Pathogenicity Islands in bacterial pathogenomics using computational approaches. Pathogens 3, 36–56 (2014).
PubMed PubMed Central Google Scholar
Sangwan, N. et al. Arsenic rich Himalayan hot spring metagenomics reveal genetically novel predator-prey genotypes. Environ. Microbiol. Rep. 7, 812–823 (2015).
CAS PubMed Google Scholar
Garg, A. & Gupta, D. VirulentPred: a SVM based prediction method for virulent proteins in bacterial pathogens. BMC Bioinformatics 9, 62; doi: 10.1186/1471-2105-9-62 (2008).
Article CAS PubMed PubMed Central Google Scholar
Gupta, A., Kapil, R., Dhakan, D. B. & Sharma, V. K. MP3: a software tool for the prediction of pathogenic proteins in genomic and metagenomic data. PLos One 9, e93907; doi: 10.1371/journal.pone.0093907 (2014).
Article ADS CAS PubMed PubMed Central Google Scholar
Schmid, M. C. et al. A translocated bacterial protein protects vascular endothelial cells from apoptosis. PLos Pathog. 2, e115; doi: 10.1371/journal.ppat.0020115 (2006).
Article CAS PubMed PubMed Central Google Scholar
Michel, A. et al. Global regulatory impact of ClpP protease of Staphylococcus aureus on regulons involved in virulence, oxidative stress response, autolysis and DNA repair. J. Bacteriol. 188, 5783–5796 (2006).
CAS PubMed PubMed Central Google Scholar
Kovačić, F. et al. Structural and functional characterisation of TesA - a novel lysophospholipase A from Pseudomonas aeruginosa. PLos One 8, e69125; doi: 10.1371/journal.pone.0069125 (2013).
Article ADS CAS PubMed PubMed Central Google Scholar
Pradeepkiran, J. A., Kumar, K. K., Kumar, Y. N. & Bhaskar, M. Modeling, molecular dynamics and docking assessment of transcription factor rho: a potential drug target in Brucella melitensis 16M. Drug Des. Devel. Ther. 9, 1897–1912 (2015).
CAS PubMed PubMed Central Google Scholar
Engel, P. et al. Adenylylation control by intra- or intermolecular active-site obstruction in Fic proteins. Nature 482, 107–110 (2012).
ADS CAS PubMed Google Scholar
Drummond, A. J. & Rambaut, A. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol. Biol. 7, 214; doi: 10.1186/1471-2148-7-214 (2007).
Article CAS PubMed PubMed Central Google Scholar
Bentley, S. D. et al. Genome of the actinomycete plant pathogen Clavibacter michiganensis subsp. sepedonicus suggests recent niche adaptation. J. Bacteriol. 190, 2150–2160 (2008).
CAS PubMed PubMed Central Google Scholar
Weinstock, D. M. & Brown, A. E. Rhodococcus equi: an emerging pathogen. Clin. Infect. Dis. 34, 1379–1385 (2002).
PubMed Google Scholar
Mishra, S. et al. Cloning, expression, purification and biochemical characterisation of the FIC motif containing protein of Mycobacterium tuberculosis. Protein Expr. Purif. 86, 58–67 (2012).
CAS PubMed Google Scholar
Hung, W., Jane, W.-N. & Wong, H. Association of a D-alanyl-D-alanine carboxypeptidase gene with the formation of aberrantly shaped cells during the induction of viable but nonculturable Vibrio parahaemolyticus. Appl. Environ. Microbiol. 79, 7305–7312 (2013).
CAS PubMed PubMed Central Google Scholar
Giordano, J. et al. Evolutionary History of Mammalian Transposons Determined by Genome-Wide Defragmentation. PLos Comput. Biol. 3, e137; doi: 10.1371/journal.pcbi.0030137 (2007).
Article ADS CAS PubMed PubMed Central Google Scholar
Leplae, R., Hebrant, A., Wodak, S. J. & Toussaint, A. ACLAME: A CLAssification of Mobile genetic Elements. Nucleic Acids Res. 32, D45–D49 (2004).
CAS PubMed PubMed Central Google Scholar
Michell, R. H. Inositol lipids: from an archaeal origin to phosphatidylinositol 3,5-bisphosphate faults in human disease. FEBS J. 280, 6281–6294 (2013).
CAS PubMed Google Scholar
Reynolds, T. B. Strategies for acquiring the phospholipid metabolite inositol in pathogenic bacteria, fungi and protozoa: making it and taking it. Microbiology 155, 1386–1396 (2009).
CAS PubMed PubMed Central Google Scholar
Ollinger, J., O’Malley, T., Kesicki, E. A., Odingo, J. & Parish, T. Validation of the essential ClpP protease in Mycobacterium tuberculosis as a novel drug target. J. Bacteriol. 194, 663–668 (2012).
CAS PubMed PubMed Central Google Scholar
Thompson, J. D., Higgins, D. G. & Gibson, T. J. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22, 4673–4680 (1994).
CAS PubMed PubMed Central Google Scholar
Felsenstein, J. Evolutionary trees from DNA sequences: a maximum likelihood approach. J. Mol. Evol. 17, 368–376 (1981).
ADS CAS PubMed Google Scholar
Tamura, K., Stecher, G., Peterson, D., Filipski, A. & Kumar, S. MEGA6: Molecular Evolutionary Genetics Analysis version 6.0. Mol. Biol. Evol. 30, 2725–2729 (2013).
CAS PubMed PubMed Central Google Scholar
Rho, M., Tang, H. & Ye, Y. FragGeneScan: predicting genes in short and error-prone reads. Nucleic Acids Res. 38, e191; doi: 10.1093/nar/gkq747 (2010).
Article CAS PubMed PubMed Central Google Scholar
Ye, Y. & Doak, T. G. A Parsimony Approach to Biological Pathway Reconstruction/Inference for Genomes and Metagenomes. PLos Comput. Biol. 5, e1000465; doi: 10.1371/journal.pcbi.1000465 (2009).
Article ADS CAS PubMed PubMed Central Google Scholar
Wall, D. P. & Deluca, T. Ortholog detection using the reciprocal smallest distance algorithm. Methods Mol. Biol. 396, 95–110 (2007).
CAS PubMed Google Scholar
Darling, A. C. E., Mau, B., Blattner, F. R. & Perna, N. T. Mauve: multiple alignment of conserved genomic sequence with rearrangements. Genome Res. 14, 1394–1403 (2004).
CAS PubMed PubMed Central Google Scholar
Krzywinski, M. et al. Circos: an information aesthetic for comparative genomics. Genome Res. 19, 1639–1645 (2009).
CAS PubMed PubMed Central Google Scholar
Waack, S. et al. Score-based prediction of genomic islands in prokaryotic genomes using hidden Markov models. BMC Bioinformatics 7, 142; doi: 10.1186/1471-2105-7-142 (2006).
Article CAS PubMed PubMed Central Google Scholar
Rizk, G. & Lavenier, D. GASSST: global alignment short sequence search tool. Bioinformatics 26, 2534–2540 (2010).
CAS PubMed PubMed Central Google Scholar
Sharma, A. et al. Pan-genome dynamics of Pseudomonas gene complements across hexachlorocyclohexane dumpsite. BMC Genomics 16, 313; doi: 10.1186/s12864-015-1488-2 (2015).
Article CAS PubMed PubMed Central Google Scholar
Rodriguez-R, L. M. & Konstantinidis, K. T. Nonpareil: a redundancy-based approach to assess the level of coverage in metagenomic datasets. Bioinformatics 30, 629–635 (2014).
CAS PubMed Google Scholar
Tu, Q. & Ding, D. Detecting pathogenicity islands and anomalous gene clusters by iterative discriminant analysis. FEMS Microbiol. Lett. 221, 269–275 (2003).
CAS PubMed Google Scholar
Yoon, S. H., Park, Y.-K. & Kim, J. F. PAIDB v2.0: exploration and analysis of pathogenicity and resistance islands. Nucleic Acids Res. 43, D624–630 (2015).
CAS PubMed Google Scholar
Lowe, T. M. & Eddy, S. R. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25, 955–964 (1997).
CAS PubMed PubMed Central Google Scholar
Kanehisa, M. The KEGG database. Novartis Found. Symp. 247, 91–101; Discussion 101–103, 119–128, 244–252 (2002).
CAS PubMed Google Scholar
Finn, R. D., Clements, J. & Eddy, S. R. HMMER web server: interactive sequence similarity searching. Nucleic Acids Res. 39, W29–W37 (2011).
CAS PubMed PubMed Central Google Scholar
Chen, L. et al. VFDB: a reference database for bacterial virulence factors. Nucleic Acids Res. 33, D325–328 (2005).
CAS PubMed Google Scholar
Ma, Z. et al. Insight into the specific virulence related genes and toxin-antitoxin virulent pathogenicity islands in swine streptococcosis pathogen Streptococcus equi ssp. zooepidemicus strain ATCC35246. BMC Genomics 14, 377; doi: 10.1186/1471-2164-14-377 (2013).
Article PubMed Google Scholar
Torre, A. R. D. L., Lin, Y.-C., Peer, Y. V. de & Ingvarsson, P. K. Genome-wide analysis reveals diverged patterns of codon bias, gene expression and rates of sequence evolution in Picea gene families. Genome Biol. Evol. 7, 1002–1015 (2015).
Google Scholar
Suyama, M., Torrents, D. & Bork, P. PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments. Nucleic Acids Res. 34, W609–612 (2006).
CAS PubMed PubMed Central Google Scholar
Yang, Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, 1586–1591 (2007).
CAS PubMed Google Scholar

Download references

Acknowledgements

The authors acknowledge funds from Department of Biotechnology (DBT) and National Bureau of Agriculturally Important Microorganisms (NBAIM). AS gratefully acknowledge National Bureau of Agriculturally Important Microorganisms (NBAIM) for providing research fellowship. This paper was partly written during the visit by R.L. under DST-DAAD project to Germany (Helmholtz Zentrum für Umweltzforschung-UFZ, Leipzig).

Author information

Authors and Affiliations

Department of Zoology, University of Delhi, Delhi, India
Anukriti Sharma & Rup Lal
Biosciences Division (BIO), Argonne National Laboratory, 9700 South Cass Avenue, Argonne, IL, USA
Jack A. Gilbert
Department of Surgery, University of Chicago, 5841 S Maryland Ave, Chicago, IL, USA
Jack A. Gilbert
Marine Biological Laboratory, Woods Hole, MA, USA
Jack A. Gilbert

Authors

Anukriti Sharma
View author publications
You can also search for this author in PubMed Google Scholar
Jack A. Gilbert
View author publications
You can also search for this author in PubMed Google Scholar
Rup Lal
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

A.S. and R.L. performed the data analysis and data interpretation. A.S. prepared the figures. R.L., A.S. and J.A.G. wrote the manuscript. All authors reviewed the manuscript.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Electronic supplementary material

Supplementary Information

Supplementary Dataset S4

Supplementary Dataset S5

Supplementary Dataset S6

Supplementary Dataset S7

Supplementary Dataset S8

Rights and permissions

This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

Reprints and permissions

About this article

Cite this article

Sharma, A., Gilbert, J. & Lal, R. (Meta)genomic insights into the pathogenome of Cellulosimicrobium cellulans. Sci Rep 6, 25527 (2016). https://doi.org/10.1038/srep25527

Download citation

Received: 07 December 2015
Accepted: 14 April 2016
Published: 06 May 2016
DOI: https://doi.org/10.1038/srep25527

This article is cited by

Bioinformatic Approach to Investigate Larvae Gut Microbiota Cellulosimicrobium protaetiae via Whole-Genome Analysis
- Han Le Ho
- Luan Tran-Van
- Thi Dong Phuong Nguyen
Molecular Biotechnology (2024)
Genome sequence analysis and bioactivity profiling of marine-derived actinobacteria, Brevibacterium luteolum, and Cellulosimicrobium funkei
- Faouzia Tanveer
- Muhammad Shehroz
- Azra Yasmin
Archives of Microbiology (2021)
Cellulosimicrobium fucosivorans sp. nov., isolated from San Elijo Lagoon, contains a fucose metabolic pathway linked to carotenoid production
- Fabiola A. Aviles
- John A. Kyndt
Archives of Microbiology (2021)
Taxonomically Characterized and Validated Bacterial Species Based on 16S rRNA Gene Sequences from India During the Last Decade
- Princy Hira
- Priya Singh
- Rup Lal
Indian Journal of Microbiology (2020)
Comparative metagenomic analyses of a high-altitude Himalayan geothermal spring revealed temperature-constrained habitat-specific microbial community and metabolic dynamics
- Nitish Kumar Mahato
- Anukriti Sharma
- Rup Lal
Archives of Microbiology (2019)

Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.

Subjects

Abstract

Similar content being viewed by others

Introduction

Results and Discussion

Phylogenomic analysis

Comparative functional potential of Cellulosimicrobium cellulans strains

Evidence for G+C biased gene conversions across the genus Cellulosimicrobium

Anaerobic respiration leads to Cellulosimicrobium population segregation for hot spring ecotypes

Identification and characterization of PAIs across Cellulosimicrobium cellulans ecotypes

Molecular clock analysis of pathogenicity island proteins

HGT drives the evolution of PAIs via hypothetical proteins from avirulent isolates

Metagenomic recruitment of PAIs across Manikaran hot springs

Conclusions

Methods

Phylogenomic analysis

Functional annotations and metagenomic recruitment of Cellulosimicrobium strains

Determination of pathogenicity islands (PAIs)

Assigning functions to PAIs

Molecular clock analysis of PAI gene content

Statistical analysis

Accession numbers

Additional Information

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Ethics declarations

Competing interests

Electronic supplementary material

Rights and permissions

About this article

Cite this article

Share this article

This article is cited by

Comments

Search

Quick links