Genome-wide Analysis of Four Enterobacter cloacae complex type strains: Insights into Virulence and Niche Adaptation

Enterobacter cloacae complex (Ecc) species are widely distributed opportunistic pathogens mainly associated with humans and plants. In this study, the genomes of clinical isolates including E. hormaechei, E. kobei, and E. ludwigii and non-clinical isolate including E. nimipressuralis were analysed in combination with the genome of E. asburiae by using the reference strain E. cloacae subsp. cloacae ATCC 13047; the Ecc strains were tested on artificial sputum media (ASM), which mimics the host, to evaluate T6SS genes as a case study. All five Ecc strains were sequenced in our lab. Comparative genome analysis of the Ecc strains revealed that genes associated with the survival of Ecc strains, including genes of metal-requiring proteins, defence-associated genes and genes associated with general physiology, were highly conserved in the genomes. However, the genes involved in virulence and drug resistance, specifically those involved in bacterial secretion, host determination and colonization of different strains, were present in different genomic regions. For example, T6SS accessory and core components, T4SS, and multidrug resistance genes/efflux system genes seemed vital for the survival of Ecc strains in various environmental niches, such as humans and plants. Moreover, the ASM host-mimicking growth medium revealed significantly high expression of T6SS genes, including PrpC, which is a regulatory gene of the T6SS, in all tested Ecc strains compared to the control medium. The variations in T6SS gene expression in ASM vs. control showed that the ASM system represents a simple, reproducible and economical alternative to animal models for studies such as those aimed at understanding the divergence of Ecc populations. In summary, genome sequencing of clinical and environmental Ecc genomes will assist in understanding the epidemiology of Ecc strains, including the isolation, virulence characteristics, prevention and treatment of infectious disease caused by these broad-host-range niche-associated species.


Results and Discussion
features of Enterobacter cloacae complex genomes. Genome sequencing of E. asburiae revealed a single circular chromosomal DNA of 4.81 Mbp in addition to 4 plasmids. Its genome encodes 4,827 genes, 86 tRNAs and 25 rRNAs, with a GC content of 55.47% 8 . The genome sequence of E. hormaechei consists of 3 scaffolds of 4.89 Mbp along with 1 plasmid. Its genome encodes 4,797 genes, 95 tRNAs and 28 rRNAs, with a GC content of 55.1% (Table 1). The draft genome sequence of E. kobei possesses a single chromosomal DNA and a plasmid collectively of 4.75 Mbp. This genome encodes 4,626 genes, 89 tRNAs, and 24 rRNAs, with a GC content of 55.43%. The draft genome sequence of E. ludwigii consists of a single chromosomal DNA and a plasmid collectively of 4.95 Mbp, encoding 4,795 genes, 88 tRNAs and 29 rRNAs, with a GC content of 55.43% (Table 1). The draft genome sequence of E. nimipressuralis was 4.98 Mbp, comprising 18 scaffolds of DNA and encoding 4,875 genes, 93 tRNAs and 30 rRNAs, with a GC content of 55.1%.
The five strains were sequenced by using PacBio technology, and the longest possible sequence of excellent quality and accuracy was obtained. It was further reasoned that PacBio, with a high sequencing coverage, could produce a long sequence read with an enhanced quality. To explore the Ecc genomes and their genetic components, comparative genomics was performed as a fundamental strategy. The identification of genes and other functional elements, such as regulatory regions, and their influence on the survival of the organism depends on revealing the natural selection signature of the genome 9 . We anticipate that the depth of these genomes will be a key factor in exploring the epidemiology of Ecc strains.
The draft genome sequence of E. cloacae subsp. cloacae ATCC 13047, which was used as the reference genome, had a slightly lower GC content (54.8%), while the number of coding sequences (5,581) of this genome was higher than those of the other sequenced genomes. Ecc species are abundant in nature and are notorious for causing various diseases. However, many biochemical and molecular studies have shown heterogeneity in Ecc species 9 . This has revealed variation in the genetic physiology of this pathogen, including the GC content, number of genes and RNAs. Genomes with low GC contents indicated recurrent horizontal gene transfer in the Ecc strains. The variation in the genomes may be due to the origin of species from various niches and the role of genes that are specific to different niches. comparative genome analysis. For comparative genome-wide analysis, the sequence-based comparative analysis results were obtained for all species with the RAST server and CGview genome-wide analysis tools. The circular genomes of the Ecc strains, including E. cloacae subsp. cloacae ATCC 13047 and E. cowanii as reference strains, are depicted in Fig. 1. The circular genomes of the Ecc strains exhibited the coding sequence (CDS), ORF, GC content, number of RNAs and GC skew (Fig. 1), where the inner six rings represented the CDS and the number of RNAs (tRNA, rRNA and sRNA) on the forward and reverse strands. The visualization of the entire www.nature.com/scientificreports www.nature.com/scientificreports/ genome showed high similarity levels (>90%) and depicted numerous sites of latent deletion/insertion events in the genome of the Ecc strains (Fig. 1). It was further revealed that all Ecc strains, including E. cloacae subsp. cloacae, have 80-90% similarity in their core genome (Fig. 1). Various genes were classified as hypothetical, which is consistent with previous studies on genome-wide analysis of other species. In genomics, the comparisons and characterizations of functional elements depend on the identification of the natural selection footprints.
Genome wide gene prediction and analysis of virulent genes. Comparative genome analysis helps to investigate natural evolutionary processes such as genetic drifts, virulence factors, drug resistance genes and mutations 9,10 . Moreover, antimicrobial resistance has evidently damaging effects in infectious diseases and is the main threat of superbugs around the globe 9,10 . By using reference sequences as well as virulence genes annotated with the VFDB filtered database, we analysed 5 Ecc genomes. A large number of functional genes contributing to drug resistance, stress resistance, general physiology and virulence of the bacteria were identified in all Ecc genomes studied (Tables S1, S2 and S3). www.nature.com/scientificreports www.nature.com/scientificreports/ Several genes responsible for antibiotic resistance were also retrieved by computational analysis (Table S2) in Ecc, including β-lactamase-encoding, ABC transporter and efflux system genes (Table S2). In the past two decades, bacteria producing extended-spectrum β-lactamase (ESBL) have been identified mostly from humans and animals samples 11 . ESBLs confer resistance to extended-spe ctrum cephalosporins (ESCs, e.g., ceftriaxone, cefpodoxime, ceftiofur, ceftazidime) and monobactams and often express resistance to non-β-lactam antimicrobials, leaving limited therapeutic options. Moreover, recent studies have shown that in addition to direct killing effects, exposure to antibiotics at subinhibitory concentrations could cause an increase or reduction in the virulence of human and animal bacterial pathogens. The prediction of resistance genes in this pool revealed that Ecc strains are potentially threatening 11 .
The bacterial response is activated by potentially damaging conditions that result in alterations in gene activity and gene expression. Bacterial growth and survival are also dependent on changes in osmotic pressure 12 . We conducted a genome-wide survey and found that several key elements were associated with the adaptation of Ecc strains to various niches. Ecc bacteria contain multiple osmoregulatory systems (Table S3) to ensure optimal metabolic fitness 12 . proP is a proton symporter that senses and responds to osmotic shifts by bringing osmolytes such as glycine betaine, proline, pipecolic acid, stachydrine, taurine and ectoine into the cell. ProP is both an osmoregulator and an osmosensor that plays a role in the osmoregulatory response of bacteria 12 .
Pathogenicity-associated genes, such as T1SS, T2SS, T3SS, T4SS, and T6SS, which encode secretion system proteins and play essential roles in causing diseases not only in plants but also in animals, were identified 13,14 .
Bacterial genomes have evolved a number of complicated nanomachines, such as T6SS, that aid in the transfer of a variety of virulence determinants across the bacterial cell membrane. In recent years, substantial development has been made to understand gram-negative bacterial secretion systems (I-VI) at both the molecular and structural levels 15,16 . We identified T1SS, T2SS, T4SS and T6SS in five Ecc strains. Notably, the T3SS, which is a key component of the secretion machinery, was not encoded by any strain in this study. The T3SS in the bacterial structure helps gram-negative pathogens invade the host with an exclusive mechanism of virulence and enables them to bypass the extracellular barriers [16][17][18] .
Gram-negative bacteria secrete type 1 secretion systems via a translocator comprising three proteins: (1) 2 cytoplasmic membrane proteins, i.e., an adaptor or a so-called membrane fusion protein, (2) an ATP-binding cassette (ABC) and (3) a specific outer membrane protein (OMP) 19 . We identified a set of type 1 secretion system-associated genes, including hlyB, tolC, macA and macB (Table S1), by BLAST analysis on the RAST server against each genome by utilizing virulence factor databases such as VFDB, UniProt and the literature. T1SSs were also reported to be activated by C-terminal secretion signals by initial interactions with ABC exporter proteins, resulting in secretion complex assembly by triggering specific protein interactions among ABC exporter, OMP, and MFP type 1 secretion system proteins. ABC exporter and T1SS proteins provide substrate specificity for system activation, leading to the secretion of HlyA, proteases and C-terminal secretion signals upon activation 16 .
The T2SS, which is encoded by 12 core genes, including T2SD, T2SE, and T2SG, is a major pseudophilin along with some minor pseudophilin genes, such as T2S H, J, I, and K, which play a significant role in ATPase attachment to the cell inner membrane. Additionally, T2S apparatus comparing F, M, and L, the inner (trans) membrane-associated proteins, and T2S O, the pre-pseudopilin peptidase/methyltransferase that also acts on type 4 prepilin, were identified. The T2S C protein of the T2SS is responsible for specific substrate identification and interaction for secretion 20 . Apart from these 12 core genes, other genes, such as T2S A, T2S S, T2S B and T2S N, were also found in some Ecc strains (Table S1). Based on the genome annotation carried out in this study, it was noted that the T2SS and its components were absent in E. kobei, E. hormaechei, and E. ludwigii, while E. nimipressuralis and E. asburiae, along with the reference strain, encode entire T2SS components with the exception of the gspE genes. The absence of T2SS genes indicates that these strains may not be pathogenic or may use other bacterial systems to invade the host 20 . In addition, these strains were isolated from clinical specimens, but there is a lack of experimental data to link the absence of T2SS with the pathogenicity of these strains.
Many gram-negative bacterial pathogens, plants and animal symbionts possess type III secretion systems comprising 20 different cytoplasmic, integral and outer membrane soluble proteins 21,22 . Ecc genomes lack any such genes, yet Ecc species possess all the genetic material needed for the cellular pathway controlling flagellum assembly. The Ecc strains reported in our study depend on flagellum-like proteins instead of all components of the T3SS.
The type IV secretion system (T4SS) is evolutionarily related to the conjugation system of bacteria 23 . T4SSs are categorized into type 4 A (T4ASS) or type 4B (T4BSS), depending on the structural components resembling either the Agrobacterium tumefaciens VirB/D4 complex or the IncI plasmid conjugation system, respectively 23 . T4SS A and B both directly transfer effector proteins to the host cytosol through a central pore. In this study, two sets of T4ASSs were found in the Ecc genomes (Table S1), including all of the genes of the virB/D4 complex but not virB7 24 . The low GC content and the presence of T4SS may be linked with horizontal gene transfer (HGT), which helps bacteria adapt to environmental changes and acquire antibiotic resistance. The Ecc strains studied here may be able to arbitrate HGT, which might facilitate the adaptation of Ecc strains to various environmental changes leading to antibiotic resistance.
The type VI secretion system (T6SS) transfers effector proteins into host cells (either eukaryotes or prokaryotes) in a contact-dependent manner. The major targets of T6SS antibacterial effectors include the cell wall, membrane or nucleic acid 25 . The T6SS comprises at least 14 subunits forming the core machinery apparatus, and these subunits are encoded by genes in the imp operon. Overall, all imps, as well as hcp components, were present in the Ecc 5 type strains studied here. icmf, impaK, impM and lipoprotein/VasD were exceptionally absent in E. kobei and in E. ludwigii (Table S1). It is also interesting to note that E. asburiae, E. nimipressuralis and E. hormaechei encode two loci of T6SS clusters. The genetic organization of each T6SS of the Ecc strains is shown in Fig. 2.
The functions of T6SS genes are diverse, including virulence, microbe-microbe interactions, and host-pathogen interactions. Moreover, the ATPase activity of the intracellular multiplication protein F family (2020) 10:8150 | https://doi.org/10.1038/s41598-020-65001-4 www.nature.com/scientificreports www.nature.com/scientificreports/ protein (icmF) energizes the T6SS system. It seems that the absence of this protein may have effects on the entire T6SS system in E. kobei and in E. ludwigii, or these species may adopt other pathways. It will be interesting to explore the function of icmF in these species by generating icmF gene mutants. Moreover, E. asburiae, E. cancerogenus and E. hormaechei were exceptions because these strains contained two T6SS loci in the genomes and lacked some core and conserved components. This feature of T6SSs in Ecc has been reported previously in the genomes of Yersinia pseudotuberculosis and Burkholderia pseudomallei, which harbour six and four T6SS loci, respectively [25][26][27] . Similar to B. pseudomallei and Y. pseudotuberculosis, E. asburiae, E. cancerogenus, and E. hormaechei encode two loci that may be differentially regulated and possess different functions.
The regulation of the T6SS is controlled at multiple levels. The T6SS gene cluster subset encodes orthologues of the forkhead-associated (FHA), cognate phosphatase pppA, and serine/threonine kinase ppkA domain-containing proteins. This indicates an association of a threonine phosphorylation (TPP) regulatory pathway in bacteria 25 . These three major components of regulatory pathways are encoded within the imp operon. In all five Ecc genomes, at least one regulatory pathway was identified, including protein serine/threonine phosphatase prpC (Table S1).
Apart from the 14 core genes, many additional genes, such as 9 genes of the hcp operon, e.g., clpB and vgrG, are also required for system assembly and function 26 . Valine-glycine repeat protein G (vgrG) and hemolysin-coregulated protein (hcp) are known as T6SS hallmarks. Hcp is also considered to be an important chaperone for T6SS effectors, preventing their degradation, and hcp is secreted together with these effectors. In addition to its structural function in the T6SS, hcp is thought to be a secretory protein with multiple functions involved in bacterial invasion, cell adherence, colonization, and competition, as well as intracellular reproduction, in different bacteria 27 . In our study, vgrG/hcp genes were found as an orphan component. The presence of hcp and vgrG proteins along with the previously identified T6SS locus suggests that Ecc encodes a functional T6SS.
Notably, E. nimipressuralis CIP 104980 (clinical), E. hormaechei ATCC 49162 (sputum of a male patient), E. ludwigii EN-119 (clinical) and E. kobei JCM 8580 (clinical) sequenced in this study encode well-defined T6SSs containing vgrG and hcp proteins as orphan components. Several studies have reported the niche, antibiotic resistance and virulence of E. kobei and E. ludwigii, which are most commonly isolated from human clinical specimens [28][29][30] . Although these strains are most commonly associated with clinical specimens, variations in loci containing regulatory proteins such as prpC or prpK, hcp and vgrG have been observed, showing that these strains encode functional T6SSs. evolutionary analysis of ecc strains. The full-length rpoB sequences of five Ecc strains, namely, E. kobei, E. hormaechei, E. ludwigii, E. nimipressuralis and E. asburiae, and eight other bacterial species from various sources, particularly well-known pathogenic strains, were evolutionarily characterized. The sequence comparison revealed that five Ecc strains, i.e., E. kobei, E. hormaechei, E. ludwigii E. nimipressuralis and B. cenocepacia, formed a single clade, and B. cenocepacia, an opportunistic human pathogen, was also in the same clade. Phylogenetic affiliation of the Ecc strains with B. cenocepacia and other species, such as E. coli, Pseudomonas aeruginosa, and Mycobacterium tuberculosis, showed that the Ecc strains might encode similar pathogenic features (Fig. 3). www.nature.com/scientificreports www.nature.com/scientificreports/ Quantitative expression and analysis of T6SS genes. In this study, the structural genes (impA-impE, impG-impK, impM and prpC) of the 5 Ecc strains mentioned in Table 2 showed notable expression levels of activation in ASM culture conditions that mimicked in vivo conditions compared to LB medium, which was used as a control. This notable expression was observed in all Ecc-type strains with little variation. The imp operon is a conserved part of almost all T6SSs and actively participates in mediating hcp secretion 9,30 . The high expression of the imp operon and prpC under in vivo mimicking conditions revealed that the imp operon seems to play a very active role in virulence and host-microbe interactions and is fully functional. It has been determined that icmF and clpB are powerful T6SS hallmark proteins and that hcp exhibits ATPase activity 10 . Significantly, the high expression of icmF in all Ecc type strains showed that the secretion of the T6SS is powered by icmF. Furthermore, the absence of icmF was also verified by our real-time qPCR analysis, and its absence highlights that clpB may play a key role in the secretion of the T6SS imp operon in the absence of icmF in E. kobei and E. ludwigii.   Table 2. Comparative gene expression profile of type six secretion systems in Enterobacter cloacae complex in ASM and LB medium optimized at log phase. The changes in gene expression or variations were two-fold higher than those in the control (in vitro or LB medium conditions). (2020) 10:8150 | https://doi.org/10.1038/s41598-020-65001-4 www.nature.com/scientificreports www.nature.com/scientificreports/ conclusions Ecc genome-wide sequence analysis reveals key insights into the potential spread and evolution of antimicrobial resistance and virulence, including the T6SS, efflux system and environmental stress-related genes. The literature shows that these genes might play a significant role in either increasing the general fitness (iron uptake system and protease) or the competitiveness of Ecc bacterial species, which help them to better survive in diverse environments. The high gene expression of the T6SS in ASM medium revealed that the T6SS might be a crucial determinant and indispensable for the survival of Ecc strains in diverse environments. Moreover, the differences in gene expression profiles also indicate that ASM could be considered an alternative model system for Ecc instead of rats or humans. The Ecc genome sequence data have provided us with a real-time patient management resource for genome-scale antimicrobial resistance gene prediction. Furthermore, RNA sequencing and genetic engineering will be imperative for the in-depth investigation of the pathology and niche adaption of these strains.

Materials and Methods
culturing and genomic DnA extraction. The information about bacterial species/strains sequenced in this study is given in Table 1; the genome of E. asburiae was reported previously, while genome-wide analysis of the rest of the strains focusing on sigma factors was reported in our previous study 8,31 . Strains were cultured in LB medium and allowed to grow until mid-logarithmic phase at 30 °C in a shaking incubator. DNA was extracted using a genomic DNA isolation kit (Promega, Madison, WI). The concentration and quality of the DNA were determined by a Nanodrop 2000 (Thermo Fisher Scientific Germany).
Genome sequencing and assembly. PacBio sequencing technology (Pacific Biosciences, USA) was used to sequence the whole genome of four Ecc strains. The PacBio RS II sequencer was equipped with Sequence Runs for four single-molecule Real-Time (SMRT) cells and a movie time of 120 minutes/SMRT cell. Adapter trimming and read filtering were conducted with SMART Analysis (V2.2) using default parameters. The 600 Mb postfiltered data (approximately 80X coverage with an average read length of 7 kb) were taken for assembly. The Hierarchical Genome Assembly Process (HGAP) assembly protocol was used to perform the de novo genome assembly equipped with the SMRT Analysis packages (V 2.2).
Genome annotation and analysis. All four genomes were annotated using the RAST annotation system using SEED viewer for the prediction of rRNAs, tRNA, coding genes and GC content. Kyoto Encyclopedia of Genes and Genomes (KEGG) and Clusters of Orthologous Groups of proteins (COGs) were used for the classification of the predicted genes, i.e., virulent genes, various secretion systems, drug resistance genes and different pathways. The SignalP server and TMHMM server were used to predict the signal peptides and transmembrane helices in genes 32-34 . comparative genomic analysis. The genome alignment of Ecc strains was conducted using the MAUVE software package 34 , which is employed to generate alignments of multiple genomes to look at the highly similar subsequences, evolutionary events such as inversions, and rearrangements and to reveal the correct global alignment. The genomes were further compared using the CGView tool 35,36 . A circular genomic map of the Ecc strains was generated using CGView, which presents circular genomes in a graphical map, resulting in base composition plots, sequence features and analyses of the GC skew, GC content and number of RNAs.
Comparison of the genomes of E. nimipressuralis CIP 104980, E. hormaechei ATCC 49162, E. asburiae JCM 6051, E. ludwigii EN-119 and E. kobei JCM 8580 was carried out by using Circos 36. Muscle was used to generate the sequence alignment. Conserved regions were trimmed by using trimAl 37 .
Genome-wide prediction and analysis of virulent genes. The existence of putative virulence-associated genes in the draft genomes of the Ecc strains isolated from various niches was analysed using the reference strain E. cloacae subsp. cloacae ATCC 13047 with RAST and SEED viewer. The presence or absence of virulent and drug-resistant coding sequences such as T1 to T6SS and efflux drug-resistant genes in the five Ecc genomes were retrieved using E. cloacae subsp. cloacae ATCC 13047 genes as bait sequences following independent confirmation by performing nucleotide BLAST analysis at NCBI as well as BioEdit. The genes with query coverage higher than 70% and similarities higher than 50% were taken as homologs. Using the RAST annotation output, the presence of putative known pathogenicity and drug resistance-associated genes were further investigated.
Furthermore, we conducted an extensive computational analysis including BLAST and BLASTP for the prediction of drug resistance genes in the five Ecc genomes. Moreover, the antibiotic resistance gene database (ARDB) 38,39 was also used to verify the drug resistance gene dataset to predict virulence and drug resistance genes in the five type strain genome sequences in our study.
To explore the bacterial secretion system, we followed the methods of Li et al., T346Hunter and SecReT6 40,41 , where the various secretion systems and the nominal adequate number of components of the secretion system have been explained in detail. Briefly, among other things, the method for identifying the secretory genes following relevant information for secretion system analysis was specified 40,41 . For example, (1) the secretory system is encoded in a single locus or in a multiple locus, (2) the core components (essential and ubiquitous) must be encoded, and (3) the components are defined as accessory if they are accessory or poorly conserved sequences, and the function of these sequences may be essential. Considering this information and using reference genes of E. cloacae subsp. cloacae ATCC 13047, BLASTP, BLASTN, TBLASX, HHpred, Phyre2, and SecReT6 as well as T346Hunter-based systematic analysis of the genetic architecture and gene contents of the secretion system was conducted 42 . evolutionary analysis of ecc based on the rpoB gene. The genetic diversity and evolutionary analysis of Ecc, i.e., E. nimipressuralis CIP 104980, E. hormaechei ATCC 49162, E. asburiae JCM 6051, E. ludwigii EN-119 and E. kobei JCM 8580, were analysed by reconstructing the phylogenetic tree based on the multiple sequence alignment data of rpoB, which encodes the β subunit of the bacterial RNA polymerase gene, which is used to analyse microbial diversity. MUSCLE, a comprehensive and powerful tool for next-generation sequence and molecular biology analysis, was used for nucleotide sequence alignment of rpoB genes. Evolutionary relationships among five Ecc strains were constructed using the MEGA 7 and CLUSTAL OMEGA tools in the form of an evolutionary tree. MEGA 7 was set with default parameters to build a tree with additional settings, including UPGMA, the neighbour-joining method, and minimum evolution method, with default preferences, including the statistical method, Poisson model and phylogeny reconstruction.
preparation of ASM and quantitative real-time pcR. T6SS quantitative gene expression analysis was conducted in control medium (LB) and artificial sputum medium (ASM). ASM is a culture medium containing the components of cystic fibrosis (CF) patient sputum, including amino acids, mucin and free DNA. Dozens of studies revealed that ASM could be used to mimic the host, particularly respiratory conditions, and as an alternative model to achieve reproducible culture conditions to study drug resistance, virulence and niche adaptation 43 . ASM was used to assess not only its efficacy as a medium but also to examine the gene expression profile of the T6SS by real-time PCR analysis. The lists of primers designed by using Primer 3 are listed in Table S4. ASM was prepared as described by Lawal et al. 43 . The inoculated Ecc culture was harvested at stationary phase. RNA was extracted using the RNAeasy Mini Kit (Qiagen, Germany); the cDNA library was prepared using a kit (Thermo Fischer Scientific, USA), and quantitative PCR analyses were conducted using PikoReal Real time PCR (Thermo Fischer Scientific, USA). The fold change in gene expression was determined by the average threshold cycle (Ct). The 2 −ΔΔCt method was used for relative quantification of the T6SS genes in the Ecc strains.