Comparative Genomics of 86 Whole-Genome Sequences in the Six Species of the Elizabethkingia Genus Reveals Intraspecific and Interspecific Divergence

Bacteria of the genus Elizabethkingia are emerging infectious agents that can cause infection in humans. The number of published whole-genome sequences of Elizabethkingia is rapidly increasing. In this study, we used comparative genomics to investigate the genomes of the six species in the Elizabethkingia genus, namely E. meningoseptica, E. anophelis, E. miricola, E. bruuniana, E. ursingii, and E. occulta. In silico DNA–DNA hybridization, whole-genome sequence-based phylogeny, pan genome analysis, and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses were performed, and clusters of orthologous groups were evaluated. Of the 86 whole-genome sequences available in GenBank, 21 were complete genome sequences and 65 were shotgun sequences. In silico DNA–DNA hybridization clearly delineated the six Elizabethkingia species. Phylogenetic analysis confirmed that E. bruuniana, E. ursingii, and E. occulta were closer to E. miricola than to E. meningoseptica and E. anophelis. A total of 2,609 clusters of orthologous groups were identified among the six type strains of the Elizabethkingia genus. Metabolism-related clusters of orthologous groups accounted for the majority of gene families in KEGG analysis. New genes were identified that substantially increased the total repertoire of the pan genome after the addition of 86 Elizabethkingia genomes, which suggests that Elizabethkingia has shown adaptive evolution to environmental change. This study presents a comparative genomic analysis of Elizabethkingia, and the results of this study provide knowledge that facilitates a better understanding of this microorganism.

Hong Kong, the USA, and Taiwan [2][3][4][5][6][7][8][9] . The most severe outbreak of E. anophelis reported on to date occurred in Wisconsin and Illinois between November 1, 2015 and April 12, 2017 5,7,9 . With advances in molecular biology and biotechnology, whole-genome sequencing has become a popular technique in microbiology research. Whole-genome sequences of microbes can provide comprehensive information on virulence factors, pathogenesis, drug resistance, metabolism, host-pathogen interaction, host-environment reaction, and others. We previously published whole-genome sequences for two Elizabethkingia species: E. anophelis (strain EM361-97; GenBank accession number, LWDS00000000) 15 and E. miricola (strain EM798-26; GenBank accession number, CP023746) 16 . E. miricola strain EM798-26 has been re-classified as E. bruuniana according to the whole-genome analysis 17,18 . However, few studies have investigated the comparative genomics of the six species in the Elizabethkingia genus. In this study, we used several methods to comprehensively analyze and compare the genomic features of all available whole-genome sequences of Elizabethkingia strains in GenBank.

Ethics and experimental biosafety statements. This study was approved by the Institutional Review
Board of E-Da Hospital (EMRP-106-105). The need for patient's informed consent was waived by the Institutional Review Board of E-Da Hospital as the retrospective analysis of clinical data posed no more than minimal risk of harm to subjects and involved no procedures for which written consent was normally required outside of the research context. The experiments in this study were approved by the Institutional Biosafety Committee of E-Da Hospital. All experiments were performed in accordance with relevant guidelines and regulations.
isolates for genome sequencing. Two clinical strains, namely E. anophelis EM361-97 and E. bruuniana EM798-26, were isolated from the blood of two cancer patients in Taiwan. The detailed information has been described previously 15,16 . In brief, the genomic DNA of these two strains was sequenced using an Illumina HiSeq. 2000 sequencing platform (Illumina, San Diego, CA, USA), and the DNA short reads were assembled using SOAP 19 . The sequences of the E. bruuniana strain EM798-26 were further analyzed using PacBio (Pacific Biosciences of California, Menlo Park, CA, USA) and optical mapping (Bionano Genomics, San Diego, CA, USA). Finally, 26 scaffolds and 27 contigs were generated for the E. anophelis strain EM361-97 15 , and the complete whole-genome sequence of the E. bruuniana strain EM798-26 was constructed 16,17 . The gene annotations of these two strains were performed using the Prokaryotic Genome Annotation Pipeline of the National Center for Biotechnology Information (NCBI) 20 . In silico DNA-DNA hybridization. For genome-based species delineation, in silico DNA-DNA hybridization (DDH) was performed using the Genome-to-Genome Distance Calculator (GGDC) (http://ggdc.dsmz. de/home.php) 21 . The results of formula 2 were adopted according to the suggestion of the authors in a previous study 21 . A cutoff value of 70% was used as the delimitation criteria of microorganism species 21 . The heat map was generated using CIMminer (https://discover.nci.nih.gov/cimminer/). Construction of whole-genome sequence-based phylogenetic tree. The whole-genome sequence-based phylogenetic tree was constructed using the Reference sequence Alignment based Phylogeny builder (REALPHY) (https://realphy.unibas.ch/fcgi/realphy) 22 . The whole-genome sequences of the 86 Elizabethkingia strains were submitted to the online pipeline of REALPHY in the FASTA format. The sequence alignments of phylogeny were performed using PhyML. The phylogenetic tree was edited using Dendroscope 23 .

Analysis of orthologous genes.
To evaluate the evolution of ancestor genes in the different species, the online program OrthoVenn (http://www.bioinfogenome.net/OrthoVenn/) was used to evaluate the clusters of orthologous groups (COGs) in the six type strains of the Elizabethkingia genus 24 . The whole-genome sequences of the six type strains, namely E. meningoseptica KC1913 T (=ATCC 13253 T ), E. anophelis R26 T , E. miricola GTC 862 T (=KCTC 12492 T = W3-B1), E. bruuniana G0146 T , E. ursingii G4122 T , and E. occulta G4070 T , were included for comparison of the COGs. The virulence factors were analysed using the Virulence Factor Database (VFDB) 25 . The antimicrobial resistance-associated genes and transposable element (transposon) genes were identified using the Rapid Annotations based on Subsystem Technology (RAST) Prokaryotic Genome Annotation Server (http:// rast.nmpdr.org/) 26 .

Results and Discussion
Species delineation through in silico DDH. Figure 1 presents the results of in silico DDH for the 86 Elizabethkingia strains. The in silico DDH values between E. meningoseptica and the other five species were obviously lower than the values among the other species. The three novel species, namely E. bruuniana, E. ursingii, and E. occulta, were close to E. miricola in the dendrogram. The heat map built from the similarity matrix clearly displayed the delineation of the six species in the Elizabethkingia genus.
With the development of next-generation sequencing, numerous whole-genome sequences are available in the repositories of GenBank, and whole-genome sequencing can be used as a new method for species discrimination. For genome-based species delineation, in silico DDH is considered as an accurate substitution for traditional DDH 21,28-30 . Our study used GGDH, an in silico method for genome-to-genome comparison, to discriminate Elizabethkingia species. Similar to the results of the previous study performed by Nicholson et al. 14 , our study clearly supports that in silico DDH can noticeably distinguish Elizabethkingia species.
Phylogenetic evolution based on whole genomes. Figure 2 illustrates the phylogenetic tree based on the whole-genome sequences of the 86 Elizabethkingia strains, which was constructed using REALPHY. The six species of the Elizabethkingia genus were evidently separate from each other. Similar to the dendrogram generated based on in silico DDH, E. bruuniana, E. ursingii, and E. occulta were located close to E. miricola and were away from E. anophelis and E. meningoseptica. E. anophelis were divided into several sublineages. www.nature.com/scientificreports www.nature.com/scientificreports/ Phylogeny constructed using whole-genome sequences is known to be more congruent than that built using a single gene or a small fraction of the genome, such as 16S rRNA or rpoB genes 31 . Through traditional DDH, Elizabethkingia species were previously classified as genomospecies. Based on the phylogenetic study of whole-genome sequences, E. miricola was found to be the most similar to genomospecies 2, E. anophelis was the closest to genomospecies 1, genomospecies 3 was near E. bruuniana, and genomospecies 4 was proposed as E. ursingii 14 . Moreover, the phylogenetic tree of E. anophelis strains showed several clearly demarcated sublineages in our study. The analysis of genomic features in the Wisconsin outbreak of E. anophelis also revealed different phylogenetic subclusters with distinctive temporal and geographic dynamics 5 . In the present study, we constructed the phylogenetic tree of the 86 Elizabethkingia strains using whole-genome sequences; the phylogenetic tree clearly demonstrated the phylogenetic relationship among these strains. The phylogeny generated using whole genomes provides not only species delineation but also comprehensive insights into comparative analyses of phylogenetic evolution across different species. functional coGs. COGs, also known as orthologs, are a group of genes in different species that have descended from a common gene in the same ancestor 32 . These genes usually retain the original function during the evolution of microorganisms, and they determine the relationships between the genome structure, gene function, and taxonomic classification 32 . Subsequently, it is crucial to recognize COGs and predict their functions, particularly in emerging pathogens with newly sequenced genomes 32 . Figure 3 presents the COGs of the six Elizabethkingia type species. The total number of genes in the six species ranged from 3,066 to 3,629. The E. bruuniana strain G0146 possessed the highest number of genes, and the E. meningoseptica strain KC1913 had the lowest number of genes. A total of 2,609 shared COGs were identified among the six Elizabethkingia type species (Fig. 3; Supplementary File 2). The E. miricola strain GTC 862 had the lowest number of unique gene families (n = 2), and the E. anophelis strain R26 had the highest number of unique gene families (n = 25). The difference in the number of unique gene families may reflect the phenotypical traits that are specific to the group of bacteria 33 .
In the present study, the genome annotation of the six Elizabethkingia type species using the RAST Server revealed abundant putative genes associated with transposable elements. These transposons included hypothetical conjugative transposons BF0131, putative conjugative transposon mobilization protein BF0132, putative mobilization protein BF0133, and conjugative transposon protein TraA, TraB, TraE, TraF, TraG, TraJ, TraK, TraM, TraN, TraO, and TraQ. A recent study investigated putative integrative and conjugative elements (ICEs) in 13 Patients infected with Elizabethkingia have shown a mortality rate of 34% to 60%. The immune status of patients and virulence factors of microorganisms may be associated with the high case-fatality rate of Elizabethkingia infections. Previous studies have shown that patients with Elizabethkingia infections frequently had chronic illnesses. For example, 85% of patients with E. anophelis infections had comorbidities, such as malignancy (45%), cardiovascular diseases (37%), and diabetes mellitus (25%) 6 . In the present study, many homologs of offensive, defensive, nonspecific, and virulence-associated regulatory factors were identified in the COGs of Elizabethkingia isolates (Supplementary File 2). The potential virulence factor homologs and their associated genes of the six Elizabethkingia species predicted using the VFDB are shown in Table 1. These virulence factors included heat shock protein, phospholipase, capsular polysaccharide, catalase, peroxidase, and others. The distribution of virulence factors was similar among E. meningoseptica KC1913 T , E. anophelis R26 T , and E. bruuniana G0146 T . Some virulence factors, such as polar flagella (flmH), exopolysaccharide (galE, pgi, acpXL, hemL, ureB, ureG), Mg 2+ transport (mgtB), and type III secretion system effectors (hopJ1) were identified only in E. miricola GTC 862 T , E. ursingii G4122 T , and E. occulta G4070 T . It is interesting that there are no flagella in Elizabethkingia species. However, the putative flmH was identified in E. miricola, E. ursingii, and E. occulta. Further studies are necessary to investigate the function and source of this putative gene.

Classification of
Functional analysis of the COGs in the 86 Elizabethkingia genomes revealed that the majority of core genomes were associated with metabolism, and the unique gene families were mostly related to "information storage and processing" (Fig. 4A). In the COG analysis, "information storage and processing" includes RNA processing and modification, chromosome dynamics, translation, transcription, replication, recombination, and repair 35 . The function of COGs with "information storage and processing" might be associated with intracellular survival 36 . www.nature.com/scientificreports www.nature.com/scientificreports/ species. No antimicrobial resistance-associated genes were identified on ICEs in the genome annotation using the RAST Server.
Previous studies have shown that Elizabethkingia isolates were frequently resistant to many antimicrobial agents, including most β-lactams, β-lactams/β-lactamase inhibitors, aminoglycosides, macrolides, tetracycline, vancomycin, and carbapenems. But they showed variable susceptibility to piperacillin, piperacillin-tazobactam, fluoroquinolones, minocycline, tigecycline, and trimethoprim-sulfamethoxazole [2][3][4][5][6]37 . Genes associated with drug resistance in the Elizabethkingia genus have been reported. For example, Opota et al. reported that bla  and bla B-9 carbapenemase-encoding genes were identified in a carbapenemase-producing clinical isolate, Elizabethkingia miricola EM_CHUV 38 . Previous reports indicated that E. meningoseptica and E. anophelis were resistant to several classes of antimicrobials 2-6 . A recent study revealed that E. bruuniana has a similar antimicrobial susceptibility pattern to other Elizabethkingia species 18 . The present study also showed that the antimicrobial resistance-associated genes of the three new Elizabethkingia species are similar to those of the other three species.
It is noteworthy that vancomycin resistance gene (vanW) was found in these six Elizabethkingia type species. vanW is included in the vanB gene cluster. The exact function of vanW is still unknown. However, mutations in vanW has been identified in microorganisms with VanB-type glycopeptide resistance 39 . Vancomycin has been anecdotally reported to successfully treat patients with E. meningoseptica meningitis 40 . However, several recent studies revealed that most Elizabethkingia species exhibited a high minimum inhibitory concentration of vancomycin 41,42 . Therefore, the use of vancomycin is not suggested for patients with Elizabethkingia infections 41,42 . Core and pan genome analysis. In 2005, Tettelin et al. proposed pan genome as the whole-genomic repertoire of a microorganism 43 . Pan genome analysis can be used to discriminate the diversity of genomes and explore the core, accessory, and unique genes 44 . To understand the pan genome of Elizabethkingia, the whole-genome sequences of the 86 strains were examined. The distribution of gene families and the number of new genes are shown in Fig. 6A,B, respectively. The core, accessory, and unique genes are presented as a flower plot in Fig. 6C. In the 86 Elizabethkingia strains, 1,154 core (conserved) genes were recognized. In each strain, the number of accessory genes ranged from 996 to 2,738, and the number of unique genes ranged from 0 to 215. With the addition of new genome sequences, the genes of the pan genome increased from 2,110 to 9,794, and core genes decreased from 3,002 to 824 (Fig. 6D).

conclusions
This study presents a comparative genomic analysis of 86 Elizabethkingia strains with whole-genome sequences available in GenBank. Because Elizabethkingia infections have emerged as a critical public health issue worldwide, knowledge on the clinical, molecular, and genetic characteristics is of paramount importance. The results of this study provide information to understand the population genomics, phylogenetic distinctness, evolutionary features, and genetic functions of this emerging and life-threatening pathogen.  Table 2. The antimicrobial resistance-associated proteins of the six Elizabethkingia type species identified using the Rapid Annotations based on Subsystem Technology (RAST) Prokaryotic Genome Annotation Server. *BLI: metal-dependent hydrolases of the beta-lactamase superfamily I.