Introduction

Campylobacter concisus is a Gram-negative motile bacterium that grows under both anaerobic and microaerobic conditions with the presence of hydrogen significantly aiding growth1. The human oral cavity is the natural colonization site of C. concisus, although C. concisus may also colonize the intestinal tract in some individuals2,3.

C. concisus has gained increasing attention in recent years due to its association with enteric diseases, in particular inflammatory bowel disease (IBD), which includes Crohn’s disease (CD) and ulcerative colitis (UC). A number of studies reported significantly higher detection of C. concisus by PCR in intestinal biopsies collected from patients with IBD than in those from controls4,5,6,7. In addition to IBD, C. concisus was frequently isolated from diarrheal stool samples, suggesting its possible role in human diarrheal disease8,9,10,11.

Previous studies using cell line models found that some oral C. concisus strains or their toxins were able to damage the intestinal epithelial barrier and induce intestinal epithelial production of proinflammatory cytokines11,12,13. These data suggest that translocation of enteric virulent C. concisus strains from the human oral cavity to the intestinal tract may cause enteric diseases in some individuals.

Earlier studies found that some C. concisus strains had DNA-DNA hybridization values of only 42–50% with the reference C. concisus strain; however, no phenotypic tests were able to differentiate them14. These strains were referred to as different genomospecies15. C. concisus has two genomospecies, which were defined by the analysis of amplified fragment length polymorphisms (AFLP), housekeeping genes and a PCR method targeting the polymorphisms of the C. concisus 23S rRNA gene15,16,17,18,19,20,21,22. The two C. concisus genomospecies contained both oral and enteric C. concisus strains15,16,17,18,19,20,21. Strains from the two C. concisus genomospecies appear to have different enteric pathogenic potentials. Oral C. concisus strains that were invasive to intestinal epithelial cells were found in Genomospecies 2 (GS2)10,11. GS2 C. concisus strains were more often isolated from faecal samples collected from patients with bloody diarrhoea and were more invasive to intestinal epithelial cells than Genomospecies 1 (GS1) strains15,16.

Currently, no studies have compared the genomes of C. concisus strains from different genomospecies. Identification of C. concisus genomospecies-specific genes and other genomic features will provide insights into the evolution and pathogenic potential of this bacterium. We therefore performed comparative genome analysis of 36 C. concisus strains, including 27 strains that were sequenced in this study and nine publicly available C. concisus genomes. This analysis revealed new genomic features of C. concisus genomospecies and identified novel genomic islands that contain proteins homologous to type IV secretion system (T4SS) proteins and potential virulence effector proteins.

Results

The draft genomes of 27 C. concisus strains

The genomes of 27 C. concisus strains were sequenced in this study. These 27 C. concisus strains were previously isolated in our laboratory from patients with CD, UC and healthy controls and they were randomly selected for inclusion into this study. Ten of these strains were analysed in our previous studies of grouping C. concisus strains using housekeeping genes3,6,10,23.

The draft genome sizes of these C. concisus strains were 1.80 to 2.21 Mb. The contig numbers ranged from 7 to 76. The fold coverage ranged from 83.98 to 230.58. The summaries of the C. concisus genomes sequenced in this study are in Table 1.

Table 1 Summary of the 27 C. concisus genomes sequenced in this study.

The core-genome and accessory genes

The C. concisus core-genome was derived from 36 C. concisus strains, including the 27 C. concisus strains sequenced in this study and nine C. concisus genomes that are publicly available3,6,10,23,24.

The C. concisus core-genome of the 36 C. concisus strains consisted of 582 genes, which were 28.7% (582/2025) of the total number of genes present in C. concisus strain 13826. The core-genomes of GS1 and GS2 strains had 1,098 and 1,143 genes respectively. The genes in both GS1 and GS2 C. concisus core-genomes were evenly distributed amongst different Clusters of Orthologous Groups (Supplementary Fig. S1). The accessory genes in the 36 C. concisus strains ranged from 1,163 to 1,521.

The two C. concisus genomospecies identified from analysis of C. concisus core-genome, housekeeping genes and 23S rRNA gene

The phylogenetic tree generated based on the core-genome sequences divided the 36 C. concisus strains into two genomospecies. Most of the strains belonged to GS2 (77.8%, 28/36) while only eight strains belonged to GS1 (22.2%, 8/36). GS1 and GS2 contained both oral and enteric strains. Some individuals carried C. concisus strains from both GS1 and GS2. For example, multiple strains from two individuals (P20CDO-S1, P20CDO-S2, P20CDO-S3, and P20CDO-S4 from a patient with CD as well as H21O-S1, H21O-S2, H21O-S3 and H21O-S5 strains from a healthy individual) were found in different genomospecies (Fig. 1).

Figure 1

The phylogenetic tree generated based on C. concisus core-genome sequences.

The phylogenetic tree was built from the C. concisus core-genome of 36 Campylobacter concisus strains using Roary45. Oral strains from patients with IBD that were sequenced in this study are coloured red. Oral strains from healthy controls that were sequenced in this study are coloured blue. Oral strain ATCC 33237 is coloured purple; this strain was isolated from a patient with gingivitis. Enteric strains are coloured green. The genome of enteric strain P3UCB1, a strain isolated from intestinal biopsies of a patient with UC, was sequenced in this study. The remaining genomes of enteric C. concisus strains are publicly available. Enteric strain ATCC 51561 was isolated from faecal samples of a healthy individual. Enteric strains UNSW2, UNSW3 and UNSWCD were isolated from patients with CD24. The remaining enteric strains were isolated from patients with gastroenteritis. Bootstrap values of more than 70 are indicated on the internal branches. GS1 and GS2 indicate Genomospecies 1 and 2 respectively.

Both housekeeping genes and a PCR method targeting the polymorphisms of the 23S rRNA gene were previously used to separate C. concisus strains into different groups15,16,17,18,19,20,21. In this study, we compared the assignment of C. concisus strains by housekeeping genes and by the 23S rRNA gene. The sequences of the housekeeping genes and of the 23S rRNA gene each divided the 36 strains into two clusters, consistent with the GS1 and GS2 grouping assigned based on the C. concisus core-genome (Figs 2 and 3).

Figure 2

The phylogenetic tree generated based on housekeeping genes of the 36 Campylobacter concisus strains.

The sequences of six housekeeping genes (asd, aspA, atpA, glnA, pgi and tkt) were extracted from the 36 C. concisus strains and used to generate the phylogenetic tree by the neighbour-joining method, performed using Molecular Evolutionary Genetics Analysis software version 6.06 (MEGA 6.06) with 1,000 bootstrap replications47. Oral strains from patients with IBD that were sequenced in this study are coloured red. Oral strains from healthy controls that were sequenced in this study are coloured blue. Oral strain ATCC 33237 is coloured purple; this strain was isolated from a patient with gingivitis. Enteric strains are coloured green. The genome of enteric strain P3UCB1, a strain isolated from intestinal biopsies of a patient with UC, was sequenced in this study. The remaining genomes of enteric C. concisus strains are publicly available. Enteric strain ATCC 51561 was isolated from faecal samples of a healthy individual. Enteric strains UNSW2, UNSW3 and UNSWCD were isolated from patients with CD24. The remaining enteric strains were isolated from patients with gastroenteritis. Bootstrap values of more than 70 are indicated on the internal branches. Campylobacter jejuni strain NCTC11168 was used as an outgroup (GenBank accession no. NC_002163). GS1 and GS2 indicate Genomospecies 1 and 2 respectively.

Figure 3

The phylogenetic tree generated based on the sequences of 23S ribosomal RNA genes of the 36 Campylobacter concisus strains.

The phylogenetic tree was generated based on the sequences of the 23S ribosomal RNA genes by the neighbour-joining method, performed using Molecular Evolutionary Genetics Analysis software version 6.06 (MEGA 6.06) with 1,000 bootstrap replications47. Oral strains from patients with IBD that were sequenced in this study are coloured red. Oral strains from healthy controls that were sequenced in this study are coloured blue. Oral strain ATCC 33237 is coloured purple; this strain was isolated from a patient with gingivitis. Enteric strains are coloured green. The genome of enteric strain P3UCB1, a strain isolated from intestinal biopsies of a patient with UC, was sequenced in this study. The remaining genomes of enteric C. concisus strains are publicly available. Enteric strain ATCC 51561 was isolated from faecal samples of a healthy individual. Enteric strains UNSW2, UNSW3 and UNSWCD were isolated from patients with CD24. The remaining enteric strains were isolated from patients with gastroenteritis. Bootstrap values of more than 70 are indicated on the internal branches. Campylobacter jejuni strain NCTC11168 was used as an outgroup (GenBank accession no. NC_002163). GS1 and GS2 indicate Genomospecies 1 and 2 respectively.

A previous study examining eight C. concisus strains found that the 16S rRNA gene was able to differentiate C. concisus strains isolated from patients with gastroenteritis and CD24. However, in this study, we found that the 16S rRNA gene was unable to differentiate C. concisus genomospecies or their related diseases (Fig. 4).

Figure 4

The phylogenetic tree generated based on the sequences of 16S ribosomal RNA genes for the 36 Campylobacter concisus strains.

The phylogenetic tree was generated based on the sequences of the 16S ribosomal RNA genes by the neighbour-joining method, performed using Molecular Evolutionary Genetics Analysis software version 6.06 (MEGA 6.06) with 1,000 bootstrap replications47. Oral strains from patients with IBD that were sequenced in this study are coloured red. Oral strains from healthy controls that were sequenced in this study are coloured blue. Oral strain ATCC 33237 is coloured purple; this strain was isolated from a patient with gingivitis. Enteric strains are coloured green. The genome of enteric strain P3UCB1, a strain isolated from intestinal biopsies of a patient with UC, was sequenced in this study. The remaining genomes of enteric C. concisus strains are publicly available. Enteric strain ATCC 51561 was isolated from faecal samples of a healthy individual. Enteric strains UNSW2, UNSW3 and UNSWCD were isolated from patients with CD24. The remaining enteric strains were isolated from patients with gastroenteritis. Bootstrap values of more than 70 are indicated on the internal branches. Campylobacter jejuni strain NCTC11168 was used as an outgroup (GenBank accession no. NC_002163).

Genomospecies-specific genes

Using Burrows-Wheeler Aligner, BLASTn and BLASTx, we found that some genes that were present in all GS1 C. concisus strains were absent in all GS2 strains and vice versa, showing that these were genomospecies-specific genes. The flanking regions of GS1-specific genes were found in the genomes of all GS2 strains on unbroken contigs, and vice versa, further confirming that they were truly genomospecies specific.

Of the nine GS1-specific genes, three encode phosphate transport proteins (PstS, PstA and PstC). The remaining GS1-specific genes encode hypothetical proteins, transporter proteins and enzymes (Table 2). Fourteen GS2-specific genes were found, including genes that encode a protein involved in regulation of osmolarity (aquaporin Z), a protein involved in pH homeostasis and sodium extrusion (Na+/H+ antiporter NhaC), a twitching motility protein and others (Table 2).

Table 2 Genomospecies-specific genes.

CRISPR-associated proteins

Twenty-two C. concisus strains, all belonging to GS2, were found to have genes encoding CRISPR-associated proteins. Cas1, Cas2, Cas3 and Cas4a proteins were found in all 22 strains. Cas5h, Csh1 and Csd2/Csh2 proteins were found in most of these 22 strains, and Cas6 protein was found in five strains. The remaining seven CRISPR-associated proteins were each found in only one or two C. concisus strains (Table 3).

Table 3 CRISPR-associated proteins in Campylobacter concisus strains.

Two different genomic islands containing T4SS homologues and putative effector proteins were found in enteric and oral C. concisus strains respectively

P3UCO1 and P3UCB1 strains were isolated from saliva and intestinal biopsies of a patient with UC. These two strains were genetically closely related (Fig. 1). Interestingly, we found a region in the genome of the enteric strain P3UCB1 that was absent in the genome of the oral strain P3UCO1 (Fig. 5A). The size of this region is 31,286 bp, beginning with an integrase. This region contained five proteins homologous to T4SS proteins from the tumour inducing (Ti) plasmid in the plant pathogen Agrobacterium tumefaciens, namely VirB4, VirB8, VirB9, VirB10 and VirB11. Their similarities to the A. tumefaciens VirB proteins were 41%, 42%, 29%, 39% and 50% respectively. Furthermore, this region had proteins homologous to the RP4 plasmid conjugative transfer protein TraQ, the plasmid partitioning protein ParA and to various hypothetical proteins. Collectively, these findings showed that this region is a plasmid-derived genomic island, which we have named the C. concisus plasmid integrative island A (CON_PiiA) (Fig. 5A and Table 4). Two additional enteric C. concisus strains, UNSW2 and ATCC 51562, were found to have CON_PiiA based on the annotated proteins. CON_PiiA was identified in 37.5% (3/8) of the enteric C. concisus strains isolated from individuals with enteric disease and, interestingly, in none of the oral strains (0/27), a statistically significant difference (P = 0.0086). The core-genomes of multiple oral strains collected from some individuals were genetically similar (Fig. 1), which may lead to biased statistical results. Therefore, we re-analysed the data by considering multiple oral C. concisus strains from a given individual as one strain if these strains were in the same small group in Fig. 1. P24CDO-S3, P24CDO-S2 and P24CDO-S4 were considered as one strain, P2CDO3 and P2CDO-S6 were considered as one strain, P20CDO-S1 and P20CDO-S3 were considered as one strain, and H21O-S1 and H21O-S5 were considered as one strain. Thus, the total number of oral strains used for re-analysis was 22 instead of 27. The presence of CON_PiiA in enteric strains isolated from patients with enteric diseases and in oral C. concisus strains was still significantly different (37.5% (3/8) vs 0% (0/22), P = 0.0138).

Table 4 Putative effector proteins and other proteins in CON_PiiA and CON_PiiB genomic islands.
Figure 5

Genomic islands CON_PiiA and CON_PiiB.

(A) Comparison of proteins in C. concisus strains P3UCO1 and P3UCB1 shows the insertion of the CON_PiiA island in P3UCB1 strain. The identical proteins in these two strains are shaded in dark grey. (B) Proteins in CON_PiiA and CON_PiiB islands. T4SS homologous proteins are coloured orange and the putative effector proteins are coloured purple. The two proteins that had more than 40% identity between CON_PiiA and CON_PiiB are connected with light grey lines. The remaining proteins in these two islands had less than 20% amino acid identity.

We found a second genomic island in oral C. concisus strains. A contig in H17O-S1 strain contained the entire island, which was closely examined. Like P3UCB1 strain, H17O-S1 strain had a region containing genes encoding homologues of VirB4 (44% similarity), VirB8 (45%), VirB9 (40%), VirB10 (49%) and VirB11 (49%). Additionally, there were proteins homologous to TraQ and various hypothetical proteins. Furthermore, H17O-S1 strain contained genes encoding homologues of VirB5 (33%), VirB6 (32%) and VirD4 (43%) from the Ti plasmid in A. tumefaciens, which were not seen in CON_PiiA (Table 4). Repetitive sequences (AGTCCTGGTGAACCCACCA), indicative of attachment sites, were found between an integrase and tRNA-Met-CAT at the positions of 675,445–675,463 bp and 714,647–714,667 bp. Except for two proteins, the proteins in this region had less than 20% amino acid identity to proteins in CON_PiiA. We named this region C. concisus plasmid integrative island B (CON_PiiB), which was 38,653 bp in length (Fig. 5B). The nine VirB proteins and some CON_PiiB proteins were also found in the remaining four oral C. concisus strains from two individuals including three strains from one patient with CD (P21CDO-S1, P21CDO-S2, P21CDO-S4), and one strain from a healthy individual (H14O-S1). However, the contigs in the three strains from the patient with CD were not long enough to reveal the entire sequence of CON_PiiB island. CON_PiiB was found in 18.5% (5/27) of the oral C. concisus strains and none of the enteric strains (0/9), a difference that was not statistically significant (P > 0.05). The prevalence of CON_PiiB in oral strains isolated from healthy individuals and patients with IBD was 20% (2/10) and 18.8% (3/16) respectively, a difference that was not statistically significant (P > 0.05).

Potential effector proteins within CON_PiiA and CON_PiiB islands were found. A number of proteins in both islands had similarities to Legionella pneumophila virulence effector proteins, most of which, such as LepB and LepA, are involved in intracellular survival of the pathogen25,26,27,28,29. One protein had similarities to Helicobacter pylori cytotoxin-associated protein A (CagA), which is a virulence factor associated with more severe disease states in H. pylori infection30. The details of the comparison between proteins in CON_PiiA and CON_PiiB islands and effector proteins are shown in Table 4.

Discussion

We performed comparative genome analysis of 36 C. concisus strains, of which 27 strains were sequenced in this study.

Previous studies using different molecular methods such as AFLP, analysis of housekeeping genes and PCR of the 23S rRNA gene showed that C. concisus has two genomospecies15,16,17,18,19,20,21. There was some evidence that C. concisus strains of these two genomospecies may have different pathogenic potential15,16,17,18,19,20,21. For example, strains invasive to intestinal epithelial cells were often found in GS210,11. Despite these findings, there is a lack of understanding regarding these two C. concisus genomospecies at the genome level.

In this study, for the first time we compared the genomes of C. concisus strains from different genomospecies, which revealed new genomic features of this bacterium. We analysed the nine publicly available C. concisus genomes, together with the genomes of an additional 27 C. concisus strains that we sequenced. We generated the C. concisus core-genome from these 36 C. concisus strains. The core-genome, the sequences of six housekeeping genes and the 23S rRNA gene consistently assigned these C. concisus strains into two genomospecies (Figs 1, 2, 3). The enteric strains did not form distinct groups within either genomospecies, further supporting our previous theory that some oral C. concisus strains may cause enteric disease when colonizing the intestinal tract3,31,32. The previous study examining eight C. concisus strains reported that the 16S rRNA gene was able to differentiate C. concisus strains isolated from patients with CD and gastroenteritis; this was not observed in our study, in which 36 C. concisus strains were examined (Fig. 4)24.

We found nine genes that were specific to GS1 C. concisus strains and fourteen genes that were specific to GS2 C. concisus strains, some of which encode proteins that may contribute to the survival and pathogenicity of C. concisus (Table 2). For example, three of the nine GS1-specific genes encode proteins involved in phosphate transport (PstS, PstA, PstC), suggesting that strains of GS1 and GS2 may differ in their phosphate uptake. Aquaporin Z was found in all GS2 C. concisus strains, but not in any GS1 strains. Aquaporin Z is a protein that moves water across bacterial membranes to maintain intracellular osmotic pressure33. The finding that GS2 C. concisus strains have aquaporin Z suggests that they may have enhanced abilities in adapting to environments where osmolarity frequently changes.

The type I CRISPR system, which has the Cas3 protein, was found in 78.6% (22/28) of GS2 C. concisus strains (Table 3). However, the number of CRISPR-associated proteins varied between C. concisus strains. Cas6, an endoribonuclease that generates RNAs for defense in the type I CRISPR system, was present in only five C. concisus strains. The CRISPR system provides acquired immunity to plasmids and phages34,35. The CRISPR proteins found in C. concisus strains do not seem to be related to the CON_phi2 prophage, which contains the zonula occludens toxin (zot) gene31. The C. concisus Zot was found to damage the intestinal epithelial barrier and affect the function of macrophages, and the zot gene was detected in C. concisus strains from both GS1 and GS211,23,36.

Two novel C. concisus genomic islands were identified in this study. CON_PiiA and CON_PiiB islands were found in both GS1 and GS2 C. concisus strains. CON_PiiA was found in 37.5% (3/8) of enteric strains isolated from patients with enteric diseases, including two patients with IBD and one patient with gastroenteritis, but not in the 27 oral C. concisus strains, a difference that was statistically significant. CON_PiiA was not found in ATCC 51561, an enteric strain isolated from faecal samples of a healthy individual. CON_PiiB was found in 18.5% (5/27) of oral C. concisus strains and none of the enteric strains; this difference did not reach statistical significance. Collectively, these data suggest that the CON_PiiA island may preferentially integrate into enteric C. concisus strains isolated from patients with enteric diseases. However, the number of enteric C. concisus strains included in this study was small, and larger numbers of enteric C. concisus strains need to be examined to confirm this finding.

Both CON_PiiA and CON_PiiB islands contained T4SS homologous proteins. The T4SS is used by microorganisms to transport macromolecules such as proteins or DNA across the cell envelope37. T4SS may be involved in plasmid conjugation, uptake or release of DNA, or transfer of effector proteins into host cells38. The well-studied H. pylori cag pathogenicity island encodes proteins homologous to VirB2, VirB4, VirB5, VirB7, VirB9, VirB10, VirB11 and VirD4; these proteins deliver effector proteins such as CagA to host cells through the formation of a pilus39. Putative effector proteins similar to L. pneumophila and H. pylori virulence effector proteins were found in both CON_PiiA and CON_PiiB islands. The virulence effector proteins in L. pneumophila are mainly involved in bacterial survival within macrophages25,26,27,28,29. H. pylori CagA virulence factor is associated with gastric cancer30. Given that the two novel C. concisus genomic islands found in this study contained proteins similar to T4SS proteins and their effector proteins found in human pathogens, CON_PiiA and CON_PiiB islands are likely to be involved in C. concisus virulence. However, the putative effector proteins found in CON_PiiA and CON_PiiB islands had similarities to only a fragment of CagA and of the L. pneumophila effector proteins. Their role in virulence requires confirmation by characterization of individual proteins in these islands.

To our knowledge, this is the first study examining the genomes of C. concisus strains of different genomospecies. We sequenced the genomes of 27 C. concisus strains and performed comparative genome analysis of 36 C. concisus strains. We generated the core-genome from 36 C. concisus strains. The C. concisus core-genome, six housekeeping genes and the 23S rRNA gene consistently divided the 36 strains into two genomospecies. We also identified GS1 and GS2 C. concisus specific genes. Furthermore, we identified two novel genomic islands that contained T4SS homologous proteins and putative virulence effector proteins; CON_PiiA appeared to be associated with enteric C. concisus strains isolated from patients with enteric diseases. The new C. concisus genomic features obtained from this study provide novel insights into the pathogenicity of this emerging opportunistic pathogen.

Methods

C. concisus strains used for genome sequencing

C. concisus strains sequenced in this study were isolated from saliva samples or intestinal biopsies in our previous studies3,6,11,22. The genomes of 27 C. concisus strains were sequenced. C. concisus strains were grown on Horse Blood Agar (HBA) plates as previously described1. DNA was extracted from each C. concisus strain using the Gentra Puregene Yeast/Bacteria Kit according to the manufacturer’s instructions (Qiagen, Hilden, Germany). The quality of DNA was checked using Nanodrop and Qubit Fluorometer. Bacterial genomic DNA (1 ng) was used for genomic library generation in accordance with the Nextera XT protocol (Ver. May 2012). Libraries were sequenced for a 250 bp paired-end sequencing run using Nextera XT V2 on the MiSeq Personal Sequencer running version 1.1.1 MiSeq Control Software (Illumina Inc., San Diego, CA, USA). Reagent contamination was controlled by barcoding all DNA samples and preparation of barcoding index primers for a single use. The quality of reads was assessed based on the Phred quality score of the reads. The reads mapping fold coverage was calculated using qualimap_v2.040. We aimed to get a fold coverage of at least 50X for each genome, which was shown to be adequate for characterization of genomes41.
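As a rough planning aid for the 50X target, expected fold coverage can be estimated from read count, read length and genome size. The sketch below is only that back-of-envelope estimate, not the qualimap calculation used in this study, which works from mapped reads. The function names are hypothetical, and the 2.21 Mb genome size is simply the largest draft genome reported in the Results.

```python
import math

def fold_coverage(n_reads: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Expected mean coverage: total sequenced bases divided by genome size."""
    return n_reads * read_length_bp / genome_size_bp

def reads_needed(target_coverage: float, read_length_bp: int, genome_size_bp: int) -> int:
    """Smallest number of reads whose total length reaches the target coverage."""
    return math.ceil(target_coverage * genome_size_bp / read_length_bp)

# 250 bp reads and the 50X target come from the Methods;
# 2.21 Mb is the largest draft genome size reported in the Results.
n = reads_needed(50, 250, 2_210_000)
print(n, fold_coverage(n, 250, 2_210_000))
```

Here a read means a single 250 bp read, so a paired-end run needs half as many read pairs. Reads that fail quality filtering or do not map will lower the achieved coverage below this estimate.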

Draft genome assembly and identification of C. concisus pan- and core-genome

In addition to the above 27 C. concisus strains sequenced in this study, nine C. concisus genomes that are available in NCBI database were also included for analysis, of which seven genomes were from a previous study24. The accession numbers of these nine C. concisus genomes are ANNF00000000, ANNJ00000000, ANNE00000000, AENQ00000000, ANNG00000000, ANNH00000000, ANNI00000000, CP000792.1, NZ_CP012541.1. The genomes of strains 13826 and ATCC 33237 (accession numbers CP000792.1, NZ_CP012541.1) were fully sequenced and the remaining genomes were draft genomes. Thus, a total of 36 C. concisus strains were analysed in this study including 27 oral strains and nine enteric strains.

The raw reads were assembled using the St. Petersburg genome assembler (SPAdes, Ver. 3.6.1) to obtain the draft genomes42 (Table 1). Gene annotation was performed using a combination of the Rapid Annotations using Subsystems Technology server (RAST, Ver. 2.0) and Prokka (Ver. 1.11)43,44. The pan- and core-genome for the 36 C. concisus strains were defined by the Rapid large-scale prokaryote pan-genome analysis software (Roary, Ver. 3.5.7)45. The genome function analysis was performed as described previously46. Briefly, the protein sequences were extracted from the annotated genomes and searched with BLAST against the NCBI COG database (ver. 2014). Genes with a COG assignment were then categorised into functional groups.
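The relationship between the pan-genome, the core-genome and the accessory genes can be illustrated with simple set operations. This is a minimal sketch that assumes genes have already been clustered into shared labels and that a core gene must be present in every strain. It does not reproduce Roary's clustering or its configurable core threshold, and the strain and gene labels are hypothetical.

```python
def split_pan_genome(gene_sets):
    """Return (pan, core, accessory) for a mapping of strain -> set of gene labels.

    pan: genes found in at least one strain
    core: genes found in every strain
    accessory: per-strain genes that are not part of the core
    """
    pan = set().union(*gene_sets.values())
    core = set.intersection(*gene_sets.values())
    accessory = {strain: genes - core for strain, genes in gene_sets.items()}
    return pan, core, accessory

# Hypothetical toy input: three strains and four gene clusters.
toy = {
    "strain1": {"geneA", "geneB", "geneC"},
    "strain2": {"geneA", "geneB", "geneD"},
    "strain3": {"geneA", "geneB"},
}
pan, core, accessory = split_pan_genome(toy)
print(sorted(pan), sorted(core), {s: sorted(g) for s, g in accessory.items()})
```

Applied to real data, the per-strain accessory counts correspond to the kind of range reported in the Results (1,163 to 1,521 accessory genes).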

Phylogenetic analysis based on the C. concisus core-genome, sequences of housekeeping genes, 23S and 16S rRNA genes

The phylogenetic tree based on the C. concisus core-genome was generated using Roary45. The neighbour-joining method was used to generate phylogenetic trees based on housekeeping genes, 23S rRNA genes and 16S rRNA genes of the 36 C. concisus strains examined in this study, using Molecular Evolutionary Genetics Analysis software version 6.06 (MEGA 6.06) with 1,000 bootstrap replications47. The six housekeeping genes, which were previously shown to be able to define C. concisus genomospecies, were aspartase A (aspA), glutamine synthetase (glnA), transketolase (tkt), aspartate semialdehyde dehydrogenase (asd), ATP synthase F1 alpha subunit (atpA) and glucose-6-isomerase (pgi)18. The sequences of housekeeping genes, 23S and 16S rRNA genes from a Campylobacter jejuni strain (GenBank accession no. NC_002163) were used as an outgroup.

Identification of genomospecies-specific genes

The annotated genes of the 36 C. concisus strains representing the two genomospecies were compared using Roary to determine candidate genes that were specific to GS1 or GS2. A GS1-specific gene refers to a gene that is present in all GS1 strains and absent in all GS2 strains analysed in this study. Similarly, a GS2-specific gene refers to a gene that is present in all GS2 strains and absent in all GS1 strains. To confirm the presence and absence of genomospecies-specific genes, the assembly of each genome was searched with BLASTn (BLAST+, Ver. 2.2.31) and BLASTx (BLAST+, Ver. 2.2.31)48. To ensure that the absence of genomospecies-specific genes was not due to issues with assemblies or sequencing artefacts, raw reads were mapped with Burrows-Wheeler Aligner (BWA, Ver. 0.7.12)49. Finally, the flanking regions of the absent genes were confirmed to be located on the same contig.
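The definition of a genomospecies-specific gene given above amounts to a filter on a gene presence/absence table. The following sketch shows that filter only; it assumes the table is already available as a mapping from gene to the set of strains carrying it, and it omits the BLAST, read-mapping and flanking-region checks. The function name and strain labels are hypothetical. The gene names pstS and aqpZ stand in for the PstS and aquaporin Z genes of Table 2; the other gene labels are made up.

```python
def genomospecies_specific(presence, group_of):
    """Find genes present in all strains of one group and in no other strain.

    presence: gene -> set of strains carrying the gene
    group_of: strain -> group label (e.g. "GS1" or "GS2")
    """
    members = {}
    for strain, group in group_of.items():
        members.setdefault(group, set()).add(strain)
    all_strains = set(group_of)

    specific = {group: [] for group in members}
    for gene, carriers in presence.items():
        carriers = carriers & all_strains  # ignore strains with no group label
        for group, strains in members.items():
            if carriers == strains:  # in every member, and in nobody else
                specific[group].append(gene)
    return {group: sorted(genes) for group, genes in specific.items()}

# Hypothetical toy data: two GS1 strains and three GS2 strains.
group_of = {"A": "GS1", "B": "GS1", "C": "GS2", "D": "GS2", "E": "GS2"}
presence = {
    "pstS": {"A", "B"},                      # all GS1, no GS2
    "aqpZ": {"C", "D", "E"},                 # all GS2, no GS1
    "coreGene": {"A", "B", "C", "D", "E"},   # shared by everyone
    "patchyGene": {"C", "D"},                # missing from one GS2 strain
}
result = genomospecies_specific(presence, group_of)
print(result)
```

A gene missing from even one strain of its group is excluded, which is why the confirmation steps against assemblies and raw reads matter: an assembly gap would otherwise hide a truly specific gene.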

Identification of genomic islands and the putative effector proteins

Two C. concisus genomic islands containing T4SS homologous proteins were identified in this study based on comparison of the flanking genes in C. concisus strains, the presence of integrases and attachment sites, the sizes of the regions, and the presence of plasmid-associated genes. Clustal Omega was used to compare protein sequences between islands50. The effector proteins were identified by comparing the proteins in the identified genomic islands with the proteins in the T4SS effector protein database SecReT4 using WU-BLAST on default settings51.
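Attachment sites appear as a short sequence repeated at both ends of an island, so one simple way to locate candidates is to search a contig for every exact occurrence of a repeat. The sketch below is an illustration under that assumption and is not a description of the procedure used in this study. It searches the forward strand only, reports 1-based inclusive coordinates, and runs on a made-up toy contig; only the 19-nt repeat itself is taken from the Results.

```python
def find_repeat(sequence: str, motif: str):
    """Return 1-based inclusive (start, end) coordinates of every exact match."""
    hits = []
    start = sequence.find(motif)
    while start != -1:
        hits.append((start + 1, start + len(motif)))
        start = sequence.find(motif, start + 1)  # allow overlapping matches
    return hits

# Repeat reported for CON_PiiB; the surrounding contig is a hypothetical toy sequence.
ATT_REPEAT = "AGTCCTGGTGAACCCACCA"
contig = "TTT" + ATT_REPEAT + "GATTACA" * 3 + ATT_REPEAT + "CC"

hits = find_repeat(contig, ATT_REPEAT)
print(hits)
if len(hits) == 2:
    # Span from the first base of the first repeat to the last base of the second.
    print("candidate island span (bp):", hits[1][1] - hits[0][0] + 1)
```

Two hits flanking an integrase gene and a tRNA gene would then mark the candidate boundaries of an integrated element.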

Statistical analysis

Fisher’s exact test (two tailed) was used to compare the prevalence of CON_PiiA and CON_PiiB islands in enteric and oral C. concisus strains. Statistical analysis was performed using GraphPad Prism 6 software (San Diego, CA).
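The reported P values can be checked from the 2x2 counts given in the Results. The sketch below is a plain-Python two-tailed Fisher's exact test, written for illustration rather than taken from GraphPad Prism. It assumes the common convention of summing the probabilities of all tables with the same margins that are no more likely than the observed table, and the function name is hypothetical.

```python
from math import comb

def fisher_exact_two_tailed(a: int, b: int, c: int, d: int) -> float:
    """Two-tailed Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability that x of the col1 positives fall in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # The small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

# Rows: enteric vs oral strains; columns: island present vs absent.
p_piia_all = fisher_exact_two_tailed(3, 5, 0, 27)        # CON_PiiA, 8 enteric vs 27 oral
p_piia_collapsed = fisher_exact_two_tailed(3, 5, 0, 22)  # CON_PiiA, 8 enteric vs 22 oral
p_piib = fisher_exact_two_tailed(0, 9, 5, 22)            # CON_PiiB, 9 enteric vs 27 oral
print(round(p_piia_all, 4), round(p_piia_collapsed, 4), p_piib > 0.05)
# prints: 0.0086 0.0138 True
```

Under this convention the sketch reproduces the P values of 0.0086 and 0.0138 reported for CON_PiiA, and gives a non-significant result for CON_PiiB.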

GenBank sequence submission

Raw reads of the 27 C. concisus strains sequenced in this study were submitted to the Sequence Read Archive in GenBank under the BioProject number PRJNA348396.

Additional Information

How to cite this article: Chung, H. K. L. et al. Genome analysis of Campylobacter concisus strains from patients with inflammatory bowel disease and gastroenteritis provides new insights into pathogenicity. Sci. Rep. 6, 38442; doi: 10.1038/srep38442 (2016).
