Culture-independent metagenomics supports discovery of uncultivable bacteria within the genus Chlamydia

Advances in culture-independent methods have meant that we can more readily detect and diagnose emerging infectious disease threats in humans and animals. Metagenomics is fast becoming a popular tool for detection and characterisation of novel bacterial pathogens in their environment, and is particularly useful for obligate intracellular bacteria such as Chlamydiae that require labour-intensive culturing. We have used this tool to investigate the microbial metagenomes of Chlamydia-positive cloaca and choana samples from snakes. The microbial complexity within these anatomical sites meant that despite previous detection of chlamydial 16S rRNA sequences by single-gene broad-range PCR, only a chlamydial plasmid could be detected in all samples, and a chlamydial chromosome in one sample. Comparative genomic analysis of the latter revealed it represented a novel taxon, Ca. Chlamydia corallus, with genetic differences in regards to purine and pyrimidine metabolism. Utilising statistical methods to relate plasmid phylogeny to the phylogeny of chromosomal sequences showed that the samples also contain additional novel strains of Ca. C. corallus and two putative novel species in the genus Chlamydia. This study highlights the value of metagenomics methods for rapid novel bacterial discovery and the insights it can provide into the biology of uncultivable intracellular bacteria such as Chlamydiae.

Chlamydial genome construction from a snake choana metagenome. Given the fact that more than one bacterial genome was present in these metagenomes, a full chlamydial genome could only be recovered from a single sample (G3/2472-324), the characteristics of which are summarised in Table 2 in comparison with other chlamydial genomes. The single metagenome containing chlamydial chromosomal contigs contained 4,445 contigs over 1,000 bp, seven of which were chlamydial and divided between six predicted chromosomal contigs and one predicted plasmid contig. The combined chromosomal contigs total 1,196,452 bp in length and were predicted to encode 1,076 genes. The GC content of 39.33% was comparable to other chlamydial genomes ( Table 2). The plasmid contig was 7,522 bp, harboured the typical eight open reading frames and has a lower GC content than the chromosome (32.0%), as is expected for plasmids. The average read coverage across the chromosome and plasmid were ~110x and ~40,134x, respectively.
For the remaining four samples, despite lacking a chlamydial chromosome, a plasmid sequence could be recovered, presumably since chlamydial plasmids are found at copy numbers of two to ten times that of the  Information Table 3). To visualise the genetic relationships between this putative new chlamydial species with other members of the genus Chlamydia, a phylogenetic tree was constructed from a concatenation of the eleven gene alignments 5 . The resultant phylogenetic tree reiterates the distinct lineage formed by G3/2742-324, in a major cluster with Ca. Chlamydia sanzinia, C. pneumoniae and C. pecorum (Fig. 1). Based on the nucleotide identities of the analysed genes highlighted in the classification scheme 5 , G3/2742-324 should be classified as a novel Candidatus species within the genus Chlamydia. We propose for it the name Candidatus Chlamydia corallus (strain G3/2742-324), so named for the genus of the Amazon Basin emerald tree boa (Corallus batesii) in which it was detected. Ca. Chlamydia corallus was detected in the choana of a clinically healthy, captive snake in Switzerland.
Almost all chlamydial species, but not all strains, are known to carry a plasmid, and the nucleotide and amino acid sequences are highly conserved between species 26 . The presence of a plasmid has been suggested to contribute to the pathogenicity or tissue tropism of the chlamydial species 27,28 , and plasmid proteins are used for diagnostic targets and vaccine candidates. The chlamydial plasmid is normally organised with eight open reading frames (ORFs), encoding for genes involved in plasmid maintenance and glycogen synthesis 29 . Alignment of the plasmid sequences revealed conservation of these ORFs, with the exception of a gap in the coding region for helicase in the sequence for G6/0661-435, which resulted in a partial plasmid sequence (data not shown).   Table 3. Characteristics of chlamydial plasmids obtained from snake choana and cloaca metagenomes. 1 Possibly incomplete sequence; truncated helicase gene predicted over ends of contig. 2 Nucleotide identity (%) of near-full length 16S rRNA sequence.
Previous research has also shown that there is a co-evolution between the chromosome and plasmid sequences for the chlamydial species 26,30 , so in the absence of chlamydial chromosomal genetic data for these additional samples, we performed phylogenetic analysis on the nucleotide sequences across the plasmid in order to assess the genetic relationship of all strains obtained from the metagenomes in this study. In agreement with the tree constructed from its chromosomal loci, phylogenetic analysis revealed that Ca. Chlamydia corallus clusters with but is genetically distinct from C. pneumoniae. (Fig. 2). Plasmid sequences from G6/0661-435 and G7/2741-436 can also be found in this clade. Notably, G1/1679-8 and G2/2464-204 form two additional branches distinct from Ca. C. corallus, C. pneumoniae and Ca. C. sanzinia, sharing 84.09% of their nucleotides with each other and 72.27% to 83.51% with these three species.
As no typing scheme exists to distinguish species based on plasmid sequence analysis or phylogeny, we used linear regression analysis to assess the relationships between chromosome and plasmid pair-wise patristic distances (sum of branch lengths) within the Chlamydiaceae (Supplementary Information Table 4 and Supplementary Information Fig. 1). Based on the phylogenetic markers used in this study, at the strain level (eg. C. pneumoniae LPCoLN and N16; C. pecorum MC/MarsBar and L1), patristic distances for the chromosome are 0, while for the plasmid they are 0.01-0.02. For closely related species pairs such as C. caviae and C. felis or C. suis and C. trachomatis, chromosomal and plasmid patristic distances are 0.12-0.16 and 0.20-0.22, respectively. For more distantly related species pairs such as C. trachomatis and C. psittaci, or C. pecorum and C. muridarum, chromosomal and plasmid patristic distances are higher: 0.28-0.33 and 0.52-0.57 (Supplementary Information  Table 4). For the sequences obtained in this study, branch lengths between G6/0661-435 and G7/2741-436, and  Information Table 4, Supplementary  Information Fig. 1). On the other hand, the plasmid and extrapolated chromosomal patristic distances between G1/1679-8, G2/2464-204 and C. pneumoniae of 0.18-0.21 and 0.11-0.13, respectively, are slightly lower than those of C. caviae and C. felis, but not as close as that of C. psittaci and C. abortus, highlighting their relatedness to each other (Fig. 2). Meanwhile, their branch lengths with most other members of the genus is 0.36-0.41, which is comparable to most other pair-wise distances (Supplementary Information Table 4), thus may represent two distinct novel species.
These data also fit with initial testing results, in which G1/1679-8 and G2/2464-204 could not be definitively assigned to a species based on a Chlamydiaceae ArrayTube assay designed to detect established species 8 . G1/1679-8 was identified as a Chlamydia species, and G2/2464-204 did not yield a conclusive result (Table 3) 9 . G3/2742-324, G6/0661-435 and G7/2741-436 were designated as C. pneumoniae, which is in line with their close plasmid nucleotide identity. This suggests the assay is less specific when taxa are so closely related, but is robust enough to detect novel species.
Given the above, the phylogenetic position of G1/1679-8 and G2/2464-204 among other species and their evolutionary distances from other species and strains provide strong evidence of additional species-level diversity within the Chlamydiaceae. These data provide (a) evidence that, for some taxa, 16S rRNA sequencing is not sufficient to speciate, (b) validation of the use of genome sequencing to further investigate genetic diversity within and/or between populations, and (c) evidence for the use of plasmid sequence to assess diversity and phylogeny of novel chlamydial species for which plasmids are ubiquitous.
Genetic differences within the plasticity zone of Ca. Chlamydia corallus. In order to further characterise the genome of the novel species, Ca. Chlamydia corallus, in comparison to other chlamydial species, the plasticity zone (PZ) region was analysed. The plasticity zone is a unique region within the Chlamydia genome that has been associated with host adaptation for some chlamydial species 6,31 . The well-known variability between the chlamydial species within this region makes it an appropriate target for understanding the factors that might have influenced the tissue tropism of Ca. Chlamydia corallus.
The plasticity zone of Ca. Chlamydia corallus is approximately 13,700 bp in size and composed of genes required for several biochemical pathways such as Acetyl-CoA-carboxylase (accBC), purine and pyrimidine synthesis genes (guaABadd) and the MAC/perforin gene, as seen in Fig. 3. When compared with other chlamydial species, the plasticity zone harboured by Ca. Chlamydia corallus is structurally most similar to the human-isolated strain of C. pneumoniae AR39 (Fig. 3). Both species have a slightly smaller plasticity zone than other species, due to the absence of any cytotoxin, which is present in C. psittaci and duplicated in C. pecorum 32,33 . The main difference between the PZs of Ca. C. corallus and C. pneumoniae appears to be fragmented or truncated hypothetical proteins in AR39, and the absence of a putative lipoprotein in Ca. Chlamydia corallus, which is present in both strains of C. pneumoniae (Fig. 3) 34 .
Compared to the other known chlamydial species infecting snakes, the plasticity zone of Ca. Chlamydia corallus was genetically variable from Ca. Chlamydia sanzinia (Fig. 3). For instance, the MAC/perforin complex gene was not detected in the plasticity zone of Ca. Chlamydia sanzinia 8 . The function of the MAC/perforin gene in the chlamydial species is unknown, but has been suggested to contribute to the pathogenesis of these species 35 . Additional differences in the PZs of Ca. Chlamydia sanzinia and Ca. Chlamydia corallus lie in the purine ribonucleotide biosynthesis pathways, as highlighted in Fig. 3. The purine ribonucleotide biosynthesis (guaABadd) cluster, detected in Ca. Chlamydia corallus, plays a critical role in both de novo and salvage pathways for purine synthesis in prokaryotes 36 . This cluster is present in some chlamydial species 33,34,37 , but was never detected in the other recently described snake chlamydia, Ca. Chlamydia sanzinia 8 . Chlamydial species that do not encode for this gene cluster are most likely able to synthesise purines through alternative pathways 38 . Its absence in other bacterial species, such as Helicobacter pylori, has been found to have an effect on their rate of growth 36 . The absence of the guaABadd genes in several of the chlamydial species, however, indicates that these genes are not needed by the chlamydial species; its absence in Ca. Chlamydia sanzinia also suggests that these genes are not necessary for the chlamydial species to establish infection in snakes 8,37 .
No tryptophan operon (trpAB) was detected in the plasticity zone or other genomic regions of Ca. C. corallus. Tryptophan is a necessary amino acid for chlamydial growth 39 , however, host cell defence mechanisms against chlamydial infections exist in which interferon gamma (IFN-γ) production depletes intracellular tryptophan stores 40 . Certain strains of C. trachomatis encode for an intact tryptophan operon (trpAB), which is absent or incomplete in other chlamydial species 37 , suggesting that not all chlamydial species are able to synthesise tryptophan. For urogenital strains of C. trachomatis, the vaginal microflora is believed to provide indole, allowing for synthesis of tryptophan 39 . The absence of a tryptophan operon would suggest that Ca. C. corallus either has alternative pathways for synthesising tryptophan or is possibly completely auxotrophic for tryptophan. Notably, neither Ca. C. corallus nor Ca. C. sanzinia encode for an aromatic amino acid hydroxylase, which has been suggested to contribute to tryptophan metabolism in the absence of trpAB. As has been suggested for C. trachomatis 39 , the diverse microflora in these snakes may provide nutrients or substrates for chlamydial synthesis of amino acids. Metagenomic mining revealed several tryptophan synthesis pathway or rescue genes encoded by the bacteria present in the cloaca and choana samples, for example, tryptophan synthase, tryptophanase and indole-3-glycerol phosphatase were detected among the samples, encoded by Achromobacter sp., Serratia marcescens. Clostridium sp., Salmonella enterica and Chitinophagaceae. Previous studies also describe the presence of indole-producing bacteria at these sites [21][22][23] .
In the current study, we have used culture-independent metagenome analysis to investigate the microbial metagenome of snake choana and cloaca samples. In doing so, we have shown that this method provides a wealth of biological information for novel species discovery through microbial community profiling, and have described the presence of highly abundant bacterial species at these sites, some of which have not previously been described. The animal and public health implications of these findings are unknown, but the repeated observations of human pathogens in the microflora of snakes 21,22 and other reptiles warrants further investigation. The metagenomic method used is particularly useful for characterising novel species or strains with no reference genome such as novel uncultivable bacteria (eg. members of the phylum Chlamydiae). The complexity within these anatomical sites meant that despite previous detection of chlamydial 16S rRNA sequences by PCR, only a chlamydial plasmid could be detected in all samples, and a single chlamydial chromosome. Nonetheless, comparative analysis of the novel chlamydial species with other Chlamydia sp. revealed genetic differences in regards to purine and pyrimidine metabolism. The detection of chlamydial plasmids in all samples, which was only possible using this method, highlights additional diversity within the Chlamydiae, which appears to be a growing trend with increased breadth and depth of sampling and advances in molecular techniques. Further studies, such as metatranscriptomic analysis would better elucidate the complex role of the microbiota on chlamydial pathogenesis and vice versa.

Materials and Methods
Sample preparation. Suspected novel genotypes (n = 5) of C. pneumoniae were recently detected in collections of captive snakes in Switzerland 8 . Clinical swabs were taken from either the cloana or choana of clinically healthy snakes, and DNA was extracted as previously described 8 . All samples were subjected to host methylated DNA depletion prior to multiple displacement amplification, as previously described 9 .
Ethics approval and consent to participate. The collection and molecular analysis of the snake samples was approved and performed in accordance with the relevant guidelines and regulations of the Veterinary Office of Canton Zurich (authorization no. ZH010/15).
Metagenome assembly and analysis. Deep 42 . Each assembly was assessed through QUAST 43 . To obtain chlamydial contigs from the assembled metagenome, contigs were subject to BLAST analysis against an in-house chlamydial genome database, and subsequent analysis against the NCBI nucleotide database. Contigs with hits against chlamydial sequences were automatically annotated using RAST 44 and manually curated in Artemis 45 .
Metaxa was employed initially to assess the species richness within the resulting metagenomes, detecting ribosomal RNA subunits of various origins 46 . MaxBin was used to construct partial or complete draft genomes for the microbial species detected in the samples and determine genome completeness for each assembly 47 .
Burrows-Wheeler aligner, SAMtools and BEDtools were used to map reads and assess read coverage across the various metagenomic components 48-50 . Phylogenetic analysis. The genetic relationships of the novel species described in this study to other chlamydial species was assessed using the classification system published by Pillonel et al. 5 . Individual genes were extracted from the assembled genome and established chlamydial species, including Simkania negevensis as an out group. Extracted genes were concatenated and aligned using MAFFT 51 and a phylogenetic tree based on the resulting alignment was constructed using FastTree 52 ; both were performed in Geneious v7.1 53 .
Plasmid phylogeny was performed based on the alignment of nucleotide sequences using MAFFT 51 and tree construction using FastTree 52 . In order to include plasmid sequences from all possible species, nucleotide sequences were re-ordered and large gaps were removed so that each resulting plasmid sequence was 5,522-6,170 bp, thus comparable to C. suis plasmid which lacks the parA and pgp-6 genes.
Data availability. The metagenomic sequence data obtained for the Ca. Chlamydia corallus chromosome and plasmid was deposited in Genbank under accession numbers as part of bioproject PRJNA312988.