The phylum Chlamydiae consists of obligate intracellular bacteria including major human pathogens and diverse environmental representatives. Here we investigated the Rhabdochlamydiaceae, which is predicted to be the largest and most diverse chlamydial family, with the few described members known to infect arthropod hosts. Using published 16S rRNA gene sequence data, we identified at least 388 genus-level lineages containing about 14,051 putative species within this family. We show that rhabdochlamydiae are mainly found in freshwater and soil environments, suggesting the existence of diverse, yet unknown hosts. Next, we used a comprehensive genome dataset including metagenome assembled genomes classified as members of the family Rhabdochlamydiaceae, and we added novel complete genome sequences of Rhabdochlamydia porcellionis infecting the woodlouse Porcellio scaber, and of ‘Candidatus R. oedothoracis’ associated with the linyphiid dwarf spider Oedothorax gibbosus. Comparative analysis of basic genome features and gene content with reference genomes of well-studied chlamydial families with known host ranges, namely Parachlamydiaceae (protist hosts) and Chlamydiaceae (human and other vertebrate hosts), suggested distinct niches for members of the Rhabdochlamydiaceae. We propose that members of the family represent intermediate stages of adaptation of chlamydiae from protists to vertebrate hosts. Within the genus Rhabdochlamydia, pronounced genome size reduction could be observed (1.49–1.93 Mb). The abundance and genomic distribution of transposases suggest transposable element expansion and subsequent gene inactivation as a mechanism of genome streamlining during adaptation to new hosts. This type of genome reduction has never been described before for any member of the phylum Chlamydiae. This study provides new insights into the molecular ecology, genomic diversity, and evolution of representatives of one of the most divergent chlamydial families.
The phylum Chlamydiae was originally regarded as a small group of obligate intracellular bacteria infecting humans and a few animal species. Today, the chlamydiae are known to be associated with a broad spectrum of host organisms including protists, arthropods, and diverse vertebrates [2,3,4,5,6]. Some of those may also infect mammalian cells and have thus been proposed to represent emerging human pathogens [7,8,9]. While cultured representatives of only six families are available to date, molecular surveys suggest that a large undiscovered diversity exists, with over one thousand family-level lineages in various environments worldwide [6, 10].
All chlamydiae share a common ancestor that lived around one billion years ago, and there is evidence that the emergence of their unique and strictly intracellular lifestyle dates back to these Precambrian times [11,12,13]. The characteristic biphasic developmental cycle of characterized representatives involves infective elementary bodies (EBs) that enter eukaryotic host cells and transform into replicative reticulate bodies (RBs). Inside the host cells, chlamydiae stay in host-derived vacuoles termed inclusions. Eventually, RBs differentiate back to EBs, exit the host cell either by lysis or extrusion, and start a new infection cycle.
Genomics has helped to gain fundamental insights into chlamydial biology, host adaptation, and evolution. Chlamydiae generally have small, reduced genomes, and lack metabolic pathways that are complemented by importing host cell metabolites . Despite recent advances in genetic manipulation of members of the well-studied family Chlamydiaceae, like Chlamydia trachomatis [16, 17], genomics remains of utmost importance to study the more elusive chlamydiae found in the environment, collectively referred to as environmental chlamydiae. For instance, genomics revealed that the chlamydial developmental cycle, including major virulence mechanisms such as the type III secretion system, is well conserved also among the environmental representatives [3, 11, 12, 18]. Yet, the genetic repertoire of environmental chlamydiae is generally more versatile than that of Chlamydiaceae, including more complete metabolic pathways and richer arsenals of predicted effector proteins to interact with their evolutionary distinct eukaryotic host cells [19,20,21]. More recently, single cell genomics and large-scale metagenomics revealed a surprising biological variability of environmental chlamydiae, including evidence for motility and a widespread potential for anaerobic metabolism [6, 22,23,24].
The family Rhabdochlamydiaceae is putatively one of the largest and most diverse, yet poorly studied, clades within the phylum Chlamydiae. The only known hosts of rhabdochlamydiae are arthropods including ticks, spiders, cockroaches, and woodlice [25,26,27,28]. Infection with rhabdochlamydiae was reported to be detrimental for cockroaches and woodlice, leading to severe abdominal swelling [26, 29] or heavy tissue damage in the respective host. However, the prevalence was reported to be low, amounting to 1% on average for ticks and 15% on average for woodlice. Although rhabdochlamydiae are potentially important members of the phylum, so far only one described draft genome sequence is available, from ‘Candidatus Rhabdochlamydia helvetica’ (hereafter R. helvetica). Here we add the complete genome sequences of Rhabdochlamydia porcellionis and the new species ‘Candidatus Rhabdochlamydia oedothoracis’ (hereafter R. oedothoracis), and we use a collection of metagenome assembled genomes (MAGs) to investigate the biology and evolution of members of the Rhabdochlamydiaceae. We provide evidence for a large, yet undiscovered diversity of rhabdochlamydiae, especially in freshwater and soil ecosystems. We show that their genomic setup suggests a host spectrum beyond arthropods, and we identify transposable elements as drivers of genome size reduction during host adaptation.
Results and discussion
Rhabdochlamydiae thrive in soil and freshwater environments
Previous analyses of metagenomic and 16S ribosomal RNA gene-based surveys predicted the Rhabdochlamydiaceae to be one of the most diverse families within the phylum Chlamydiae [6, 10]. Since then, the amount of available sequencing data has increased manifold, e.g., by one order of magnitude in the publicly available high-throughput sequencing repository SRA, from ~1000 TB in 2014 to ~10,000 TB in 2020 (trace.ncbi.nlm.nih.gov/Traces/sra/). To get an up-to-date overview, we screened the SRA for 16S rRNA gene sequences using the database IMNGS. This analysis suggested that the family Rhabdochlamydiaceae consists of at least 388 genus-level lineages and 14,051 species-level operational taxonomic units (OTUs; clustered using sequence similarity thresholds of 95% and 99%, respectively). We calculated this lower-bound estimate using only sequences covering the V3–V4 region of the 16S rRNA gene, as this is the best-covered region in our dataset, comprising about 72% of all sequences. Considering other variable regions as well would likely result in an overestimation of OTUs, as two sequences spanning different regions of the same 16S rRNA gene would appear as two separate genus-level OTUs in this analysis (see Materials and Methods). Compared to the few Rhabdochlamydiaceae full-length sequences reported to date, these estimates predict a strikingly high natural diversity for members of this chlamydial family. A prime example lending support to this finding is a recent study investigating fecal microbiota from more than 400 insectivorous barn swallows during two breeding seasons. The rhabdochlamydial 16S rRNA gene sequences detected in this longitudinal study alone account for 80 different genus-level lineages.
The placement of representative sequences of all putative genus-level OTUs into a reference tree consisting of chlamydial full-length sequences illustrated the broad diversity of the Rhabdochlamydiaceae and showed that the predicted OTUs indeed span the entire family clade, including lineages both closely related and distant to previously recognized members (Fig. 1).
Although all known representatives of the family Rhabdochlamydiaceae are associated with arthropod hosts [25, 26, 28], our data show that most OTUs originate from soil (43%) and freshwater (33%) samples, suggesting the presence of additional, yet unknown hosts (Fig. 1). Protists are abundant and important members of microbial communities in those environments [34, 35] and might thus serve as hosts for many of these lineages. Consistent with this, only 5% of all identified rhabdochlamydial OTUs were detected in animal microbiomes from molluscs, birds, fish, and diverse mammals, and categorized as host-associated in our analysis (Fig. 1). Most of these sequences, however, originate from feces or gut samples, and it is thus conceivable that rhabdochlamydiae are taken up with food and do not represent active infections. In fact, there is no generally discernible pattern or pronounced correlation between phylogeny and environmental origin or putative host taxa in our dataset. We still noted a monophyletic group comprising all known arthropod-associated Rhabdochlamydiaceae, i.e., the three described Rhabdochlamydia species. In addition, this clade contains 107 genus-level lineages found in diverse environments, including many detected in feces from insectivorous birds (Fig. 1). Taken together, our data suggest that while there is evidence for yet unknown environmental hosts, diverse animals may either serve as transient hosts or simply act as vectors for distributing rhabdochlamydiae through food uptake and excrement.
Genome features and gene content distinguish rhabdochlamydiae from other chlamydial families
To learn more about members of the Rhabdochlamydiaceae, we next collected all available whole genome sequences and high quality MAGs (n = 9; see Material and Methods; Table S1) and compared them to the most well-studied chlamydial families with known hosts, namely the Chlamydiaceae (human and other vertebrate hosts)  and the Parachlamydiaceae (amoeba hosts) . We did not consider members of other described families due to the limited number of genome sequences and the lack of knowledge about their natural hosts, respectively. In addition, we determined the complete genome sequences of two rhabdochlamydiae from arthropod hosts: R. porcellionis infecting the woodlouse Porcellio scaber , and the new species R. oedothoracis, which is associated with the linyphiid dwarf spider Oedothorax gibbosus  (Table S2; for a formal candidatus species description see Text S1). In order to compare the different chlamydial families, we first clustered all genes into orthologous groups (OG) representing gene families [37, 38]. Next, we compared all genomes based on their gene content, i.e., abundance of gene families (Fig. 2A, B). This analysis confirmed previous observations that the human and animal pathogens of the Chlamydiaceae clearly differ from the amoeba symbionts of the Parachlamydiaceae with respect to their genetic repertoire . Further, the number of genes shared within a chlamydial family is generally higher than the number of genes shared by the whole phylum . These conserved family-specific genomic backbones have been interpreted to reflect adaptations to different niches or, as chlamydiae are obligate intracellular bacteria, to different hosts. Notably, our gene content analysis revealed that members of the Rhabdochlamydiaceae are clearly distinct from the Chlamydiaceae and the Parachlamydiaceae for both highly conserved (Fig. 2A) and chlamydiae-specific gene families (Fig. 2B). 
The different genome composition is also reflected in the degree of genome reduction, with rhabdochlamydiae showing intermediate genome sizes compared to the Chlamydiaceae and the Parachlamydiaceae (Fig. 2C). Together, this suggests that the Rhabdochlamydiaceae have a different niche, for instance a different host range, in comparison to the other well-studied chlamydial families.
In many host-associated bacteria there is a correlation between genome size and GC content [39, 40], with smaller genomes tending to have a lower GC content. However, this does not seem to apply to chlamydiae [18, 40], suggesting evolutionary forces other than relaxed selection and genetic drift shaping the genomic GC content of members of this phylum. Several other factors are known to drive the base composition in bacteria including environmental conditions and niche-specialization [41, 42]. Within the family Rhabdochlamydiaceae we observe a clear divide with respect to the genomic GC content, with known arthropod-associated rhabdochlamydiae (i.e., the members of the genus Rhabdochlamydia: R. helvetica, R. oedothoracis, R. porcellionis) differing pronouncedly from those of other rhabdochlamydiae (35.4–36.2% vs. 42.3–45.2% on average; Figs. 2C and S2). This might indicate that the latter thrive in a different niche, i.e., are associated with hosts other than arthropods. However, more Rhabdochlamydiaceae genome sequences from arthropod hosts are needed to corroborate these observations.
The Rhabdochlamydiaceae pangenome
To explore the genomic setup of the Rhabdochlamydiaceae in more detail, we conducted a pangenome analysis. The pangenome describes all genes in a certain group of organisms and consists of genes present in all individuals in that group, the core genome, and genes that are specific to only some of them, referred to as the accessory genome. For this analysis we selected all nine Rhabdochlamydiaceae genomes from our dataset (Table S1; Fig. S3). The family pangenome comprises 5178 OGs, of which 665 are present in >90% of all genomes, representing the core genome. This includes almost all of the genes constituting the chlamydial core genome [18, 24], such as the type III secretion system, nucleotide transport proteins (Ntt1/Ntt2), the master regulator of the chlamydial developmental cycle (Euo), as well as major effector proteins (CopN, Pkn5) [47, 48] that interfere with host cellular pathways. Further, glycogen metabolism is conserved among all Rhabdochlamydiaceae, which is consistent with the importance of glycogen as a storage compound for many known chlamydiae.
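The core/accessory split described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the study: the function name `split_pangenome`, the input layout (a mapping from OG identifier to the set of genomes containing it), and the toy data are all assumptions made for the example. Only the ">90% of genomes" criterion comes from the text.

```python
from typing import Dict, Set, Tuple


def split_pangenome(
    og_to_genomes: Dict[str, Set[str]],
    n_genomes: int,
    core_fraction: float = 0.9,
) -> Tuple[Set[str], Set[str]]:
    """Split orthologous groups (OGs) into core and accessory sets.

    An OG counts as core when it is present in MORE than `core_fraction`
    of the genomes (the ">90%" criterion stated in the text).
    """
    core: Set[str] = set()
    accessory: Set[str] = set()
    for og, genomes in og_to_genomes.items():
        if len(genomes) / n_genomes > core_fraction:
            core.add(og)
        else:
            accessory.add(og)
    return core, accessory


# Hypothetical toy input: nine genomes labelled g1..g9.
genomes = [f"g{i}" for i in range(1, 10)]
toy = {
    "OG_a": set(genomes),      # present in 9/9 genomes -> core
    "OG_b": set(genomes[:8]),  # present in 8/9 (88.9%) -> accessory
    "OG_c": {"g1", "g2"},      # lineage-specific -> accessory
}
core, accessory = split_pangenome(toy, n_genomes=len(genomes))
```

Note that with nine genomes a strict ">90%" cutoff is only met by OGs found in all nine, since 8/9 is about 88.9%.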
The accessory genome contains lineage-specific genes representing adaptations to different niches [18, 24, 43, 50]. In general, the arthropod-associated Rhabdochlamydia species tend to have smaller accessory genomes (246–395 genes) than other members of the family Rhabdochlamydiaceae (366–588 genes) with unknown hosts (Wilcoxon rank sum test; p = 0.05) (Fig. S4). When we grouped the accessory genes into functional categories inferred from annotations in the eggNOG database, we could not recognize clear differences between the individual genomes (Fig. S5). However, among the gene families differentiating known arthropod-associated rhabdochlamydiae from other Rhabdochlamydiaceae, i.e., those gene families that are unique to or completely missing in the genus Rhabdochlamydia, we found several genes associated with cell wall or membrane biosynthesis (Table S3). Whether any of these are related to the rod-shaped EBs and the characteristic five-layered cell envelope of arthropod-associated Rhabdochlamydia species remains to be determined [29, 30, 51].
The genus Rhabdochlamydia
Next, we focused on the genus Rhabdochlamydia, which represents the best-studied clade in the family because (i) it includes the only cultured representatives of the Rhabdochlamydiaceae, (ii) the hosts of all three described species are known, and (iii) its members are represented by two closed genomes and one high-quality draft genome, including one plasmid each. Calculation of the genome-wide average amino acid identity (AAI) confirmed their classification into a single genus (AAI > 80%; Fig. S6). The Rhabdochlamydia genus pangenome comprises 1875 OGs, most of which belong to the core genome (1007, 54%). The sizes of the accessory genomes vary between the species and correlate with genome size (Fig. 3A). Between 21% and 37% of the accessory genomes mapped to known gene families in the eggNOG database; the larger proportion consists of orphan genes and genes with remote homology to genes of unknown function.
We noted, however, that the genomes of R. helvetica and R. porcellionis include a complete pathway for the de novo synthesis of polyamines. Polyamines play an important role in virulence and response to various stressors [53,54,55]. The complete pathway is an unusual feature of chlamydial genomes , seems incomplete or absent in other rhabdochlamydial genomes, and is also absent in the closest cultured relative outside the Rhabdochlamydiaceae, Simkania negevensis.
All members of the genus Rhabdochlamydia carry a large plasmid between 20 and 39 kB in size (Fig. 3A). Plasmids are small DNA molecules replicating independently from the chromosome, known to mediate horizontal gene transfer, and are considered important for the adaptation to different environments . Plasmids have been identified as important drivers of genome evolution in the phylum Chlamydiae [57, 58], and the highly conserved Chlamydiaceae plasmid is recognized for its role in virulence in human and animal hosts [57,58,59]. In total, Rhabdochlamydia plasmids encode 83 proteins that belong to 39 different gene families. More than half of the gene families have representatives on at least one other Rhabdochlamydia chromosome or plasmid. This indicates a high degree of gene flow between chromosomes and plasmids, an observation also described for other chlamydial plasmids . All Rhabdochlamydia plasmids contain genes considered to be important for plasmid maintenance in the Chlamydiaceae, such as Pgp2, plasmid partitioning protein ParA, and the integrase Pgp8 . Interestingly, the Rhabdochlamydia plasmids encode major outer membrane (MOMP)-like proteins, in addition to the respective chromosomal copies. MOMPs are highly conserved among chlamydiae. They function as porins and adhesins and are prominently recognized by the host immune system in members of the Chlamydiaceae . The MOMP-like proteins of Rhabdochlamydia show little to no similarity with the canonical MOMPs of the Chlamydiaceae. However, they belong to a large number of orthologs also found in the only distantly related chlamydiae Waddlia chondrophila and S. negevensis, with yet unknown function [18, 61, 62].
To more systematically compare the Rhabdochlamydia accessory gene sets, we performed an enrichment analysis taking into account functional category annotations from eggNOG (Fig. 3B). The R. oedothoracis accessory genome is enriched in the category “replication, recombination and repair” (FDR adjusted p < 0.001), which includes transposases and genes for their maintenance. R. helvetica on the other hand is enriched in several categories and includes a large number of genes with unknown function (FDR adjusted p < 0.001) (Fig. 3B). In addition, the accessory genomes of R. oedothoracis and R. helvetica include a range of functions that are linked to communication with the environment like defense mechanisms and cell wall/membrane/envelope biogenesis that are missing in R. porcellionis. Together with the smaller genome size of R. porcellionis, this indicates a prolonged association with the woodlouse host and may reflect an adaptation to the limited competition with other bacteria in the hepatopancreas - the target organ of infection [25, 63].
Insertion sequences as key players in genome reduction
Reduced genomes are a hallmark of all chlamydiae [6, 18]. Yet, the evolutionary trajectories leading to their streamlined and highly specialized genomes are poorly understood. Members of the genus Rhabdochlamydia, with their differences in genome size, might offer an interesting perspective on the process of genome size reduction and host adaptation in these bacteria. In this respect, one of the most striking differences between the known Rhabdochlamydia genomes is the presence of a high number of transposases in R. oedothoracis and their near absence in the smallest genome, that of R. porcellionis.
Transposases are indicative of transposable elements (TEs), which in their simplest version as insertion sequences (ISs) contain only a transposase gene flanked by inverted repeats . There are several reports of ISs being associated with genome reduction in beneficial bacterial symbionts [65,66,67,68,69,70,71]. In most of these cases, the symbionts were recently acquired from a free-living stage. During the adaptation to the host and the intracellular environment, the symbionts accumulate ISs in their genomes . The ISs may interrupt genes that then accumulate mutations, especially deletions, as a consequence of relaxed selection, which ultimately leads to a reduction of genome size . Here, we suggest that a similar process drove the evolution of Rhabdochlamydia genomes.
As ISs are known to cause breaks in genome assemblies and are often not properly annotated by automated tools , we limited our in-depth analysis to the closed genomes of R. oedothoracis and R. porcellionis, and we manually curated transposase annotations (Fig. 4B; see Methods for details). In total, we could identify 415 transposase genes in R. oedothoracis and only 20 in R. porcellionis. Apart from 129 transposases in R. oedothoracis, most of those do not appear to be functional; they are either truncated, contain premature stop codons, or are interrupted by other transposases (Table S4). Notably, (functional) transposases are also encoded on the plasmid of R. oedothoracis yet absent on other rhabdochlamydial plasmids (Table S4). It was shown previously that plasmids need to exceed a certain minimum size (~20 kB) to be able to carry TEs . This threshold would explain the absence of transposases on the plasmids of R. porcellionis. The presence of representatives of the most abundant chromosomal transposase families on the plasmid of R. oedothoracis, however, may suggest a role of the plasmid in IS expansion. The higher copy number of plasmids and their replication independent of the chromosome  might support the proliferation of ISs.
Apart from a high number of TEs, increased pseudogenization is indicative of genomes undergoing degradation. We therefore used pseudofinder to identify genes under relaxed selection in the genome of R. oedothoracis by comparing it to R. porcellionis. This approach assumes that, due to its small size and low number of transposases, most genes in the reference genome of R. porcellionis are under purifying selection. In total, 276 R. oedothoracis genes were marked as cryptic pseudogenes, i.e., genes that are structurally intact but likely experience relaxed selection (dN/dS ratios >= 0.3). A broad range of functions is affected by this ongoing pseudogenization, including diverse metabolic pathways as well as genes involved in replication and regulation (Fig. 4C).
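As an illustration of the thresholding step only, the following Python sketch flags genes whose dN/dS ratio reaches the 0.3 cutoff quoted above. It does not reproduce pseudofinder or its interface. The function name `flag_relaxed_selection`, the input dictionary, and the gene names are hypothetical, and the sketch assumes that per-gene dN/dS values have already been computed elsewhere.

```python
from typing import Dict, List, Optional


def flag_relaxed_selection(
    dnds_by_gene: Dict[str, Optional[float]],
    threshold: float = 0.3,
) -> List[str]:
    """Return gene IDs whose dN/dS ratio is at or above `threshold`.

    Genes without a usable estimate (None, e.g. no ortholog in the
    reference genome) are skipped rather than flagged.
    """
    return sorted(
        gene
        for gene, ratio in dnds_by_gene.items()
        if ratio is not None and ratio >= threshold
    )


# Hypothetical per-gene ratios; names and values are made up.
toy_ratios = {"gene_A": 0.05, "gene_B": 0.30, "gene_C": 0.72, "gene_D": None}
candidates = flag_relaxed_selection(toy_ratios)
```

The cutoff is inclusive here because the text gives the criterion as ">= 0.3".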
Taken together, with its small size and the low number of transposases, the genome of R. porcellionis is the most streamlined genome in the genus, suggesting an ancient association with its P. scaber host. In contrast, there is still evidence for the process of genome reduction in the case of R. oedothoracis, given the high number of (functional) transposases and pseudogenes, possibly as a consequence of a relatively recent host switch. Notably, the distribution of transposases in the R. oedothoracis genome correlates with positions where the synteny of the two genomes is disrupted (Fig. 4A, D). This further illustrates the putative role of ISs in genome rearrangements and genome size reduction in Rhabdochlamydia. Consistent with this, there is further evidence for a nascent stage of genome reduction in R. oedothoracis: Although the GC content of the rearrangement regions generally matches that of the surrounding regions, the characteristic asymmetrical pattern of circular chromosomes in cumulative GC skew analyses  is less pronounced (Fig. 4A).
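For readers unfamiliar with cumulative GC skew, the quantity can be sketched as follows, assuming the common definition of skew as (G − C)/(G + C) per non-overlapping window, summed along the chromosome. The window size, the function name `cumulative_gc_skew`, and the toy sequence are assumptions for illustration; the tool and parameters used in the study are not specified in this passage.

```python
from typing import List


def cumulative_gc_skew(seq: str, window: int = 1000) -> List[float]:
    """Cumulative GC skew over consecutive, non-overlapping windows.

    Per-window skew is (G - C) / (G + C); windows without any G or C
    contribute 0. The running sum is returned, one value per window.
    """
    seq = seq.upper()
    running_total = 0.0
    curve: List[float] = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        skew = (g - c) / (g + c) if (g + c) else 0.0
        running_total += skew
        curve.append(running_total)
    return curve


# Toy sequence: a G-rich half followed by a C-rich half.
toy_curve = cumulative_gc_skew("GGGG" + "CCCC", window=4)
```

In a typical circular bacterial chromosome the curve rises along one replichore and falls along the other, so its extremes tend to mark the replication origin and terminus. Frequent rearrangements would be expected to blur that pattern.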
To learn more about the origin of the transposases present in the genome of R. oedothoracis, we performed phylogenetic analyses for the three most abundant transposase families with functional representatives in R. oedothoracis. Surprisingly, all investigated transposases showed a phylogenetic relatedness to transposases found in other chlamydiae (Fig. S7), suggesting the existence of an ancient pool of transposases in chlamydial ancestors and sequential loss in several lineages.
A scenario for the evolution of the genus Rhabdochlamydia
Our observations regarding diversity, environmental distribution, and genomics of members of the family Rhabdochlamydiaceae provide clues about genome evolution and the adaptation of chlamydiae from symbionts of unicellular eukaryotes to animal hosts.
We show that members of the Rhabdochlamydiaceae are highly diverse, occur in different environments and mostly lack a clear association with animal hosts (Fig. 1). This suggests that the majority of rhabdochlamydiae infect other, yet unknown and likely unicellular hosts. Surprisingly, however, members of the Rhabdochlamydiaceae differ pronouncedly in their genetic make-up and genome size from recognized chlamydial symbionts of heterotrophic amoeba (Fig. 2A–C). Yet, there is a wide range of protists with very different lifestyles e.g., phototrophic, or saprotrophic protists feeding on decaying organic matter, that could serve as natural hosts for rhabdochlamydiae . According to the “melting pot” hypothesis, symbionts in amoebae retain larger genomes than closely related bacteria infecting animals as there is a high level of competition and possibilities for gene acquisition by lateral gene transfer in amoebae that feed on complex microbial communities [76, 77]. In phototrophic or saprotrophic protists, the competition and interaction with other bacteria would be much lower, leading to smaller genome sizes and differences in the genetic repertoire as seen for the Rhabdochlamydiaceae (Fig. 2A–C). We thus suggest that members of the family include widespread symbionts of protist hosts different from the phagotrophic, free-living amoeba recognized so far as hosts for other chlamydiae. Of note, there is recent evidence for diverse chlamydial symbionts including rhabdochlamydiae in the cellular slime mold Dictyostelium discoideum .
Within the family Rhabdochlamydiaceae, the similar GC content, a large core genome, and shared membrane features distinguish the genus Rhabdochlamydia from all other members (Fig. 2A, C). This is consistent with them sharing a similar niche in arthropod hosts and putatively originating from rhabdochlamydiae thriving in environmental protists (Fig. 1). By infecting hosts equipped with an innate immune response, members of the genus Rhabdochlamydia might represent an intermediate step towards adaptation of chlamydiae to vertebrate animals with adaptive immunity. In this scenario, food or water would be a conceivable entry route for the uptake of protist-associated rhabdochlamydiae by arthropod hosts. We suggest that the subsequent transition and adaptation to arthropod hosts triggered genomic changes in the last common ancestor of Rhabdochlamydia species, resulting in the reduced and specialized genomes of extant members of the genus. This process was putatively facilitated by IS expansion, inactivating genes under relaxed selection and eventually leading to genome size reduction (Fig. 5). Genome reduction mediated by transposable elements is common in inherited, vertically transmitted beneficial symbionts [41, 67]. To our knowledge, such a scenario has not yet been described for horizontally transmitted intracellular bacteria representing commensals or pathogens, as is the case for members of the phylum Chlamydiae. The extent of genome streamlining might depend on the arthropod host, the site of infection, and the extent of competition with other microbes. The digestive glands of P. scaber, the target organ of R. porcellionis, for example, harbor only a few other bacteria. The same is true for the hindgut of the spider host of R. oedothoracis. The tick Ixodes ricinus, on the other hand, contains a diverse microbiome, creating a more competitive environment for R. helvetica and opportunities for genetic exchange.
In conclusion, we have demonstrated that Rhabdochlamydiaceae are distributed globally and represent a major, yet heavily underexplored chlamydial group. We show that they provide opportunities to study adaptation and genome evolution of chlamydiae during the transition from protist to animal hosts. We have identified transposable elements as an important factor underlying genome size reduction in the phylum Chlamydiae, and we propose a scenario for the adaptation of Rhabdochlamydia species to their arthropod hosts. A limitation of our study is the low number of available high-quality Rhabdochlamydia genome sequences. Sequencing more arthropod-associated chlamydiae is needed to verify the evolutionary scenario proposed here. Further, the in-depth analysis of members of the family Rhabdochlamydiaceae is hampered by the dramatic lack of cultured representatives and information about host organisms. Future efforts targeting understudied protist taxa and recovering symbionts together with their hosts from complex environmental samples might help to overcome this. Taken together, the current study provides a comprehensive framework for investigating the ecology and evolution of one of the most widespread lineages within the phylum Chlamydiae.
Materials and methods
16S rRNA gene phylogeny
We downloaded all available unique near-full length 16S rRNA gene sequences of chlamydiae (n = 233) and other Planctomycetes-Verrucomicrobia-Chlamydiae (PVC) members (n = 205) from SILVA v138 SSU Ref NR 99 and added 78 16S rRNA genes from published chlamydial genomes from RefSeq and GenBank. In addition, we added 79 near-full length chlamydial sequences from Schulz et al. We dereplicated the sequences at 99% identity using USEARCH (v11) with “-cluster_smallmem” and aligned the clustered sequences with SINA. Afterwards, we trimmed the alignment with trimAl (v1.4.15) “-noallgaps” and removed the highly variable positions using noisy (v1.5.12). The phylogenetic tree was then calculated with IQ-TREE (v1.6.2). Model testing was performed with “-m TESTNEW” (best model: SYM + R9), and initial support values were inferred from 1000 non-parametric bootstraps using “-bb 1000”. The final tree was edited and visualized using iTOL (v4).
16S rRNA gene-based diversity and environmental distribution
We queried the IMNGS database, which is a collection of pre-clustered NCBI SRA sequencing data, on 09 June 2020 for 16S rRNA genes with at least 90% identity to the reference 16S rRNA sequence of R. porcellionis 15C. We removed singletons, kept only sequences >400 bp, and removed duplicates and sequences with ambiguous bases using mothur (v.1.42.3). 16S rRNA genes were aligned to SILVA Ref NR 99 SSU (v138) using mothur (v.1.42.3), and the alignment was trimmed with trimAl (v1.4.15) using the “-noallgaps” parameter. Afterwards, we clustered the sequences into OTUs using USEARCH (v11.0.667) “-cluster_otus” to reduce redundancy, and finally at the 95% sequence identity level using “-cluster_smallmem”. To belong to the same cluster, sequences were required to overlap by at least 90% (“-query_cov 0.9”). Centroid sequences were aligned to the 16S rRNA full-length alignment using MAFFT (v7.427) (“--addfragments”), and variable positions were removed using trimAl (“-selectcols”) (v1.4.15). We then placed the centroids into the 16S rRNA full-length reference tree using EPA-ng (v0.2.1) (model: SYM + R9), and manually selected all centroids that were placed in the family Rhabdochlamydiaceae. This step reduced the number of centroids from 2162 to 938. For the final tree, only rhabdochlamydiae centroids were placed into the 16S rRNA full-length tree. We selected only sequences covering the V3–V4 region of the 16S rRNA gene, as considering other variable regions as well would likely result in an overestimation of OTUs: two sequences spanning different regions of the same 16S rRNA gene would appear as two separate genus-level OTUs in this analysis. When also considering sequences covering other 16S rRNA gene regions, we retrieved an additional 550 genus-level OTU candidates (262 OTUs for V4–V5; 87 for V5–V6; 201 for V6–V8). The final tree was edited and visualized using iTOL (v4).
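The read-level filtering criteria above (length >400 bp, no ambiguous bases, no duplicates) can be expressed as a short Python sketch. This is not mothur and does not reproduce its behaviour in detail. The function name `filter_sequences` and the toy records are hypothetical, and singleton removal is left out because it depends on abundance information that is not modelled here.

```python
from typing import Iterable, List, Tuple


def filter_sequences(
    records: Iterable[Tuple[str, str]],
    min_len: int = 400,
) -> List[Tuple[str, str]]:
    """Keep sequences longer than `min_len` with only A/C/G/T, deduplicated.

    `records` yields (name, sequence) pairs. The first occurrence of each
    distinct sequence is kept; later exact duplicates are dropped.
    """
    seen = set()
    kept: List[Tuple[str, str]] = []
    for name, seq in records:
        seq = seq.upper()
        if len(seq) <= min_len:
            continue  # too short (criterion in the text is ">400 bp")
        if set(seq) - set("ACGT"):
            continue  # contains ambiguous bases such as N
        if seq in seen:
            continue  # exact duplicate of an earlier sequence
        seen.add(seq)
        kept.append((name, seq))
    return kept


# Hypothetical toy reads.
long_ok = "ACGT" * 101  # 404 bp, unambiguous
toy_reads = [
    ("r1", long_ok),
    ("r2", long_ok),              # duplicate of r1
    ("r3", "ACGT" * 50),          # 200 bp, too short
    ("r4", "N" + "ACGT" * 101),   # ambiguous base
]
kept_reads = filter_sequences(toy_reads)
```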
For the analysis of the relative environmental abundance of rhabdochlamydiae centroids, a total of 14,051 sequences were analyzed. The metadata were provided by IMNGS and retrieved from the SRA. The broad categories provided by the SRA were manually curated, and each rhabdochlamydiae sequence was assigned to one of the following categories: freshwater, freshwater-sediment, marine, plant-associated, soil, engineered, and host-associated. The sequences assigned to host-associated were further categorized based on the organisms they originated from. Sequences that originated from gut or stool samples were also classified as host-associated. In total, 670 sequences were assigned to the category host-associated, 4515 to freshwater, 141 to freshwater-sediment, 6002 to soil, 1714 to plant-associated, 194 to marine, and 815 to engineered. The bar charts were created by counting the total number of sequences represented by a centroid and calculating the relative abundances for each category.
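As a worked illustration of the relative-abundance calculation, the following Python sketch uses the per-category sequence counts reported above. It computes only the overall proportions across all sequences; the per-centroid breakdown behind the bar charts is not reproduced, and the names `counts` and `relative_abundance` are illustrative assumptions.

```python
# Sequence counts per environmental category, as reported in the text.
counts = {
    "host-associated": 670,
    "freshwater": 4515,
    "freshwater-sediment": 141,
    "soil": 6002,
    "plant-associated": 1714,
    "marine": 194,
    "engineered": 815,
}


def relative_abundance(category_counts):
    """Return each category's share of the total number of sequences."""
    total = sum(category_counts.values())
    return {cat: n / total for cat, n in category_counts.items()}


rel = relative_abundance(counts)
# Print categories from most to least abundant.
for cat, frac in sorted(rel.items(), key=lambda kv: -kv[1]):
    print(f"{cat}: {100 * frac:.1f}%")
```

The counts sum to the 14,051 sequences analyzed, and soil and freshwater together account for roughly three quarters of them.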
Genome sequencing and assembly - R. porcellionis 15C
R. porcellionis 15C was cultivated in Sf9 insect cells (Spodoptera frugiperda) as described in Sixt et al. For DNA isolation we harvested Sf9 cells infected with R. porcellionis 15C and lysed the host cells with lysis buffer (1 M Tris-HCl, 0.5 M EDTA, 5 M NaCl, SDS, Proteinase K). Afterwards, the host DNA was digested using DNase I (1 U/μL, Thermo Scientific; Thermo Fisher Scientific; Waltham, MA, USA). Bacterial gDNA isolation was carried out using the DNeasy Blood and Tissue Kit (Qiagen; Hilden, Germany). To remove remaining RNA, we treated the isolated gDNA with RNase A (10 mg/mL, Thermo Scientific; Thermo Fisher Scientific; Waltham, MA, USA). Finally, we checked the quality of the gDNA using a Qubit 4 fluorometer (Invitrogen; Thermo Fisher Scientific; Waltham, MA, USA) with the dsDNA HS Assay Kit (Invitrogen; Thermo Fisher Scientific; Waltham, MA, USA) and a Nanodrop 1000 Spectrophotometer (Thermo Fisher Scientific; Waltham, MA, USA).
Before library preparation for long-read sequencing, the gDNA was measured with Nanodrop and the length of the DNA fragments was measured with a Bioanalyzer. Library preparation was done using the Ligation Sequencing Kit (Oxford Nanopore, Oxford, UK; ONT). Short-read sequencing was performed on an Illumina HiSeq 2000 platform (Illumina, San Diego, CA, USA), using the 100-bp paired-end sequencing mode. Additional long-read sequencing was performed using a MinION sequencer (Oxford Nanopore, Oxford, UK).
For the assembly we trimmed the Illumina reads using bbduk (v37.61) (sourceforge.net/projects/bbmap/) (“–qtrim=rl –trimq=18 –minlen=70”) and removed adapters and barcodes from the Nanopore reads using ONT’s qcat (“–trim”). We assembled the Illumina and Nanopore reads in a hybrid assembly using unicycler (v0.4.6). The quality of the assembly was checked by visually inspecting the assembly graph and with checkM (v1.0.18).
Genome sequencing and assembly - R. oedothoracis W744xW776
DNA was isolated from a single field-captured O. gibbosus individual from the Walenbos population (W815). DNA isolation and Illumina sequencing were carried out as described in Hendrickx et al. The Illumina assembly was done using SPAdes (v3.9.1, “–meta”). The contigs were then binned using mmgenome. Finally, reads were mapped to the metagenome-assembled genomes (MAGs) and reassembled with SPAdes (v3.9.1, “–meta”). The quality of the MAGs was checked with checkM (v1.0.6).
Sequencing data from the offspring of O. gibbosus individuals W744 and W776 (Walenbos population) were obtained from Hendrickx et al. 2021. Contigs were classified using a custom Kraken (v2.0.8) database including reference libraries for archaea, bacteria, viruses, protists, humans, fungi, and plants as well as MAG W815, and all reads classified as Rhabdochlamydiaceae were collected (MAG W744xW776).
PacBio reads were mapped to MAG W815 and MAG W744xW776, respectively, using minimap2 (v2.17). Because the coverage of the PacBio data was too high, the mapped reads were subsampled to a coverage of 70x. Finally, the reads mapped to MAG W815 and MAG W744xW776 were merged, and duplicates were removed. Illumina reads were mapped to MAG W815 and MAG W744xW776 using bbmap (v37.61) (sourceforge.net/projects/bbmap/) and were merged and deduplicated afterwards. The final sets of Illumina and PacBio reads were then used for a hybrid assembly using unicycler (v0.4.6). The quality of the assembly was checked by visually inspecting the assembly graph and with checkM (v1.0.18).
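The text does not state how the subsampling to 70x was carried out. One common approach is to keep a random fraction of reads equal to the ratio of target to observed coverage. The Python sketch below computes that fraction under this assumption; the function name `subsample_fraction` and the example numbers are hypothetical.

```python
def subsample_fraction(total_mapped_bases, genome_size, target_coverage=70.0):
    """Fraction of mapped reads to keep so that mean coverage is roughly
    target_coverage. Returns 1.0 if coverage is already at or below target.

    total_mapped_bases: summed length of all mapped reads (bp)
    genome_size: length of the reference the reads were mapped to (bp)
    """
    current_coverage = total_mapped_bases / genome_size
    if current_coverage <= target_coverage:
        return 1.0
    return target_coverage / current_coverage


# Hypothetical example: a 1.5 Mb genome covered 700x would need 10% of reads.
fraction = subsample_fraction(total_mapped_bases=1.5e6 * 700, genome_size=1.5e6)
print(f"keep {fraction:.0%} of reads")
```

The resulting fraction would then be passed to whichever read-sampling tool is in use; this assumes read lengths are similar enough that sampling reads uniformly also samples bases uniformly.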
Dataset compilation, quality control, and annotation
We downloaded 36 chlamydial reference genomes from GenBank/ENA/DDBJ and RefSeq on 25 June 2019 and added nine high-quality MAGs from the Genomes from Earth’s Microbiomes initiative. Only genomes with a completeness >94% and containing neither detectable strain heterogeneity nor contamination were used, resulting in nine genome sequences and MAGs from the Rhabdochlamydiaceae, 17 Chlamydiaceae genomes, and 19 Parachlamydiaceae genomes (Table S1). The quality of the genomes was checked using checkM (v1.1.3, “taxonomy_wf domain Bacteria”), and basic statistics were calculated using QUAST (v5.0.2). Initial gene calling and annotation were performed with prokka (v1.14.6, “–mincontiglen 200”, “–gram neg”).
The assembled genomes of R. porcellionis 15C and R. oedothoracis were annotated using prokka (v1.14.6). In addition, RNAs were annotated using the Rfam database and cmscan (v1.1.3, “–cut_tc”, “–mid”) and tRNAscan-SE (v2.0.5). The origin of replication was determined using the OriginX (v1.0) software. Transposases were manually annotated by searching transposase sequences predicted by prokka against the ISfinder database and manually curating the annotations using UGENE. The R. helvetica genome contained a total of 41 transposases predicted by prokka. This genome could not, however, be manually curated because it is incomplete, and thus neither the absence nor the misassembly of transposases can be excluded.
We mapped all proteins against the eggNOG database (v4.5.1) using emapper (v1.0.1, “–d bact”) to cluster them into orthologous groups (OGs). For all unmapped proteins we performed an all-against-all blastp search and clustered proteins with an e-value < 0.001 de novo with SiLiX (v1.2.11) with default parameters. We used the following definitions for the pangenome components: core - present in more than 90% of genomes; accessory - present in only one of the genomes. For the pangenome of the genus Rhabdochlamydia only, we required a core protein to be present in all the genomes. For functional annotation we used eggNOG (v4.5.1), and blastp against the NCBI nr database for the de novo OGs. Further, we mapped all proteins to the Kyoto Encyclopedia of Genes and Genomes (KEGG) orthologs (KOs) using GhostKOALA (v2.2).
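The presence thresholds given above can be made concrete with a short Python sketch. It takes a hypothetical mapping from orthologous group to the set of genomes containing it and applies the stated rules: present in more than 90% of genomes for core, or in all genomes when the stricter genus-level rule is requested, and present in only one genome for the second class defined in the text. The labels `single-genome` and `intermediate`, the function name, and the input format are assumptions for illustration and not the authors' code.

```python
def classify_orthologous_groups(presence, n_genomes, core_fraction=0.9,
                                strict_core=False):
    """Assign each orthologous group (OG) to a pangenome component.

    presence: dict mapping OG id -> set of genome ids that contain the OG
    n_genomes: total number of genomes in the dataset
    strict_core: if True, a core OG must be present in every genome
    """
    labels = {}
    for og, genomes in presence.items():
        k = len(genomes)
        if strict_core:
            is_core = k == n_genomes
        else:
            is_core = k / n_genomes > core_fraction  # strictly more than 90%
        if is_core:
            labels[og] = "core"
        elif k == 1:
            labels[og] = "single-genome"   # present in only one genome
        else:
            labels[og] = "intermediate"    # everything in between
    return labels
```

With the 45 genomes of the dataset, an OG found in 41 genomes would count as core under the 90% rule, whereas one found in 40 genomes would not.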
Comparison of R. oedothoracis and R. porcellionis genomes
We used pseudofinder (v1.0) and its “selection” function to identify genes under degradation in the genome of R. oedothoracis W744xW776 in comparison to R. porcellionis 15C. Pseudofinder identifies homologous sequences in the two genomes and calculates the ratio of non-synonymous to synonymous substitution rates (dN/dS) for each set of genes. We used a threshold of 0.3 to distinguish between pseudogenes (>0.3) and genes under purifying selection (≤0.3).
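The thresholding step can be sketched in a few lines of Python. The sketch assumes that per-gene dN/dS values have already been computed (for example, exported from Pseudofinder) and only applies the 0.3 cutoff described above; the function name, label strings, and gene identifiers are hypothetical and do not reflect Pseudofinder's own interface.

```python
def classify_by_dnds(dnds_by_gene, threshold=0.3):
    """Label each gene by comparing its dN/dS ratio to the threshold.

    dnds_by_gene: dict mapping gene id -> dN/dS ratio (float)
    Genes with dN/dS > threshold are flagged as pseudogene candidates;
    genes with dN/dS <= threshold are treated as under purifying selection.
    """
    labels = {}
    for gene, ratio in dnds_by_gene.items():
        if ratio > threshold:
            labels[gene] = "pseudogene candidate"
        else:
            labels[gene] = "purifying selection"
    return labels


# Hypothetical values for three homologous gene pairs.
example = {"gene_001": 0.05, "gene_002": 0.30, "gene_003": 0.82}
print(classify_by_dnds(example))
```

Note that a value exactly at the threshold falls on the purifying-selection side, matching the ≤0.3 definition in the text.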
To show synteny between R. porcellionis and R. oedothoracis, the two genomes were compared against each other using blastn. Further, GC skews were calculated for both genomes using a custom Python script (window size = 1000). The genomes were visualized using Circos (v0.69.9). To show the disruption of synteny by transposases in more detail, a short syntenic segment (R. oedothoracis: 360–500 kb, R. porcellionis: 150–260 kb) was selected and visualized using the “genoPlotR” package (v0.8.11) in R (v4.0.3).
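The custom GC skew script is not part of the text, so the following is only a minimal Python sketch of a typical calculation. It uses the standard definition of GC skew, (G − C)/(G + C), and assumes non-overlapping windows of 1000 bp, with any trailing partial window ignored; the text specifies the window size only, so the windowing scheme is an assumption.

```python
def gc_skew(sequence, window=1000):
    """Compute GC skew, (G - C) / (G + C), in consecutive non-overlapping
    windows. A window without any G or C gets a skew of 0.0. A trailing
    partial window shorter than `window` is ignored.
    """
    seq = sequence.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        skews.append((g - c) / (g + c) if (g + c) else 0.0)
    return skews
```

For a circular bacterial chromosome, the sign of the skew usually switches near the origin and the terminus of replication, which is why such profiles are commonly plotted alongside synteny data.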
All statistical tests and data analyses were performed in R (v4.0.3) and visualized using ggplot2 (v3.3.3). NMDS was calculated using eggNOG (v4.5.1) and de novo clustered OGs and the “metaMDS” function (“vegan” package v2.5-7) using “distance = bray”. To test whether members of the genus Rhabdochlamydia are associated with smaller accessory genomes, we calculated the size of accessory genomes for all nine Rhabdochlamydiaceae genomes and used the “wilcox.test” function (“stats” package v4.0.3) for statistical evaluation. The enrichment analysis of functional categories based on eggNOG (v4.5.1) was carried out using a hypergeometric test with the “phyper” function (“stats” package v4.0.3). The p values were corrected using the “p.adjust” function (“stats” package v4.0.3) and “method = BH”. We considered p < 0.001 as significant.
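The enrichment test and the multiple-testing correction were run in R; the Python sketch below reimplements the same two calculations with the standard library only, to show what they compute. It assumes the usual one-sided enrichment setup (the probability of observing at least k genes of a category in the test set) and the standard Benjamini-Hochberg step-up adjustment; function names and example inputs are illustrative assumptions.

```python
from math import comb


def hypergeom_upper_tail(k, M, K, n):
    """P(X >= k) for a hypergeometric draw of n items without replacement
    from a population of M items, K of which belong to the category."""
    upper = min(K, n)
    numerator = sum(comb(K, i) * comb(M - K, n - i) for i in range(k, upper + 1))
    return numerator / comb(M, n)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical example: 5 category genes in a genome of 10, all 5 drawn.
p = hypergeom_upper_tail(k=5, M=10, K=5, n=5)
print(p, benjamini_hochberg([p, 0.04, 0.03]))
```

Categories whose adjusted p value falls below 0.001 would then be reported as enriched, following the significance threshold given in the text.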
16S rRNA gene data used in this study are available via the SILVA database (https://www.arb-silva.de/) and the IMNGS database (https://www.imngs.org/). Metadata for sequences received from the IMNGS database can be accessed via the Sequence Read Archive (SRA, https://www.ncbi.nlm.nih.gov/sra). Genome sequences generated in this study have been deposited in GenBank under the accession numbers CP075585-CP075586 (R. porcellionis) and CP075587-CP075588 (R. oedothoracis). Accession numbers for reference genomes and metagenome-assembled genomes (MAGs) are available in Supplementary Table S1. Metagenomic data are available through the IMG/M portal (https://img.jgi.doe.gov/). MAG sequences from the Genomes from Earth’s Microbiomes initiative are available at https://genome.jgi.doe.gov/GEMs. The collection of genomes and proteomes and the data of the IMNGS search used in this study are available at Zenodo (https://doi.org/10.5281/zenodo.4723235).
Everett KD, Bush RM, Andersen AA. Emended description of the order Chlamydiales, proposal of Parachlamydiaceae fam. nov. and Simkaniaceae fam. nov., each containing one monotypic genus, revised taxonomy of the family Chlamydiaceae, including a new genus and five new species, and standards for the identification of organisms. Int J Syst Bacteriol. 1999;49:415–40.
Horn M. Chlamydiae as symbionts in eukaryotes. Annu Rev Microbiol. 2008;62:113–31.
Taylor-Brown A, Vaughan L, Greub G, Timms P, Polkinghorne A. Twenty years of research into Chlamydia-like organisms: a revolution in our understanding of the biology and pathogenicity of members of the phylum Chlamydiae. Pathog Dis. 2015;73:1–15.
Bayramova F, Jacquier N, Greub G. Insight in the biology of Chlamydia-related bacteria. Microbes Infect. 2018;20:432–40.
Borel N, Polkinghorne A, Pospischil A. A review on chlamydial diseases in animals: still a challenge for pathologists? Vet Pathol. 2018;55:374–90.
Collingro A, Köstlbacher S, Horn M. Chlamydiae in the environment. Trends Microbiol. 2020;28:877–88.
Corsaro D, Greub G. Pathogenic potential of novel Chlamydiae and diagnostic approaches to infections due to these obligate intracellular bacteria. Clin Microbiol Rev. 2006;19:283–97.
Greub G, Boyadjiev I, La Scola B, Raoult D, Martin C. Serological hint suggesting that Parachlamydiaceae are agents of pneumonia in polytraumatized intensive care patients. Ann NY Acad Sci. 2003;990:311–9.
Lamoth F, Aeby S, Schneider A, Jaton-Ogay K, Vaudaux B, Greub G. Parachlamydia and Rhabdochlamydia in premature neonates. Emerg Infect Dis. 2009;15:2072–5.
Lagkouvardos I, Weinmaier T, Lauro FM, Cavicchioli R, Rattei T, Horn M. Integrating metagenomic and amplicon databases to resolve the phylogenetic and ecological diversity of the Chlamydiae. ISME J. 2014;8:115–25.
Greub G, Raoult D. History of the ADP/ATP-translocase-encoding gene, a parasitism gene transferred from a Chlamydiales ancestor to plants 1 billion years ago. Appl Environ Microbiol. 2003;69:5530–5.
Horn M, Collingro A, Schmitz-Esser S, Beier CL, Purkhold U, Fartmann B, et al. Illuminating the evolutionary history of chlamydiae. Science. 2004;304:728–30.
Kamneva OK, Liberles DA, Ward NL. Genome-wide influence of indel substitutions on evolution of bacteria of the PVC superphylum, revealed using a novel computational method. Genome Biol Evol. 2010;2:870–86.
Abdelrahman YM, Belland RJ. The chlamydial developmental cycle. FEMS Microbiol Rev. 2005;29:949–59.
Bachmann NL, Polkinghorne A, Timms P. Chlamydia genomics: providing novel insights into chlamydial biology. Trends Microbiol. 2014;22:464–72.
Bastidas RJ, Valdivia RH. Emancipating Chlamydia: advances in the genetic manipulation of a recalcitrant intracellular pathogen. Microbiol Mol Biol Rev. 2016;80:411–27.
Sixt BS, Valdivia RH. Molecular genetic analysis of Chlamydia species. Annu Rev Microbiol. 2016;70:179–98.
Collingro A, Tischler P, Weinmaier T, Penz T, Heinz E, Brunham RC, et al. Unity in variety–the pan-genome of the Chlamydiae. Mol Biol Evol. 2011;28:3253–70.
Taylor-Brown A, Madden D, Polkinghorne A. Culture-independent approaches to chlamydial genomics. Microb Genom. 2018;4. https://doi.org/10.1099/mgen.0.000145.
Omsland A, Sixt BS, Horn M, Hackstadt T. Chlamydial metabolism revisited: interspecies metabolic variability and developmental stage-specific physiologic activities. FEMS Microbiol Rev. 2014;38:779–801.
Domman D, Collingro A, Lagkouvardos I, Gehre L, Weinmaier T, Rattei T, et al. Massive expansion of ubiquitination-related gene families within the Chlamydiae. Mol Biol Evol. 2014;31:2890–904.
Collingro A, Köstlbacher S, Mussmann M, Stepanauskas R, Hallam SJ, Horn M. Unexpected genomic features in widespread intracellular bacteria: evidence for motility of marine chlamydiae. ISME J. 2017;11:2334–44.
Dharamshi JE, Tamarit D, Eme L, Stairs CW, Martijn J, Homa F, et al. Marine sediments illuminate chlamydiae diversity and evolution. Curr Biol. 2020;30:1032–48.e7.
Köstlbacher S, Collingro A, Halter T, Schulz F, Jungbluth SP, Horn M. Pangenomics reveals alternative environmental lifestyles among chlamydiae. Nat Commun. 2021;12:4021.
Kostanjšek R, Štrus J, Drobne D, Avguštin G. ‘Candidatus Rhabdochlamydia porcellionis’, an intracellular bacterium from the hepatopancreas of the terrestrial isopod Porcellio scaber (Crustacea: Isopoda). Int J Syst Evol Microbiol. 2004;54:543–9.
Corsaro D, Thomas V, Goy G, Venditti D, Radek R, Greub G. ‘Candidatus Rhabdochlamydia crassificans’, an intracellular bacterial pathogen of the cockroach Blatta orientalis (Insecta: Blattodea). Syst Appl Microbiol. 2007;30:221–8.
Vanthournout B, Hendrickx F. Endosymbiont dominated bacterial communities in a dwarf spider. PLoS ONE. 2015;10:e0117297.
Pillonel T, Bertelli C, Aeby S, de Barsy M, Jacquier N, Kebbi-Beghdadi C, et al. Sequencing the obligate intracellular Rhabdochlamydia helvetica within its tick host Ixodes ricinus to investigate their symbiotic relationship. Genome Biol Evol. 2019;11:1334–44.
Radek R. Light and electron microscopic study of a Rickettsiella species from the cockroach Blatta orientalis. J Invertebr Pathol. 2000;76:249–56.
Kostanjšek R, Pirc Marolt T. Pathogenesis, tissue distribution and host response to Rhabdochlamydia porcellionis infection in rough woodlouse Porcellio scaber. J Invertebr Pathol. 2015;125:56–67.
Pilloux L, Aeby S, Gaümann R, Burri C, Beuret C, Greub G. The high prevalence and diversity of Chlamydiales DNA within Ixodes ricinus ticks suggest a role for ticks as reservoirs and vectors of Chlamydia-related bacteria. Appl Environ Microbiol. 2015;81:8177–82.
Lagkouvardos I, Joseph D, Kapfhammer M, Giritli S, Horn M, Haller D, et al. IMNGS: a comprehensive open resource of processed 16S rRNA microbial profiles for ecology and diversity studies. Sci Rep. 2016;6:33721.
Kreisinger J, Kropáčková L, Petrželková A, Adámková M, Tomášek O, Martin J-F, et al. Temporal stability and the effect of transgenerational transfer on fecal microbiota structure in a long distance migratory bird. Front Microbiol. 2017;8:50.
Geisen S, Mitchell EAD, Adl S, Bonkowski M, Dunthorn M, Ekelund F, et al. Soil protists: a fertile frontier in soil biology research. FEMS Microbiol Rev. 2018;42:293–323.
Singer D, Seppey CVW, Lentendu G, Dunthorn M, Bass D, Belbahri L, et al. Protist taxonomic and functional diversity in soil, freshwater and marine ecosystems. Environ Int. 2021;146:106262.
Elwell C, Mirrashidi K, Engel J. Chlamydia cell biology and pathogenesis. Nat Rev Microbiol. 2016;14:385–400.
Huerta-Cepas J, Szklarczyk D, Forslund K, Cook H, Heller D, Walter MC, et al. eggNOG 4.5: a hierarchical orthology framework with improved functional annotations for eukaryotic, prokaryotic and viral sequences. Nucleic Acids Res. 2016;44:D286–93.
Miele V, Penel S, Duret L. Ultra-fast sequence clustering from similarity networks with SiLiX. BMC Bioinformatics. 2011;12:116.
McCutcheon JP, Moran NA. Extreme genome reduction in symbiotic bacteria. Nat Rev Microbiol. 2011;10:13–26.
Bohlin J, Sekse C, Skjerve E, Brynildsrud O. Positive correlations between genomic %AT and genome size within strains of bacterial species. Environ Microbiol Rep. 2014;6:278–86.
Foerstner KU, von Mering C, Hooper SD, Bork P. Environments shape the nucleotide composition of genomes. EMBO Rep. 2005;6:1208–13.
Agashe D, Shankar N. The evolution of bacterial DNA base composition. J Exp Zool B Mol Dev Evol. 2014;322:517–28.
Brockhurst MA, Harrison E, Hall JPJ, Richards T, McNally A, MacLean C. The ecology and evolution of pangenomes. Curr Biol. 2019;29:R1094–103.
Peters J, Wilson DP, Myers G, Timms P, Bavoil PM. Type III secretion à la Chlamydia. Trends Microbiol. 2007;15:241–51.
Schmitz-Esser S, Linka N, Collingro A, Beier CL, Neuhaus HE, Wagner M, et al. ATP/ADP translocases: a common feature of obligate intracellular amoebal symbionts related to Chlamydiae and Rickettsiae. J Bacteriol. 2004;186:683–91.
Rosario CJ, Tan M. The early gene product EUO is a transcriptional repressor that selectively regulates promoters of Chlamydia late genes. Mol Microbiol. 2012;84:1097–107.
Verma A, Maurelli AT. Identification of two eukaryote-like serine/threonine kinases encoded by Chlamydia trachomatis serovar L2 and characterization of interacting partners of Pkn1. Infect Immun. 2003;71:5772–84.
Archuleta TL, Du Y, English CA, Lory S, Lesser C, Ohi MD, et al. The Chlamydia effector chlamydial outer protein N (CopN) sequesters tubulin and prevents microtubule assembly. J Biol Chem. 2011;286:33992–8.
Colpaert M, Kadouche D, Ducatez M, Pillonel T, Kebbi-Beghdadi C, Cenci U, et al. Conservation of the glycogen metabolism pathway underlines a pivotal function of storage polysaccharides in Chlamydiae. Commun Biol. 2021;4:296.
Azarian T, Huang I-T, Hanage WP. Structure and dynamics of bacterial populations: pangenome ecology. In: Tettelin H, Medini D (eds). The Pangenome: Diversity, Dynamics and Evolution of Genomes. 2020. Springer, Cham (CH).
Drobne D, Strus J, Znidarsic N, Zidar P. Morphological description of bacterial infection of digestive glands in the terrestrial isopod Porcellio scaber (Isopoda, Crustacea). J Invertebr Pathol. 1999;73:113–9.
Konstantinidis KT, Tiedje JM. Towards a genome-based taxonomy for prokaryotes. J Bacteriol. 2005;187:6258–64.
Di Martino ML, Campilongo R, Casalino M, Micheli G, Colonna B, Prosseda G. Polyamines: emerging players in bacteria-host interactions. Int J Med Microbiol. 2013;303:484–91.
Michael AJ. Polyamines in eukaryotes, bacteria, and archaea. J Biol Chem. 2016;291:14896–903.
Guerra PR, Herrero-Fresno A, Pors SE, Ahmed S, Wang D, Thøfner I, et al. The membrane transporter PotE is required for virulence in avian pathogenic Escherichia coli (APEC). Vet Microbiol. 2018;216:38–44.
Rodríguez-Beltrán J, DelaFuente J, León-Sampedro R, MacLean RC, San Millán Á. Beyond horizontal gene transfer: the role of plasmids in bacterial evolution. Nat Rev Microbiol. 2021;19:347–59.
Köstlbacher S, Collingro A, Halter T, Domman D, Horn M. Coevolving plasmids drive gene flow and genome plasticity in host-associated intracellular bacteria. Curr Biol. 2021;31:346–57.e3.
Szabo KV, O’Neill CE, Clarke IN. Diversity in Chlamydial plasmids. PLoS ONE. 2020;15:e0233298.
Zhong G. Chlamydial plasmid-dependent pathogenicity. Trends Microbiol. 2017;25:141–52.
Gitsels A, Van Lent S, Sanders N, Vanrompay D. Chlamydia: what is on the outside does matter. Crit Rev Microbiol. 2020;46:100–19.
Bertelli C, Collyn F, Croxatto A, Rückert C, Polkinghorne A, Kebbi-Beghdadi C, et al. The Waddlia genome: a window into chlamydial biology. PLoS ONE. 2010;5:e10890.
Aistleitner K, Anrather D, Schott T, Klose J, Bright M, Ammerer G, et al. Conserved features and major differences in the outer membrane protein composition of chlamydiae. Environ Microbiol. 2015;17:1397–413.
Wang Y, Brune A, Zimmer M. Bacterial symbionts in the hepatopancreas of isopods: diversity and environmental transmission. FEMS Microbiol Ecol. 2007;61:141–52.
Siguier P, Filée J, Chandler M. Insertion sequences in prokaryotic genomes. Curr Opin Microbiol. 2006;9:526–31.
Plague GR, Dunbar HE, Tran PL, Moran NA. Extensive proliferation of transposable elements in heritable bacterial symbionts. J Bacteriol. 2008;190:777–9.
Burke GR, Moran NA. Massive genomic decay in Serratia symbiotica, a recently evolved symbiont of aphids. Genome Biol Evol. 2011;3:195–208.
Schmitz-Esser S, Penz T, Spang A, Horn M. A bacterial genome in transition–an exceptional enrichment of IS elements but lack of evidence for recent transposition in the symbiont Amoebophilus asiaticus. BMC Evol Biol. 2011;11:270.
Oakeson KF, Gil R, Clayton AL, Dunn DM, von Niederhausern AC, Hamil C, et al. Genome degeneration and adaptation in a nascent stage of symbiosis. Genome Biol Evol. 2014;6:76–93.
Manzano-Marín A, Latorre A. Snapshots of a shrinking partner: genome reduction in Serratia symbiotica. Sci Rep. 2016;6:32590.
Hendry TA, Freed LL, Fader D, Fenolio D, Sutton TT, Lopez JV. Ongoing transposon-mediated genome reduction in the luminous bacterial symbionts of deep-sea ceratioid anglerfishes. MBio. 2018;9:e01033–18.
McCutcheon JP, Boyd BM, Dale C. The life of an insect endosymbiont from the cradle to the grave. Curr Biol. 2019;29:R485–95.
Moran NA, Plague GR. Genomic changes following host restriction in bacteria. Curr Opin Genet Dev. 2004;14:627–33.
Bergman CM, Quesneville H. Discovering and detecting transposable elements in genome sequences. Brief Bioinform. 2007;8:382–92.
Syberg-Olsen M, Garber A, Keeling P, McCutcheon J, Husnik F. Pseudofinder: detection of pseudogenes in prokaryotic genomes. bioRxiv. 2021. https://doi.org/10.1101/2021.10.07.463580.
Rocha EP, Danchin A, Viari A. Universal replication biases in bacteria. Mol Microbiol. 1999;32:11–16.
Moliner C, Fournier P-E, Raoult D. Genome analysis of microorganisms living in amoebae reveals a melting pot of evolution. FEMS Microbiol Rev. 2010;34:281–94.
Bertelli C, Greub G. Lateral gene exchanges shape the genomes of amoeba-resisting microorganisms. Front Cell Infect Microbiol. 2012;2:110.
Haselkorn TS, Jimenez D, Bashir U, Sallinger E, Queller DC, Strassmann JE, et al. Novel Chlamydiae and Amoebophilus endosymbionts are prevalent in wild isolates of the model social amoeba Dictyostelium discoideum. Environ Microbiol Rep. 2021;13:708–19.
Hernández-Jarguín A, Díaz-Sánchez S, Villar M, de la Fuente J. Integrated metatranscriptomics and metaproteomics for the characterization of bacterial microbiota in unfed Ixodes ricinus. Ticks Tick Borne Dis. 2018;9:1241–51.
Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, Yarza P, et al. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res. 2013;41:D590–6.
Haft DH, DiCuccio M, Badretdin A, Brover V, Chetvernin V, O’Neill K, et al. RefSeq: an update on prokaryotic genome annotation and curation. Nucleic Acids Res. 2018;46:D851–60.
Sayers EW, Cavanaugh M, Clark K, Ostell J, Pruitt KD, Karsch-Mizrachi I. GenBank. Nucleic Acids Res. 2020;48:D84–6.
Schulz F, Eloe-Fadrosh EA, Bowers RM, Jarett J, Nielsen T, Ivanova NN, et al. Towards a balanced view of the bacterial tree of life. Microbiome. 2017;5:140.
Edgar RC. Search and clustering orders of magnitude faster than BLAST. Bioinformatics. 2010;26:2460–1.
Pruesse E, Peplies J, Glöckner FO. SINA: accurate high-throughput multiple sequence alignment of ribosomal RNA genes. Bioinformatics. 2012;28:1823–9.
Capella-Gutiérrez S, Silla-Martínez JM, Gabaldón T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics. 2009;25:1972–3.
Dress AWM, Flamm C, Fritzsch G, Grünewald S, Kruspe M, Prohaska SJ, et al. Noisy: identification of problematic columns in multiple sequence alignments. Algorithms Mol Biol. 2008;3:7.
Minh BQ, Nguyen MAT, von Haeseler A. Ultrafast approximation for phylogenetic bootstrap. Mol Biol Evol. 2013;30:1188–95.
Letunic I, Bork P. Interactive Tree Of Life (iTOL) v4: recent updates and new developments. Nucleic Acids Res. 2019;47:W256–9.
Schloss PD, Westcott SL, Ryabin T, Hall JR, Hartmann M, Hollister EB, et al. Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol. 2009;75:7537–41.
Katoh K, Misawa K, Kuma K-I, Miyata T. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 2002;30:3059–66.
Barbera P, Kozlov AM, Czech L, Morel B, Darriba D, Flouri T, et al. EPA-ng: massively parallel evolutionary placement of genetic sequences. Syst Biol. 2019;68:365–9.
Sixt BS, Kostanjšek R, Mustedanagic A, Toenshoff ER, Horn M. Developmental cycle and host interaction of Rhabdochlamydia porcellionis, an intracellular parasite of terrestrial isopods. Environ Microbiol. 2013;15:2980–93.
Wick RR, Judd LM, Gorrie CL, Holt KE. Unicycler: resolving bacterial genome assemblies from short and long sequencing reads. PLoS Comput Biol. 2017;13:e1005595.
Wick RR, Schultz MB, Zobel J, Holt KE. Bandage: interactive visualization of de novo genome assemblies. Bioinformatics. 2015;31:3350–2.
Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 2015;25:1043–55.
Hendrickx F, De Corte Z, Sonet G, Van Belleghem SM, Köstlbacher S, Vangestel C. A masculinizing supergene underlies an exaggerated male reproductive morph in a spider. Nat Ecol Evol. 2021;6:195–206.
Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol. 2012;19:455–77.
Karst SM, Kirkegaard RH, Albertsen M. mmgenome: a toolbox for reproducible genome extraction from metagenomes. bioRxiv.
Wood DE, Lu J, Langmead B. Improved metagenomic analysis with Kraken 2. Genome Biol. 2019;20:1–13.
Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018;34:3094–100.
Nayfach S, Roux S, Seshadri R, Udwary D, Varghese N, Schulz F, et al. A genomic catalog of Earth’s microbiomes. Nat Biotechnol. 2021;39:499–509.
Gurevich A, Saveliev V, Vyahhi N, Tesler G. QUAST: quality assessment tool for genome assemblies. Bioinformatics. 2013;29:1072–5.
Seemann T. Prokka: rapid prokaryotic genome annotation. Bioinformatics. 2014;30:2068–9.
Griffiths-Jones S, Bateman A, Marshall M, Khanna A, Eddy SR. Rfam: an RNA family database. Nucleic Acids Res. 2003;31:439–41.
Nawrocki EP, Eddy SR. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics. 2013;29:2933–5.
Chan PP, Lowe TM. tRNAscan-SE: searching for tRNA genes in genomic sequences. Methods Mol Biol. 2019;1962:1–14.
Worning P, Jensen LJ, Hallin PF, Stærfeldt H-H, Ussery DW. Origin of replication in circular prokaryotic chromosomes. Environ Microbiol. 2006;8:353–61.
Siguier P, Perochon J, Lestrade L, Mahillon J, Chandler M. ISfinder: the reference centre for bacterial insertion sequences. Nucleic Acids Res. 2006;34:D32–6.
Okonechnikov K, Golosova O, Fursov M. UGENE team. Unipro UGENE: a unified bioinformatics toolkit. Bioinformatics. 2012;28:1166–7.
Huerta-Cepas J, Forslund K, Coelho LP, Szklarczyk D, Jensen LJ, von Mering C, et al. Fast genome-wide functional annotation through orthology assignment by eggNOG-mapper. Mol Biol Evol. 2017;34:2115–22.
Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, et al. BLAST+: architecture and applications. BMC Bioinformatics. 2009;10:1–9.
Kanehisa M, Goto S. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 2000;28:27–30.
Kanehisa M, Sato Y, Morishima K. BlastKOALA and GhostKOALA: KEGG tools for functional characterization of genome and metagenome sequences. J Mol Biol. 2016;428:726–31.
Krzywinski M, Schein J, Birol I, Connors J, Gascoyne R, Horsman D, et al. Circos: an information aesthetic for comparative genomics. Genome Res. 2009;19:1639–45.
Guy L, Roat Kultima J, Andersson SGE. genoPlotR: comparative gene and genome visualization in R. Bioinformatics. 2010;26:2334–5.
R Core Team. R: A Language and Environment for Statistical Computing. 2020. R Foundation for Statistical Computing, Vienna, Austria.
Wickham H. ggplot2: Elegant Graphics for Data Analysis. 2016. Springer-Verlag New York.
Oksanen J, Blanchet FG, Friendly M, Kindt R, Legendre P, McGlinn D, et al. vegan: Community Ecology Package. 2020. R package (version 2.5-7). https://CRAN.R-project.org/package=vegan.
Library preparation and long read sequencing was performed by the Next Generation Sequencing Facility at Vienna BioCenter Core Facilities (VBCF), member of the Vienna BioCenter (VBC), Austria. Illumina sequencing was carried out by the Norwegian High-Throughput Sequencing Centre (NSC), Oslo. The Life Science Compute Cluster (LiSC; http://cube.univie.ac.at/lisc) was used for computational analysis. This project has received funding from the University of Vienna (uni:docs to TH) and the Austrian Science Fund FWF (grant numbers DOC 69-B and P32112).
The authors declare no competing interests.
Halter, T., Köstlbacher, S., Collingro, A. et al. Ecology and evolution of chlamydial symbionts of arthropods. ISME COMMUN. 2, 45 (2022). https://doi.org/10.1038/s43705-022-00124-5