Members of the bacterial genus Rickettsia were originally identified as causative agents of vector-borne diseases in mammals. However, many Rickettsia species are arthropod symbionts and close relatives of ‘Candidatus Megaira’, which are symbiotic associates of microeukaryotes. Here, we clarify the evolutionary relationships between these organisms by assembling 26 genomes of Rickettsia species from understudied groups, including the Torix group, and two genomes of ‘Ca. Megaira’ from various insects and microeukaryotes. Our analyses of the new genomes, in comparison with previously described ones, indicate that the accessory genome diversity and broad host range of Torix Rickettsia are comparable to those of all other Rickettsia combined. Therefore, the Torix clade may play unrecognized roles in invertebrate biology and physiology. We argue this clade should be given its own genus status, for which we propose the name ‘Candidatus Tisiphia’.
Symbiotic bacteria are vital to the function of most living eukaryotes, including microeukaryotes, fungi, plants, and animals1,2,3,4. The symbioses formed are often functionally important to the host with effects ranging from mutualistic to detrimental. Mutualistic symbionts may provide benefits through the biosynthesis of metabolites, or by protecting their hosts against pathogens and parasitoids5,6. Parasitic symbionts can be detrimental to the host due to resource exploitation or through reproductive manipulation that favours its own transmission over the host’s7,8. Across these different symbiotic relationships, symbionts are often important determinants of host ecology and evolution.
The Rickettsiales (Alphaproteobacteria) represent an order of largely obligate intracellular bacteria that form symbioses with a variety of eukaryotes9. Deianiraea, an extracellular parasite of Paramecium, is the one known exception10. Within Rickettsiales, the family Rickettsiaceae represent a diverse collection of bacteria that infect a wide range of eukaryotic hosts and can act as symbionts, parasites, and pathogens. Perhaps the best-known clade of Rickettsiaceae is the genus Rickettsia, which was initially described as the cause of spotted fever and other rickettsioses in vertebrates that are transmitted by ticks, lice, fleas, and mites11.
Rickettsia have been increasingly recognised as heritable arthropod symbionts. Since the description of a maternally inherited male-killer in ladybirds12, we now know that heritable Rickettsia are common in arthropods13,14. Further, Rickettsia-host symbioses are diverse, with different symbionts being capable of reproductive manipulation, nutritional and protective symbiosis, as well as influencing thermotolerance and pesticide susceptibility15,16,17,18,19,20,21.
Our understanding of the evolution and diversity of the genus Rickettsia and its allies has increased in recent years, with the taxonomy of Rickettsiaceae developing as more data becomes available14,22. Weinert et al.14 loosely defined 13 different groups of Rickettsia based on 16 S rRNA phylogeny, which showed two early branching clades that appeared genetically distant from other members of the genus. One of these was a symbiont of Hydra and designated as Hydra group Rickettsia, which has since been assigned its own genus status, ‘Candidatus Megaira’23. ‘Ca. Megaira’ forms a related clade to Rickettsia and is found in ciliates, amoebae, chlorophyte and streptophyte algae, and cnidarians24. Members of this clade are found in hosts from aquatic, marine and soil habitats which include model organisms (e.g., Paramecium, Volvox) and economically important vertebrate parasites (e.g., Ichthyophthirius multifiliis, the ciliate that causes white spot disease in fish)24. Whilst symbioses between ‘Ca. Megaira’ and microeukaryotes are pervasive, there is no publicly available complete genome and the impact of these symbioses on the host are poorly understood.
A second early branching clade was described from Torix tagoi leeches and is commonly coined Torix group Rickettsia25. Symbionts in the Torix clade have since been found in a wide range of invertebrate hosts from midges to freshwater snails to fish-parasitic amoeba13. The documented diversity of hosts is wider than other Rickettsia groups, which are to date only found in arthropods and their associated vertebrate or plant hosts14. Torix clade Rickettsia are known to be heritable symbionts, but their impact on host biology is poorly understood, despite the economic and medical importance of several hosts (inc. bed bugs, black flies, and biting midges). Rare studies have described the potential effects on the host, which include larger body size in leeches25; a small negative effect on growth rate and reproduction in bed bugs26; and an association with parthenogenesis in Empoasca Leafhoppers27.
Current data suggest an emerging macroevolutionary scenario where the members of the Rickettsia clade originated as symbionts of microeukaryotes, before diversifying to infect invertebrates23,28,29. Many symbionts belonging to the Rickettsiaceae (e.g., ‘Ca. Megaira’, ‘Candidatus Trichorickettsia’, ‘Candidatus Phycorickettsia’, ‘Candidatus Sarmatiella’ and ‘Candidatus Gigarickettsia’) circulate in a variety of microeukaryotes23,30,31,32,33. The Torix group Rickettsia retained a broad range of hosts from microeukaryotes to arthropods13. The remaining members of the genus Rickettsia evolved to be arthropod heritable symbionts and vector-borne pathogens14,34. However, a lack of genomic and functional information for symbiotic clades limits our understanding of evolutionary transitions within Rickettsia and its related groups. No ‘Ca. Megaira’ genome sequences are currently publicly available and of the 165 Rickettsia genome assemblies available on the NCBI (as of 29/04/21), only two derive from the Torix clade and these are both draft genomes. In addition, dedicated heritable symbiont clades of Rickettsia, such as the Rhyzobius group, have no available genomic data, and there is a single representative for the Adalia clade. Despite the likelihood that heritable symbiosis with microeukaryotes and invertebrates was the ancestral state for this group of intracellular bacteria, available genomic resources are heavily skewed towards pathogens of vertebrates.
In this study we establish a richer base of genomic information for heritable symbionts Rickettsia and ‘Ca. Megaira’, then use these resources to clarify the evolution of these groups. We broaden available genomic data through a combination of targeted sequencing of strains without complete genomes, and metagenomic assembly of Rickettsia strains from arthropod genome projects. We report the first closed circular genome of a ‘Ca. Megaira’ symbiont from a streptophyte alga (Mesostigma viride) and provide a draft genome for a second ‘Ca. Megaira’ from a chlorophyte (Carteria cerasiformis). In addition, we present the complete genomes of two Torix Rickettsia from a midge (Culicoides impunctatus) and a bed bug (Cimex lectularius) as well as a draft genome for Rickettsia from a tsetse fly (Glossina morsitans submorsitans, an important vector species), and a new strain from a spider mite (Bryobia graminum). A metagenomic approach established a further 22 draft genomes for insect symbiotic strains, including previously unsequenced Rhyzobius and Meloidae group draft genomes. We utilize these to conduct pangenomic, phylogenomic, and metabolic analyses of our extracted genome assemblies, with comparisons to existing Rickettsia.
Results and discussion
We have expanded the available genomic data for several Rickettsia groups through a combination of draft and complete genome assembly. This includes an eight-fold increase in available Torix-group genomes, and genomes for previously unsequenced Meloidae and Rhyzobius groups. We further report initial reference genomes for ‘Ca. Megaira’.
Complete and closed reference genomes for Torix Rickettsia and ‘Ca. Megaira’
The use of long-read sequencing technologies produced complete genomes for two subclades of the Torix group limoniae (RiCimp) and leech (RiClec). Sequencing depth of the Rickettsia genomes from C. impunctatus (RiCimp) and C. lectularius (RiClec) were 18X and 52X, respectively. The RiCimp genome provides evidence of plasmids in the Torix group (pRiCimp001 and pRiCimp002) (Table 1). Notably, the two plasmids share more similarities between them than to other Rickettsia plasmids. However, both plasmids contain distant homologs of the DnaA_N domain-containing proteins previously found in other Rickettsia plasmids35. In addition, only two components of the type IV conjugative transfer system known as RAGEs (Rickettsiales Amplified Genetic Elements)36 were present on the plasmids including homologs of the proteins TrwB/TraD and TraA/MobA. The majority of the RAGE elements including both the F-like (tra) and P-like type IV components have been incorporated in the main chromosome. The presence of RAGE elements, alongside the fact conjugation apparatuses have narrow host-ranges37, suggest horizontal transfer of these plasmids is likely within the Rickettsiaceae and could occur between Torix and the main Rickettsia clade, considering co-infections of these genera have been noted previously38,39. We additionally assembled a complete closed reference genome of ‘Ca. Megaira’ from Mesostigma viride (MegNEIS296) from previously published genome sequencing efforts. Likewise, MegNEIS296 genome contains a plasmid which bears features of other Rickettsia plasmids including the presence of a tra conjugative element and the presence of two DnaA_N-like protein paralogs.
General features of both genomes are consistent with previous genomic studies of the Torix group (Table 1). A single full set of rRNAs (16 S, 5 S and 23 S) and a GC content of ~33% was observed. Notably, the two complete Torix group genomes show a distinct lack of synteny (Supplementary Fig. 1), a genomic feature that is compatible with our phylogenetic analyses that placed these two lineages in different subclades (leech/limoniae) (Fig. 1 and Supplementary Fig. 3). Gene order breakdown due to intragenomic recombination has been previously associated with the expansion of mobile genetic elements in both Rickettsia40 and Wolbachia41, another member of the Rickettsiales. Both RiCimp and RiClec genomes predicted to encode for a high number of transposable elements with circa 96 and 119 annotated putative transposases, respectively. This expansion of transposable elements along with their phylogenetic distance is likely responsible for the extreme synteny breakdown between RiCimp and RiClec. Of note within the closed reference genomes MegNEIS296 and RiCimp is the presence of a putative non-ribosomal peptide synthetase (NRPS) and a hybrid non-ribosomal peptide/polyketide synthetase (NRPS/PKS) respectively (Supplementary Fig. 2). Although, the exact products of these putative pathways are uncertain, in silico prediction by Norine suggests some similarity with both cytotoxic and antimicrobial peptides hinting at a potential defensive role (Supplementary Fig. 2). Further homology comparison with other taxa did not provide links with any specific functions or phenotypes. Previously, an unrelated hybrid NRPS/PKS cluster has been reported in Rickettsia buchneri on a mobile genetic element, providing potential routes for horizontal transmission42. The strongest blastp hits of MegNEIS296 NRPS proteins occur in Cyanobacteria (Supplementary Fig. 2)42. In addition, putative toxin-antitoxin systems similar to one associated with cytoplasmic incompatibility in Wolbachia have recently been observed on the plasmid of Rickettsia felis in a parthenogenetic booklouse35. Toxin-antitoxin systems are thought to be part of an extensive bacterial mobilome network associated with reproductive parasitism43. A BLAST search found a very similar protein in Oopac6 to the putative large pLbAR toxin found in R. felis (88% aa identity), and a more distantly related protein in the C. impunctatus plasmid (25% aa identity).
Sequencing and de novo assembly of other Rickettsia and ‘Ca. Megaira’ genomes
Our direct sequencing efforts enabled assembly of draft genomes for a second ‘Ca. Megaira’ strain from the alga Carteria cerasiformis, and for Rickettsia associated with tsetse flies and Bryobia spider mites. The Rickettsia genome retrieved from a wild caught Tsetse fly, RiTSETSE, is a potentially chimeric assembly of closely related Transitional group Rickettsia. We identified an excess of 3584 biallelic sites (including 3369 SNPs and 215 indels) when the raw Illumina reads were mapped back to the assembly. High read depth of 104X indicate that this could be a symbiotic association, reflecting previous observations in Tsetse fly cells44. However, there is a possibility that RiTSETSE is not a heritable symbiont but comes from transient infection from a recent blood meal.
From the SRA accessions, the metagenomic pipeline extracted 29 full symbiont genomes for Rickettsiales across 24 host species. Five of 29 were identified as Wolbachia and discarded from further analysis, one was a Rickettsia discarded for low quality, and another was a previously assembled Torix Rickettsia, RiCNE45. Thus, 22 high quality Rickettsia metagenomes were obtained from 21 host species. One beetle (SRR6004191) carried coinfecting Rickettsia Lappe3 and Lappe4 (Table 2). The high-quality Rickettsia genomes covered the Belli, Torix, Transitional, Rhyzobius, Meloidae and Spotted Fever Groups (Table 2 and Supplementary Data 1).
Beetles, particularly rove beetle (Staphylinidae) species, appear in this study as a possible hotspot of Rickettsia infection. Rickettsia has historically been commonly associated with beetles, including ladybird beetles (Adalia bipunctata), diving beetles (Deronectes sp.) and bark beetles (Scolytinae)14,17,34,46,47. Though a plausible and likely hotspot, this observation needs be approached with caution as this could be an artefact of skewed sampling efforts.
Phylogenomic analyses and taxonomic placement of assembled genomes
The phylogeny and network illustrate the distance of Torix from ‘Ca. Megaira’ and other Rickettsia, along with an extremely high level of within-group diversity in Torix compared to any other group (Fig. 1 and Supplementary Fig. 3). No significant discordance was detected between the core and ribosomal phylogenies. The phylogenies generated using core genomes are consistent with previously identified Rickettsia and host associations using more limited genetic markers13,14,48,49. For instance, Pfluc4 from Proechinophthirus fluctus lice is grouped on the same branch as a previously sequenced Rickettsia from a different individual of P. fluctus48. The following groups were identified in the 22 genomes assembled from the SRA screening: 4 Transitional, 1 Spotted Fever, 1 Adalia, 8 Belli and 7 Torix limoniae. Targeted sequences were confirmed as: Torix limoniae (RiCimp), Torix leech (RiClec), Transitional (RiTSETSE), ‘Ca. Megaira’ (MegCarteria and MegNEIS296), and a deeply diverging Torix clade provisionally named Moomin (Moomin) (Table 2, Fig. 1, Supplementary Fig. 3 and 4). The extracted Torix genomes include one double infection giving a total of 10 new genomes across 9 potential host species. The double infection is found within the rove beetle Labidopullus appendiculatus, forming two distinct lineages, Lappe3 and Lappe4 (Fig. 1 and Supplementary Fig. 3).
We also report a putative Rhyzobius group Rickettsia genomes extracted from the staphylinid beetle Oxypoda opaca (Oopac6) and Meloidae group Rickettsia from the firefly Pyrocoelia pectoralis (Ppec13). They have high completeness, low contamination, and consistently group away from the other draft and completed genomes (Figs. 1, 2, and Supplementary Data 1). MLST analyses demonstrate that these bacteria are most like the Rhyzobius and Meloidae groups described by Weinert et al.14 (Supplementary Fig. 4). Phylogenies of Oopac6 and Ppec13 suggest that Rhyzobius sits as sister group to all other Rickettsia groups, and Meloidae is more closely associated with Belli (Fig. 1, Supplementary Fig. 3–5). Further genome construction will help clarify this taxon and its relationship to the rest of the Rickettsiaceae. The sequencing data for the wasp, Diachasma alloeum, used here has previously been described to contain a pseudogenised nuclear insert of Rickettsia material, but not a complete Rickettsia genome50. The construction of a full, non-pseudogenised genome with higher read depth than the insect contigs, low contamination (0.95%) and high completion (93.13%) suggests that these reads likely represent a viable Rickettsia infection in D. alloeum. However, these data do not exclude the presence of an additional nuclear insert. It is possible for a whole symbiont genome to be incorporated into the host’s DNA like in the case of Wolbachia51, or the partial inserts of ‘Ca. Megaira’ genomes in the Volvox carteri genome52. The presence of both the insert and symbiont need confirmation through appropriate microscopy methods.
Recombination is low within the core genomes of Rickettsia and ‘Ca. Megaira’ but may occur between closely related clades that are not investigated here. Across all genomes, the PHI score is significant in 6 of the 74 core gene clusters, suggesting putative recombination events. However, it is reasonable to assume that most of these may be a result of systematic error due to the divergent evolutionary processes at work across Rickettsia genomes. Patterns of recombination can occur by chance rather than driven by evolution which cannot be differentiated by current phylogenetic methods53. The function of each respective cluster can be found in Supplementary Data 1.
Gene content, pangenome and metabolic analysis
Across all genomes used in the gene content comparison analysis (Supplementary Fig. 6), Anvi’o identified only 208 core gene clusters of which 74 are represented by single-copy genes. It is particularly evident the large size of the accessory genome across the main Rickettsia and the Torix clades. Out of the 2470 predicted ortholog clusters for the Torix clade 1296 (52.5%) are uniquely found among the Torix genomes, while for Rickettsia 2460 unique ortholog clusters were predicted from a total of 3811 (64.5%) (Fig. 3). However, if we account for the number of genomes available in each clade then Torix shows higher rates of gene cluster and unique gene clusters accumulation with each additional genome (Fig. 4). Our results indicate that the main Rickettsia clade and especially the Torix clade, seem to have a high degree of genome diversity, suggesting a wider repertoire of genes and potentially greater rates of gene turnover. As expected, the more genomes that are included in analyses, the smaller the core genome extracted. However, gene content analysis results of increasingly diverged genomes should be always interpreted with caution as true homology relationship between genes/proteins might get obscured by their sequence divergence.
Torix is a distinctly separate clade sharing less than 65% AAI similarity to any Rickettsia or ‘Ca. Megaira’ genomes (Fig. 2). It contains at least five species-level clusters with >95% ANI similarity that reflect its highly diverse niche in the environment (Fig. 2)13,54,55. With only two examples, the true diversity of ‘Ca. Megaira’ is underestimated here. Overall, our results indicate higher genomic plasticity within Torix clade in terms of gene content compared to Rickettsia.
We also investigated whether Torix and Rickettsia clades are enriched for particular COGs (Supplementary Data 1). Among the most highly enriched genes in Torix clade were genes encoding for invasion associated proteins like the exopolysaccharide synthesis protein ExoD (COG3932) and the invasion associated protein IalB (COG5342), a carbonic anhydrase (COG0288) and a Chloramphenicol resistance associated protein (COG3896). Both carbonic anhydrase and ExoD homologs has been already reported in Torix clade45 and our results here further support their important role in Torix biology. ExoD has been previously reported as essential for successful nodule invasion of the nitrogen-fixing endosymbiont Rhizobium56. When we consider both Torix and ‘Ca. Megaira’ clades the genes involved in the non-oxidative phase of the PPP pathway were the most highly enriched genes (Supplementary Data 1). It is noteworthy that a large fraction of the enriched genes in both Rickettsia and Torix clades are related to cell wall and membrane biogenesis. These are likely associated with differences in the biology of the two clades at the host-microbe interface.
Rickettsia lineages group together based on gene presence/absence and produce repeated patterns of accessory genes that reliably occur within each clade (Supplementary Fig. 6). AAI scores separate Torix group, Rickettsia and ‘Ca. Megaira’ out into genus groups with no score above 65% similarity outside of each respective clade (Fig. 2)57. ANI scores suggest that Torix and the remaining Rickettsia clades are multispecies clusters with less than 95% similarity between genomes in the same groups except for the Spotted Fever Group (Fig. 2)57.
Rickettsial genomes extracted from SRA samples are generally congruent with the metabolic potential of their respective groups (Fig. 5). Torix and ‘Ca. Megaira’ all have complete pentose phosphate pathways (PPP); a unique marker for these groups which seems to have been lost in the other Rickettsia clades45. The PPP generates NADPH, precursors to amino acids, and is known to protect against oxidative injury in some bacteria58, as well as conversion of hexose monosaccharides into pentose used in nucleic acid and exopolysaccharide synthesis. The PPP has also been associated with establishing symbiosis between the Alphaproteobacteria Sinorhizobium meliloti and its plant host Medicago sativa59. This pathway has previously been highlighted in Torix45 and its presence in all newly assembled Torix and ‘Ca. Megaira’ draft genomes consolidates its importance as an identifying feature for these groups (Fig. 5, Supplementary Data 1). Considering the trend towards gene loss, the PPP is likely an ancestral feature that was lost in the main Rickettsia clade45,60.
Metabolic pathways for Glycolysis, gluconeogenesis, and cofactor/vitamin synthesis are absent or incomplete across all Rickettsia included in these analyses, except in the Rhyzobius group member, Oopac6. Oopac6 has a putatively complete biotin synthesis pathway (Fig. 5, Supplementary Fig. 7) and is likely a separate genus according to GTDBtk analysis (Supplementary Data 1). The Oopac6 biotin synthesis pathway is related to, but distinct from, the Rickettsia biotin pathway from Rickettsia buchneri36 with which it shares between 85% to 92% amino acid sequence similarity across genes (Supplementary Fig. 7)36. Moreover, there is no sequence similarity outside of the biotin operon. This, along with the presence on a plasmid in Rickettsia buchneri makes it likely that Oopac6 operon is a result of horizontal gene transfer. Animals cannot synthesize B-vitamins, so they either acquire them from diet or from microorganisms that can synthesize them. Oopac6 has retained or acquired a complete biotin operon where this operon is absent in other members of the genus. Biotin pathways in insect symbionts can be an indicator of nutritional symbioses61, so Rhyzobius Rickettsia could contribute to the feeding ecology of the beetle O. opaca. However, like other aleocharine rove beetles, O. opaca is likely predaceous, omnivorous or fungivorous (analysis of gut contents from a related species, O. grandipennis, revealed a high prevalence of yeasts62). We posit no obvious reason for how these beetles benefit from harbouring a biotin-producing symbiont. One theory is that this operon could be a hangover from a relatively recent host shift event and may have been functionally important in the original host. Similarly, if the symbiont is undergoing genome degradation, a once useful biotin pathway may be present but not functional63,64. Although the pimeloyl-ACP biosynthesis pathway is partially present (Fig. 5), a bioH homolog is not found within or outside the biotin operon (Supplementary Fig. 7) suggesting that this pathway may not be functional (as observed in some Buchnera aphidicola64,65) or that it may be used in a different way. As this is the only member of this group with a whole genome so far, further research is required to firmly establish the presence and function of this pathway.
A 75% complete dTDP‐L‐rhamnose biosynthesis pathway was observed in 4 of the draft belli assemblies (Gdoso1, Choog2, Drufa1, Blapp1) (Fig. 5). Two host species are bird lice (Columbicola hoogstraali, Degeeriella rufa), one is a butterfly (Graphium doson), and one is a ground beetle (Bembidion lapponicum). dTDP‐L‐rhamnose is an essential component of human pathogenic bacteria like Pseudomonas, Streptococcus and Enterococcus, where it is used in cell wall construction66. This pathway67 may be involved in the moulting process of Caenorhabditis elegans68, and it is a precursor to rhamnolipids that are used in quorum sensing69. In the root symbiont Azospirillium, disruption of this pathway alters root colonisation, lipopolysaccharide structure and exopolysaccharide production70. No Rickettsia from typically pathogenic groups assessed in Fig. 5 has this pathway, and the hosts of these four bacteria are not involved with human or mammalian disease. Presence in feather lice provides little opportunity for this Rickettsia to be pathogenic to their vertebrate hosts because feather lice are not blood feeders, and Belli group Rickettsia are rarely pathogenic. Further, this association does not explain its presence in a butterfly and ground beetle; it is most likely that this pathway, if functional, would be involved in establishing infection in the insect host or host-symbiont recognition.
A partial NAD biosynthesis pathway is present only in the Moomin genome. NAD is used as a coenzyme in numerous reactions as well as a substrate in some synthesis pathways, such as ADP-Ribosyltransferases which are used in bacterial toxin-antitoxin systems71,72. NAD pathways have previously found in two other members of Rickettsiaceae, ‘Ca. Sarmatiella mevalonica’ and Occidentia massiliensis31,73. The most likely explanation for rare occurrence in Rickettsiaceae is either a lateral transfer event or remnants from ancestral occurrence.
Designation of ‘Candidatus Tisiphia’
In all analyses, Torix group consistently clusters away from the rest of Rickettsia as a sister taxon. Despite the relatively small number of Torix genomes, its within group diversity is greater than any divergence between previously described Rickettsia in any other group (Fig. 1, Supplementary Figs. 3 and 4). Additionally, Torix shares characteristics with both ‘Ca. Megaira’ and Rickettsia, but with many of its own unique features (Figs. 3 and 5). The distance of Torix from other Rickettsia and ‘Ca. Megaira’ is confirmed in both the phylogenomic and metabolic function analyses to the extent that Torix should be separated from Rickettsia and assigned its own genus. This is supported by GTDB-Tk analysis which places all Torix genomes separate from Rickettsia (Supplementary Data 1) alongside AAI percentage similarity scores less than 65% in all cases (Fig. 2a). To this end, we propose the name ‘Candidatus Tisiphia’. This name follows the fury Tisiphone, reflecting the genus ‘Ca. Megaira’ being named after her sister Megaera.
The bioinformatics approach has successfully extracted a substantial number of Rickettsia and ‘Ca. Megaira’ genomes from existing SRA data, including genomes for putative Rhyzobius Rickettsia and several ‘Ca. Tisiphia’ (formerly Torix group Rickettsia). Successful completion of two ‘Ca. Megaira’ and two ‘Ca. Tisiphia’ genomes provide solid reference points for the evolution of Rickettsia and its related groups. From this, we can confirm the presence of a complete Pentose Phosphate Pathway in ‘Ca. Tisiphia’ and ‘Ca. Megaira’, suggesting that this pathway was lost during Rickettsia evolution. We also describe previously unsequenced Meloidae and Rhyzobius Rickettsia and show that Rhyzobius group Rickettsia has the potential to be a nutritional symbiont due to the presence of a complete biotin pathway. These genomes provide a much-needed expansion of available data for symbiotic Rickettsia clades and clarification on the evolution of Rickettsia from ‘Ca. Megaira’ and ‘Ca. Tisiphia’.
Genomic data collection and construction
We employed two different workflows to assemble genomes for ‘Ca. Megaira’ and Rickettsia symbionts (Fig. 6). a) Targeted sequencing and assembly of focal ‘Ca. Megaira’ and Torix Rickettsia. b) Assembly from SRA deposits of ‘Ca. Megaira’ from Mesostigma viride NIES296 and the 29 arthropods identified in Pilgrim et al.13 that potentially harbour Rickettsia. These were analysed alongside previously assembled genomes from the genus Rickettsia, and the outgroup taxon Orientia tsutsugamushi, a distant relative of Rickettsia species74.
DNA preparation, sequencing strategies and symbiont assembly methodologies varied between species and are listed below. The pipeline used to assemble genomes from Short Read Archive (SRA) data is deposited on Zenodo75.
Sample collection for targeted genome assembly
Cimex lectularius were acquired from the ‘S1’ isofemale colony maintained at the University of Bayreuth described in Thongprem et al.26. Culicoides impunctatus females were collected from a wild population in Kinlochleven, Scotland (56° 42’ 50.7“N 4° 57’ 34.9“W) on the evenings of the 2nd and 3rd September 2020 by aspiration. Carteria cerasiformis strain NIES 425 was obtained from the Microbial Culture Collection at the National Institute for Environmental Studies, Japan. The Glossinia morsitans submorsitans specimen Gms8 was collected in Burkina Faso in 2010 and Rickettsia infection was present alongside other symbionts as described in Doudoumis et al.76. The assembly itself is a result of later thesis work77.
A Bryobia mite community was sampled from herbaceous vegetation in Turku, Finland. The Moomin isofemale line was established by isolating a single adult female and was maintained on detached leaves of Phaseolus vulgaris L. cv Speedy at 25 °C, 60% RH, and a 16:8 light:dark photoperiod. The Moomin spider mite line was morphologically identified as Bryobia graminum by Prof Eddie A. Ueckermann (North-West University).
Previously published Rickettsia genomes
A total of 86 published Rickettsia genomes, and one genome from Orientia tsutsugamushi were retrieved from the European Nucleotide Archive and assessed with CheckM v1.0.1378. Inclusion criteria for genomes were high completeness (CheckM > 90%), low contamination (CheckM < 2%) and low strain heterogeneity (Check M < 50%) except in the case of Adalia for which there is only one genome (87.6% completeness). Filtering identified 76 high quality Rickettsia genomes that were used in all subsequent analyses (Supplementary Data 1).
High molecular weight DNA extraction, assembly, and annotation of complete genomes for two ‘Ca. Tisiphia’ (==Torix group Rickettsia) from Culicoides impunctatus and Cimex lectularius
High molecular weight (HMW) genomic DNA was prepared using four-hundred and eighty whole C. impunctatus and 45 C. lectularius heads, the latter of which had been symbiont-enriched using a protocol designed to eliminate host nuclei through filtration79. Culicoides impunctatus individuals were pooled and homogenised in two 1.5 ml Eppendorf tubes containing 0.9 ml of buffer G2 (Qiagen) using a pestle while the filtrate from the enriched C. lectularius heads was also split and diluted to the same volumes. Twenty-five μL of proteinase K (50 mg/ml) was added to each Eppendorf before incubation at 56 °C for 90 minutes with gentle inversion every 30 min. The respective lysates were centrifuged at 12,000 x g for 20 minutes before the supernatants were pooled and diluted to 3 ml with buffer G2. After equilibrating a Genomic-tip 20/G (Qiagen) with 1 ml QBT buffer, the lysate was gently inverted before being poured onto the tip membrane. The tip was washed four times with 1 ml of QC buffer (Qiagen) before elution of the DNA using buffer QF (Qiagen). Using wide-bore pipette tips, 667 µL of the eluate was pipetted into three 1.5 mL Eppendorf tubes before the addition of 467 µL isopropanol to each tube and mixing by gentle inversion 10 times. Genomic DNA was pelleted by centrifuging for 20 min at 15,000 x g at 4 °C and washing twice with 70% ethanol before resuspending in buffer EB (Qiagen). Quality control of HMW DNA was then confirmed by running on a gel and assessment by Qubit fluorometric quantitation.
Long-read libraries for Oxford Nanopore sequencing were generated using the SQK-LSK109 Ligation Sequencing Kit and sequenced on Minion R9.4.1 flow cells at the Centre for Genomic Research, University of Liverpool, United Kingdom. Raw Nanopore reads were base called using Guppy version 4.0.15 (Oxford Nanopore) using the high accuracy model option (-c dna_r9.4.1_450bps_hac.cfg). All reads which were over 500 bp in length and had an average phred (Q) score of > 10 were filtered using NanoFilt version 2.7.180. These reads were assembled with Flye version 2.8.181 using default options.
Assembled circular contigs of ~1.5 Mb in length were confirmed for Rickettsia identity by BLASTing against a Rickettsia genomic database45. High quality short-read libraries were also generated from the same DNA samples and used to correct the nanopore assemblies. C. impunctatus paired-end library (2 x 150 bp) was prepared using a Kapa HyperPrep kit (Roche) and sequenced by BGI Genomics (Hong Kong) on a DNBseq platform, whereas C. lectularius sequencing was carried out by BGI Genomics (Hong Kong) on a Hiseq Xten PE150 platform. Data cleaning and filtering was performed by BGI Genomics’ using SOAPnuke version 2.1.482 removing adapters and any reads with 50% of bases having phred scores lower than 20.
Remaining reads were assembled with MEGAHIT version 1.2.983 using default parameters and contigs were binned using MetaBAT 2 version 2.12.184. The identities of bins were checked with CheckM version 1.1.378 and DNBseq reads were mapped to contigs from the Rickettsia allocated bin using ‘perfect mode’ in BBMap version 38.8785 and filtered using SAMtools version 1.1186. Filtered Rickettsia reads were then used to polish the Flye assembled Rickettsia genomes using two rounds of polishing with Pilon version 1.2387 and the ‘—bases’ option for correcting SNPs and small indels. Annotation of the polished genomes was accomplished using PROKKA version 1.1388 and identification of polyketide and non-ribosomal peptide synthases was conducted by antiSMASH version 6.089.
Extraction and assembly of a complete ‘Ca. Megaira’ from Mesostigma viride
‘Ca. Megaira’ genome was extracted from recently published reads of Mesostigma viride NIES296 (from accession PRJNA509752). Illumina reads were de novo assembled using MEGAHIT version v1.2.983, reads were mapped back to the assembled contigs. Contigs were clustered and binned based on nucleotide composition and coverage using MetaBAT2 v2:2.1584 and a minimum contig length of 1.5 kb. The quality of ‘Ca. Megaira’ genome bin was inspected using CheckM78. The PacBio reads were mapped on the Illumina draft assembly and reads of ‘Ca. Megaira’ origin were extracted. Due to the excessive number of obtained reads a sub-sample (reads > 10 k and 1/3 of the total) was taken using seqkit90 and used for subsequent analysis. This sub-sample of PacBio reads was assembled using Canu version 1.891 under default parameters. The final assembly, consisted of two contigs, was manually inspected for circularization and trimmed accordingly. The final and circular assembly was further polished by a combination of PacBio and Illumina reads using Pilon v1.2287.
Extraction of Transitional Rickettsia, RiTSETSE, from Glossina morsitans submorsitans
All methods described here for the extraction of G. morsitans submorsitans originate from a thesis by Frances Blow77.
DNA was extracted immediately using the CTAB (Cetyl trimethylammonium bromide) method and was stored at −20 °C. Whole Genome Shotgun (WGS) libraries were prepared with the Illumina TruSeq Nano DNA kit following the manufacturers’ instructions. Samples were sequenced on two lanes of Illumina HiSeq with 250 bp paired-end reads. Raw sequencing reads were de-multiplexed and converted to FASTQ format with CASAVA version 1.8 (Illumina). Cutadapt version 1.2.192 was used to trim Illumina adapter sequences from FASTQ files. Reads were trimmed if 3 bp or more of the 3’ end of a read matched the adapter sequence. Sickle version 1.20093 was used to trim reads based on quality: any reads with a window quality score of less than 20, or which were less than 10 bp long after trimming, were discarded.
Metagenomic reads were assembled with DISCOVAR94 and contigs shorter than 500 bp were removed and mapping with Bowtie295 was used to assess coverage. Taxonomy was assigned to contigs with BLAST and the GC content of contigs assessed with the Blobology package96. Contigs were filtered based on GC content, coverage and taxonomy, and reads were extracted using scripts implemented in Blobology. Extracted reads were re-assembled with SPAdes version 3.7.197 and mapped to contigs with Bowtie2. Assembly statistics were calculated with custom perl scripts and Qualimap version 2.298.
DNA extraction of Moomin ‘Ca. Tisiphia’ (== Torix group Rickettsia) from Bryobia graminum str. moomin
Genomic DNA was extracted from ~1000 adult females using the Quick-DNA Universal kit (BaseClear, the Netherlands) and was sequenced by GENEWIZ on an Illumina NovaSeq instrument. Rickettsia sequence was extracted from illumina reads as described for other MAGs.
DNA extraction of ‘Ca. Megaira’ from Carteria cerasiformis
Symbiont enriched DNA was extracted from culture using a modified version of the protocol of Stouthamer et al.79. Specifically, prior to homogenization the Carteria cerasiformis culture was filtered through a 100um filter/mesh to reduce bacterial contamination. DNA extraction was performed using the QIAGEN DNAeasy™ Blood & Tissue Kit. Short read sequencing was carried out by BGI Genomics (Hong Kong) on a Hiseq Xten PE150 platform. Rickettsia sequences were assembled from Illumina reads as described for other MAGs.
Assembly, and annotation of Rickettsia genomes from publicly available SRA data
Pilgrim et al.13 identified 29 SRA deposits containing Rickettsia DNA. We used these datasets to extract and assemble 22 new high quality draft Rickettsia genomes. Briefly, short reads from each SRA library were assembled using MEGAHIT v1.2.983, mapped with Minimap 2 v2.17-r94199 and contigs were binned based on tetranucleotide frequencies using MetaBAT2 v2:2.1584. Rickettsia like bins were quality inspected with CheckM v1.0.1378. Bins with a completeness score of over 50% and contamination below 2% marked as Rickettsiales or Rickettsia were then retained onward for further refinement, annotation, and scrutiny.
To refine MAGs, insect SRA contigs were compared against a local Rickettsia genome database using Blastn100. Contigs with significant matches to the database were extracted, non-Rickettsia contigs were identified with blastx against the nr database and contigs with atypical coverage were discarded. MetaBAT 2 filtered out reads less than 1.5 kb long for accuracy, but these reads are potentially informative in small symbiont genomes, so contigs with a length of 1–2.5 kb were manually examined and added to MetaBAT 2 assembled genomes. Those with improved CheckM score and no Wolbachia in the original host are used as the final draft genome for the Rickettsia. The additional genome for the leech Rickettsia, RiTBt, was found to contain Cardinium contamination during separate examination. RiTBt contigs identified as Cardinium using blastx were removed from the genome, reducing contamination from 9.48% to 0.95%. The final pipeline resulted in 22 MAGs each with completeness >90% and contamination <2%.
Genome content comparison and pangenome construction
Anvi’o 7101 was used to construct a pangenome. Included in this were the 22 MAGs retrieved from SRA data, 2 ‘Ca. Megaira’ genomes and 4 targeted Torix Rickettsia genomes, and one Transitional group Rickettsia genome acquired in this study. To these were added the 76 published and 1 Orientia described above, giving a total of 104 genomes. Individual Anvi’o genome databases were additionally annotated with HMMER, KofamKOALA, and NCBI COG profiles102,103,104. For the pangenome itself, orthologs were identified with NCBI blast, mcl inflation was set to 2, and minbit to 0.5. Average nucleotide sequence identity was calculated using pyANI105 within Anvi’o 7 and Average Amino Acid identity was calculated though the Kostas Lab enveomics online calculator106. Networks of ANI and AAI results were produced in Gephi 0.9.2107 with Frutcherman Reingold layout and annotated in Inkscape 0.92108. Exact code and a list of packages used is available on Zenodo75.
KofamKOALA annotation103 in Anvi-o 7 was used to estimate completeness of metabolic pathways and Pheatmap109 in R 3.4.4110 was then used to produce heatmaps of metabolic potential. Annotations for function and Rickettsia group were added post hoc in Inkscape.
The biotin operon found in the genome Rhyzobius Rickettsia, Oopac6, was identified from metabolic prediction. To confirm Oopac6 carries a complete biotin pathway that shares ancestry with the existing Rickettsia biotin operon, Oopac6 biotin was compared to biotin pathways from five other related symbionts: Cardinium, Lawsonia, Buchnera aphidicola, Rickettsia buchneri, and Wolbachia. Clinker111 with default options was used to compare and visualise the similarity of genes within the biotin operon region of all 6 bacteria. Clinker by default displays the highest similarity comparisons based on an all-vs-all similarity matrix.
All generated draft and complete reference genomes were annotated using the NCBI’s Prokaryotic Genome Annotation Pipeline (PGAP)112. Secondary metabolite biosynthetic gene clusters were identified using AntiSMASH version 6.089 along with Norine113 which searched for similarities to predicted non-ribosomal peptides. BLASTp analysis was additionally used to identify the closest homologues of these biosynthetic gene clusters.
Functional enrichment analyses between the main Rickettsia clade and the Torix – ‘Ca. Megaira’ clades were performed using the Anvi’o program anvi-get-enriched-functions-per-pan-group and the “COG_FUNCTION” as annotation source. A gene cluster presence – absence table was exported using the command “anvi-export-tables”. This was used to create an UpSet plot using the R package ComplexUpset114 to visualize unique and shared gene clusters between different Rickettsia groups. A gene cluster was considered unique to a specified Rickettsia group when it was present in at least one genome belonging to that group. Gene cluster accumulation curves were performed for the pan-, core- and unique-genomes based on the same presence-absence matrix using a custom-made R script115. In each case the cumulative number of gene clusters were computed based on randomly sampled genomes using 100 permutations. The analysis was performed separately for Torix group and the combined remaining Rickettsia. Curves were plotted using the ggplot2 R package116.
Phylogeny, network, and recombination
The single-copy core of all 104 genomes was identified in Anvi’o 7 and is made up of 74 single-copy gene (SCG) clusters. Protein alignments from SCG were extracted and concatenated using the command “anvi-get-sequences-for-gene-clusters”. Maximum likelihood phylogeny was constructed in IQ-TREE v2.1.2117. Additionally, 43 ribosomal proteins were identified through Anvi’o 7 to test phylogenomic relationships. These gene clusters were extracted from the pangenome and used for an independent phylogenetic analysis. The best model according to the Bayesian Information Criterion (BIC) was selected with Model Finder Plus (MFP)118 as implemented in IQ-TREE; this was JTTDCMut+F + R6 for core gene clusters and JTTDCMut+F + R3 for ribosomal proteins. Both models were run with Ultrafast Bootstrapping (1000 UF bootstraps)119 with Orientia tsutsugamushi as the outgroup.
The taxonomic placement of Oopac6, Ppec13 and Dallo3 genomes within the Rhyzobius, Meloidae and Belli groups respectively were confirmed in a smaller phylogenetic analysis, performed as detailed in Pilgrim et al.13 using reference MLST sequences (gltA, 16 S rRNA, 17 kDa OMP, COI) from other previously identified Rickettsia profiles (Source Data). The selected models used in the concatenated partition scheme were as follows: 16 S rRNA: TIM3e + I + G4; 17Kda OMP: GTR + F + I + G4; COI: TPM3u + F + I + G4; gltA: K3Pu + F + I + G4a.
A nearest neighbour network was produced for core gene sets with default settings in Splitstree4 to further assess distances and relationships between Rickettsia, ‘Ca. Megaira’ and Torix clades. All annotation was added post hoc in Inkscape. Furthermore, recombination signals were examined by applying the Pairwise Homoplasy Index (PHI) test to the DNA sequence of each core gene cluster extracted with Anvio-7. DNA sequences were aligned with MUSCLE120 and PHI scores calculated for each of the 74 core gene cluster with PhiPack121.
The taxonomic identity for genomes was established with GTDB-Tk122 to support the designation of taxa through phylogenetic comparison of marker genes against an online reference database.
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
The genomes and raw read sets generated in this study have been deposited in the GenBank database under accession code PRJNA763820. The assemblies produced from previously published third party data have been deposited in the GenBank database under accession code PRJNA767332. The genome content data and data for figures generated in this study are provided in the Source Data and Supplementary Data. Accessions and metadata for pre-existing genomic data are listed in the Supplementary Data 1 file.
All code and bioinformatics pipelines used to extract and construct bacterial genomes from SRA data can be found on Zenodo (https://doi.org/10.5281/zenodo.6396821), and the R script for generating pangenome accumulation curves can be found on GitHub (https://github.com/SioStef/panplots and here 10.5281/zenodo.6408803). The full pangenome Anvi’o database is available on Figshare (https://doi.org/10.6084/m9.figshare.14865576.v3). An interactive html version of Fig. 5 and its associated ‘json’ file is available on Figshare (https://doi.org/10.6084/m9.figshare.14865567.v5). html of bonzai module information for Supplementary Fig. 2 is available on Figshare (https://doi.org/10.6084/m9.figshare.14865570.v4).
Clay, K., Holah, J. & Rudgers, J. A. Herbivores cause a rapid increase in hereditary symbiosis and alter plant community composition. Proc. Natl. Acad. Sci. 102, 12465–12470 (2005).
Boettcher, K. J., Ruby, E. G. & McFall-Ngai, M. J. Bioluminescence in the symbiotic squid Euprymna scolopes is controlled by a daily biological rhythm. J. Comp. Physiol. A 179, 65–73 (1996).
Douglas, A. E. Lessons from studying insect symbioses. Cell Host Microbe 10, 359–367 (2011).
Fujishima, M. & Kodama, Y. Endosymbionts in Paramecium. Eur. J. Protistol. 48, 124–137 (2012).
Oliver, K. M., Degnan, P. H., Burke, G. R. & Moran, N. A. Facultative symbionts in aphids and the horizontal transfer of ecologically important traits. Annu. Rev. Entomol. 55, 247–266 (2010).
Hendry, T. A., Hunter, M. S. & Baltrus, D. A. The facultative symbiont Rickettsia protects an invasive whitefly against entomopathogenic Pseudomonas syringae strains. Appl. Environ. Microbiol. 80, 7161–7168 (2014).
Leclair, M. et al. Consequences of coinfection with protective symbionts on the host phenotype and symbiont titres in the pea aphid system. Insect Sci. 24, 798–808 (2017).
Engelstädter, J. & Hurst, G. D. D. The ecology and evolution of microbes that manipulate host reproduction. Annu. Rev. Ecol. Evol. Syst. 40, 127–149 (2009).
Weinert, L. A., Araujo-Jnr, E. V., Ahmed, M. Z. & Welch, J. J. The incidence of bacterial endosymbionts in terrestrial arthropods. Proc. R. Soc. B Biol. Sci. 282, 20150249 (2015).
Castelli, M. et al. Deianiraea, an extracellular bacterium associated with the ciliate Paramecium, suggests an alternative scenario for the evolution of Rickettsiales. ISME J. 13, 2280–2294 (2019).
Angelakis, E. & Raoult, D. 187 - Rickettsia and Rickettsia-Like Organisms. In Infectious Diseases (Fourth Edition) (eds. Cohen, J., Powderly, W. G. & Opal, S. M.) 1666–1675.e1 (Elsevier, 2017). https://doi.org/10.1016/B978-0-7020-6285-8.00187-8.
Werren, J. H. et al. Rickettsial relative associated with male killing in the ladybird beetle (Adalia bipunctata). J. Bacteriol. 176, 388–394 (1994).
Pilgrim, J. et al. Torix Rickettsia are widespread in arthropods and reflect a neglected symbiosis. GigaScience 10, giab021 (2021).
Weinert, L. A., Werren, J. H., Aebi, A., Stone, G. N. & Jiggins, F. M. Evolution and diversity of Rickettsia bacteria. BMC Biol. 7, 6 (2009).
Bodnar, J. L., Fitch, S., Rosati, A. & Zhong, J. The folA gene from the Rickettsia endosymbiont of Ixodes pacificus encodes a functional dihydrofolate reductase enzyme. Ticks Tick.-Borne Dis. 9, 443–449 (2018).
Łukasik, P., Guo, H., van Asch, M., Ferrari, J. & Godfray, H. C. J. Protection against a fungal pathogen conferred by the aphid facultative endosymbionts Rickettsia and Spiroplasma is expressed in multiple host genotypes and species and is not influenced by co-infection with another symbiont. J. Evol. Biol. 26, 2654–2661 (2013).
Hurst, G. D. D., Purvis, E. L., Sloggett, J. J. & Majerus, M. E. N. The effect of infection with male-killing Rickettsia on the demography of female Adalia bipunctata L. (two spot ladybird). Heredity 73, 309–316 (1994).
Giorgini, M., Bernardo, U., Monti, M. M., Nappo, A. G. & Gebiola, M. Rickettsia symbionts cause parthenogenetic reproduction in the parasitoid wasp pnigalio soemius (Hymenoptera: Eulophidae). Appl. Environ. Microbiol. 76, 2589–2599 (2010).
Brumin, M., Kontsedalov, S. & Ghanim, M. Rickettsia influences thermotolerance in the whitefly Bemisia tabaci B biotype. Insect Sci. 18, 57–66 (2011).
Chiel, E. et al. Assessments of fitness effects by the facultative symbiont rickettsia in the sweetpotato whitefly (Hemiptera: Aleyrodidae). Ann. Entomol. Soc. Am. 102, 413–418 (2009).
Kontsedalov, S. et al. The presence of Rickettsia is associated with increased susceptibility of Bemisia tabaci (Homoptera: Aleyrodidae) to insecticides. Pest Manag. Sci. 64, 789–792 (2008).
Gillespie, J. J. et al. Plasmids and rickettsial evolution: Insight from Rickettsia felis. PLoS One 2, e266 (2007).
Schrallhammer, M. et al. ‘Candidatus Megaira polyxenophila’ gen. nov., sp. nov.: Considerations on Evolutionary History, Host Range and Shift of Early Divergent Rickettsiae. PLoS One 8, e72581 (2013).
Lanzoni, O. et al. Diversity and environmental distribution of the cosmopolitan endosymbiont “Candidatus Megaira”. Sci. Rep. 9, 1179 (2019).
Kikuchi, Y. & Fukatsu, T. Rickettsia Infection in Natural Leech Populations. Microb. Ecol. 49, 265–271 (2005).
Thongprem, P., Evison, S. E. F., Hurst, G. D. D. & Otti, O. Transmission, tropism, and biological impacts of torix rickettsia in the common bed bug Cimex lectularius (Hemiptera: Cimicidae). Front. Microbiol. 11, (2020).
Aguin-Pombo, D., Rodrigues, M. C. P. A., Voetdijk, B. & Breeuwer, J. A. J. Parthenogenesis and Sex-Ratio Distorting Bacteria in Empoasca (Hemiptera: Cicadellidae) Leafhoppers. Ann. Entomol. Soc. Am. 114, 738–749 (2021).
Kang, Y.-J. et al. Extensive diversity of Rickettsiales bacteria in two species of ticks from China and the evolution of the Rickettsiales. BMC Evol. Biol. 14, 167 (2014).
Driscoll, T., Gillespie, J. J., Nordberg, E. K., Azad, A. F. & Sobral, B. W. Bacterial DNA Sifted from the Trichoplax adhaerens (Animalia: Placozoa) Genome Project Reveals a Putative Rickettsial Endosymbiont. Genome Biol. Evol. 5, 621–645 (2013).
Yurchenko, T. et al. A gene transfer event suggests a long-term partnership between eustigmatophyte algae and a novel lineage of endosymbiotic bacteria. ISME J. 12, 2163–2175 (2018).
Castelli, M. et al. ‘Candidatus Sarmatiella mevalonica’ endosymbiont of the ciliate Paramecium provides insights on evolutionary plasticity among Rickettsiales. Environ. Microbiol. 23, 1684–1701 (2021).
Sabaneyeva, E. et al. Host and symbiont intraspecific variability: The case of Paramecium calkinsi and “Candidatus Trichorickettsia mobilis”. Eur. J. Protistol. 62, 79–94 (2018).
Vannini, C. et al. Flagellar Movement in Two Bacteria of the Family Rickettsiaceae: A Re-Evaluation of Motility in an Evolutionary Perspective. PLoS One 9, e87718 (2014).
Perlman, S. J., Hunter, M. S. & Zchori-Fein, E. The emerging diversity of Rickettsia. Proc. R. Soc. B Biol. Sci. 273, 2097–2106 (2006).
Gillespie, J. J. et al. Genomic Diversification in Strains of Rickettsia felis Isolated from Different Arthropods. Genome Biol. Evol. 7, 35–56 (2015).
Gillespie, J. J. et al. A Rickettsia genome overrun by mobile genetic elements provides insight into the acquisition of genes characteristic of an obligate intracellular lifestyle. J. Bacteriol. 194, 376–394 (2012).
Pukall, R., Tschäpe, H. & Smalla, K. Monitoring the spread of broad host and narrow host range plasmids in soil microcosms. FEMS Microbiol. Ecol. 20, 53–66 (1996).
Yan, P. et al. Microbial diversity in the tick Argas japonicus (Acari: Argasidae) with a focus on Rickettsia pathogens. Med. Vet. Entomol. 33, 327–335 (2019).
Dally, M. et al. Cellular localization of two rickettsia symbionts in the digestive system and within the ovaries of the mirid bug, macrolophous pygmaeus. Insects 11, 530 (2020).
Fuxelius, H.-H., Darby, A., Min, C.-K., Cho, N.-H. & Andersson, S. G. E. The genomic and metabolic diversity of Rickettsia. Res. Microbiol. 158, 745–753 (2007).
Comandatore, F. et al. Supergroup C Wolbachia, mutualist symbionts of filarial nematodes, have a distinct genome structure. Open Biol. 5, 150099 (2015).
Hagen, R., Verhoeve, V. I., Gillespie, J. J. & Driscoll, T. P. Conjugative transposons and their Cargo genes vary across natural populations of Rickettsia buchneri Infecting the Tick Ixodes scapularis. Genome Biol. Evol. 10, 3218–3229 (2018).
Gillespie, J. J. et al. A tangled web: Origins of reproductive parasitism. Genome Biol. Evol. 10, 2292–2309 (2018).
Mediannikov, O., Audoly, G., Diatta, G., Trape, J.-F. & Raoult, D. New Rickettsia sp. in tsetse flies from Senegal. Comp. Immunol. Microbiol. Infect. Dis. 35, 145–150 (2012).
Pilgrim, J. et al. Torix group Rickettsia are widespread in Culicoides biting midges (Diptera: Ceratopogonidae), reach high frequency and carry unique genomic features. Environ. Microbiol. 19, 4238–4255 (2017).
Küchler, S. M., Kehl, S. & Dettner, K. Characterization and localization of Rickettsia sp. in water beetles of genus Deronectes (Coleoptera: Dytiscidae). FEMS Microbiol. Ecol. 68, 201–211 (2009).
Zchori-Fein, E., Borad, C. & Harari, A. R. Oogenesis in the date stone beetle, Coccotrypes dactyliperda, depends on symbiotic bacteria. Physiol. Entomol. 31, 164–169 (2006).
Boyd, B. M. et al. Two bacterial genera, Sodalis and Rickettsia, associated with the seal louse Proechinophthirus fluctus (Phthiraptera: Anoplura). Appl. Environ. Microbiol. 82, 3185–3197 (2016).
Guillotte, M. L. et al. Lipid A structural divergence in Rickettsia pathogens. mSphere 6, e00184–21 (2021).
Tvedte, E. S. et al. Genome of the Parasitoid Wasp Diachasma alloeum, an emerging model for ecological speciation and transitions to asexual reproduction. Genome Biol. Evol. 11, 2767–2773 (2019).
Hotopp, J. C. D. et al. Widespread lateral gene transfer from intracellular bacteria to multicellular eukaryotes. Science 317, 1753–1756 (2007).
Kawafune, K. et al. Two Different Rickettsial Bacteria Invading Volvox carteri. PLoS One 10, e0116192 (2015).
Murray, G. G. R., Weinert, L. A., Rhule, E. L. & Welch, J. J. The phylogeny of rickettsia using different evolutionary signatures: How Tree-Like is Bacterial Evolution? Syst. Biol. 65, 265–279 (2016).
Jain, C., Rodriguez-R, L. M., Phillippy, A. M., Konstantinidis, K. T. & Aluru, S. High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nat. Commun. 9, 5114 (2018).
Rodriguez-R, L. M., Jain, C., Conrad, R. E., Aluru, S. & Konstantinidis, K. T. Reply to: “Re-evaluating the evidence for a universal genetic boundary among microbial species”. Nat. Commun. 12, 4060 (2021).
Reed, J. W. & Walker, G. C. The exoD gene of Rhizobium meliloti encodes a novel function needed for alfalfa nodule invasion. J. Bacteriol. 173, 664–677 (1991).
Konstantinidis, K. T., Rosselló-Móra, R. & Amann, R. Uncultivated microbes in need of their own taxonomy. ISME J. 11, 2399–2406 (2017).
Christodoulou, D. et al. Reserve Flux Capacity in the Pentose Phosphate Pathway Enables Escherichia coli’s Rapid Response to Oxidative Stress. Cell Syst. 6, 569–578.e7 (2018).
Hawkins, J. P., Ordonez, P. A. & Oresnik, I. J. Characterization of mutations that affect the nonoxidative pentose phosphate pathway in Sinorhizobium meliloti. J. Bacteriol. 200, e00436–17 (2018).
Driscoll, T. P. et al. Wholly Rickettsia! Reconstructed metabolic profile of the quintessential bacterial parasite of eukaryotic cells. mBio 8, e00859–17 (2017).
Douglas, A. E. The B vitamin nutrition of insects: the contributions of diet, microbiome and horizontally acquired genes. Curr. Opin. Insect Sci. 23, 65–69 (2017).
Klimaszewski, J. et al. Molecular and microscopic analysis of the gut contents of abundant rove beetle species (Coleoptera, Staphylinidae) in the boreal balsam fir forest of Quebec, Canada. ZooKeys 353, 1–24 (2013).
Blow, F. et al. B-vitamin nutrition in the pea aphid-Buchnera symbiosis. J. Insect Physiol. 126, 104092 (2020).
van Ham, R. C. H. J. et al. Reductive genome evolution in Buchnera aphidicola. Proc. Natl Acad. Sci. 100, 581–586 (2003).
Manzano-Marı́n, A. et al. Serial horizontal transfer of vitamin-biosynthetic genes enables the establishment of new nutritional symbionts in aphids’ di-symbiotic systems. ISME J. 14, 259–273 (2020).
van der Beek, S. L. et al. Streptococcal dTDP-L-rhamnose biosynthesis enzymes: functional characterization and lead compound identification. Mol. Microbiol. 111, 951–964 (2019).
Jiang, N., Dillon, F. M., Silva, A., Gomez-Cano, L. & Grotewold, E. Rhamnose in plants - from biosynthesis to diverse functions. Plant Sci. 302, 110687 (2021).
Feng, L., Shou, Q. & Butcher, R. A. Identification of a dTDP-rhamnose biosynthetic pathway that oscillates with the molting cycle in Caenorhabditis elegans. Biochem. J. 473, 1507–1521 (2016).
Daniels, R., Vanderleyden, J. & Michiels, J. Quorum sensing and swarming migration in bacteria. FEMS Microbiol. Rev. 28, 261–289 (2004).
Jofré, E., Lagares, A. & Mori, G. Disruption of dTDP-rhamnose biosynthesis modifies lipopolysaccharide core, exopolysaccharide production, and root colonization in Azospirillum brasilense. FEMS Microbiol. Lett. 231, 267–275 (2004).
Aravind, L., Zhang, D., de Souza, R. F., Anand, S. & Iyer, L. M. The Natural History of ADP-Ribosyltransferases and the ADP-Ribosylation System. In Endogenous ADP-Ribosylation (ed. Koch-Nolte, F.) 3–32 (Springer International Publishing, 2015). https://doi.org/10.1007/82_2014_414.
Poltronieri, P. & Čerekovic, N. Roles of Nicotinamide Adenine Dinucleotide (NAD+) in Biological Systems. Challenges 9, 3 (2018).
Mediannikov, O. et al. High quality draft genome sequence and description of Occidentia massiliensis gen. nov., sp. nov., a new member of the family Rickettsiaceae. Stand. Genom. Sci. 9, 9 (2014).
Tamura, A., Ohashi, N., Urakami, H. & Miyamura, S. Classification of Rickettsia tsutsugamushi in a New Genus, Orientia gen. nov., as Orientia tsutsugamushi comb. nov. Int. J. Syst. Evol. Microbiol. 45, 589–591 (1995).
Davison, H. R. VibrantStarling/Code-used-to-extract-bacterial-genomes-from-invertebrate-genomes: SRA-dive v1.0.0. (Zenodo, 2022). https://doi.org/10.5281/zenodo.6396821.
Doudoumis, V. et al. Challenging the Wigglesworthia, Sodalis, Wolbachia symbiosis dogma in tsetse flies: Spiroplasma is present in both laboratory and natural populations. Sci. Rep. 7, 4699 (2017).
Blow, F. Variation in the structure and function of invertebrate-associated bacterial communities. (University of Liverpool, 2017). https://doi.org/10.17638/03009325.
Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P. & Tyson, G. W. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 25, 1043–1055 (2015).
Stouthamer, C. M., Kelly, S. & Hunter, M. S. Enrichment of low-density symbiont DNA from minute insects. J. Microbiol. Methods 151, 16–19 (2018).
De Coster, W., D’Hert, S., Schultz, D. T., Cruts, M. & Van Broeckhoven, C. NanoPack: visualizing and processing long-read sequencing data. Bioinformatics 34, 2666–2669 (2018).
Kolmogorov, M., Yuan, J., Lin, Y. & Pevzner, P. A. Assembly of long, error-prone reads using repeat graphs. Nat. Biotechnol. 37, 540–546 (2019).
Chen, Y. et al. SOAPnuke: a MapReduce acceleration-supported software for integrated quality control and preprocessing of high-throughput sequencing data. GigaScience 7, gix120 (2018).
Li, D., Liu, C.-M., Luo, R., Sadakane, K. & Lam, T.-W. MEGAHIT: An ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics 31, 1674–1676 (2015).
Kang, D. D. et al. MetaBAT 2: an adaptive binning algorithm for robust and efficient genome reconstruction from metagenome assemblies. Peer J. 7, e7359 (2019).
Bushnell, B. BBMap. http://sourceforge.net/projects/bbmap/.
Li, H. et al. The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Walker, B. J. et al. Pilon: An integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One 9, e112963 (2014).
Seemann, T. Prokka: rapid prokaryotic genome annotation. Bioinformatics 30, 2068–2069 (2014).
Blin, K. et al. antiSMASH 6.0: improving cluster detection and comparison capabilities. Nucleic Acids Res. 49, W29–W35 (2021).
Shen, W., Le, S., Li, Y. & Hu, F. SeqKit: A Cross-Platform and Ultrafast Toolkit for FASTA/Q File Manipulation. PLoS One 11, e0163962 (2016).
Koren, S. et al. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 27, 722–736 (2017).
Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet. J. 17, 10–12 (2011).
Joshi N. A. & Fass J. N. Sickle: A sliding-window, adaptive, quality-based trimming tool for FastQ files. (2011). https://github.com/najoshi/sickle.
Weisenfeld, N. I. et al. Comprehensive variation discovery in single human genomes. Nat. Genet. 46, 1350–1355 (2014).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Kumar, S., Jones, M., Koutsovoulos, G., Clarke, M. & Blaxter, M. Blobology: exploring raw genome data for contaminants, symbionts and parasites using taxon-annotated GC-coverage plots. Front. Genet. 4, (2013).
Nurk, S. et al. Assembling Genomes and Mini-metagenomes from Highly Chimeric Reads. In Research in Computational Molecular Biology (eds. Deng, M., Jiang, R., Sun, F. & Zhang, X.) 158–170 (Springer, 2013). https://doi.org/10.1007/978-3-642-37195-0_13.
Okonechnikov, K., Conesa, A. & García-Alcalde, F. Qualimap 2: advanced multi-sample quality control for high-throughput sequencing data. Bioinformatics 32, 292–294 (2016).
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
Camacho, C. et al. BLAST+: Architecture and applications. BMC Bioinforma. 10, 421 (2009).
Eren, A. M. et al. Community-led, integrated, reproducible multi-omics with anvi’o. Nat. Microbiol. 6, 3–6 (2021).
Galperin, M. Y. et al. COG database update: focus on microbial diversity, model organisms, and widespread pathogens. Nucleic Acids Res. 49, D274–D281 (2021).
Aramaki, T. et al. KofamKOALA: KEGG Ortholog assignment based on profile HMM and adaptive score threshold. Bioinformatics 36, 2251–2252 (2020).
Eddy, S. R. Accelerated Profile HMM Searches. PLoS Comput. Biol. 7, e1002195 (2011).
Pritchard, L., Glover, R. H., Humphris, S., Elphinstone, J. G. & Toth, I. K. Genomics and taxonomy in diagnostics for food security: soft-rotting enterobacterial plant pathogens. Anal. Methods 8, 12–24 (2015).
Rodriguez-R, L. M. & Konstantinidis, K. T. The enveomics collection: a toolbox for specialized analyses of microbial genomes and metagenomes. https://peerj.com/preprints/1900 (2016) https://doi.org/10.7287/peerj.preprints.1900v1.
Bastian, M., Heymann, S. & Jacomy, M. Gephi: An Open Source Software for Exploring and Manipulating Networks. Proc. Int. AAAI Conf. Web Soc. Media 3, 361–362 (2009).
Inkscape Project. Inkscape. (2020). https://inkscape.org.
Kolde R. pheatmap: Pretty Heatmaps. (2019). https://cran.r-project.org/web/packages/pheatmap/index.html.
R: A Language and Environment for Statistical Computing. (2021). https://www.R-project.org/.
Gilchrist, C. L. M. & Chooi, Y.-H. clinker & clustermap.js: Automatic generation of gene cluster comparison figures. Bioinformatics 37, 2473–2475 (2021).
Tatusova, T. et al. NCBI prokaryotic genome annotation pipeline. Nucleic Acids Res. 44, 6614–6624 (2016).
Flissi, A. et al. Norine: Update of the nonribosomal peptide resource. Nucleic Acids Res. 48, D465–D469 (2020).
Krassowski, M. krassowski/complex-upset: v0.7.4. (Zenodo, 2020). https://doi.org/10.5281/zenodo.4308552.
Siozios, S. SioStef/panplots: (Zenodo, 2022). https://doi.org/10.5281/zenodo.6408803.
Wickham, H. ggplot2: Elegant Graphics for Data Analysis. (Springer-Verlag New York, 2016). https://doi.org/10.1007/978-0-387-98141-3.
Nguyen, L.-T., Schmidt, H. A., von Haeseler, A. & Minh, B. Q. IQ-TREE: A fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 32, 268–274 (2015).
Kalyaanamoorthy, S., Minh, B. Q., Wong, T. K. F., von Haeseler, A. & Jermiin, L. S. ModelFinder: Fast model selection for accurate phylogenetic estimates. Nat. Methods 14, 587–589 (2017).
Hoang, D. T., Chernomor, O., von Haeseler, A., Minh, B. Q. & Vinh, L. S. UFBoot2: Improving the ultrafast bootstrap approximation. Mol. Biol. Evol. 35, 518–522 (2018).
Edgar, R. C. MUSCLE: A multiple sequence alignment method with reduced time and space complexity. BMC Bioinforma. 5, 113 (2004).
Bruen, T. C., Philippe, H. & Bryant, D. A simple and robust statistical test for detecting the presence of recombination. Genetics 172, 2665–2681 (2006).
Chaumeil, P.-A., Mussig, A. J., Hugenholtz, P. & Parks, D. H. GTDB-Tk: A toolkit to classify genomes with the Genome Taxonomy Database. Bioinformatics 36, 1925–1927 (2020).
Grant supporting these works: NE/L002450/1 NERC ACCE Doctoral Training Programme, HRD. Ghent University 01P03420 BOF post-doctoral fellowship and 1513719 N Research Foundation - Flanders (FWO) Research Grant, NW. Funding for tsetse fly genomics were to ACD IP BBSRC projects BB/J017698/1 and BB/K501773/1 FB, the materials from which were provided by Philippe Solano (Institut de Recherche pour le Développement, Montpellier, France) and Jean-Baptiste Rayaisse Centre International de Recherche-Développement sur l’Élevage en zone Subhumide (CIRDES), Bobo Dioulasso, Burkina Faso. Jean-Baptiste died a few years ago but he was a fantastic person to work with and a great field entomologist. We also wish to thank Dr David Montagnes for teaching skills associated with algal culture.
We wish to thank Dr Débora Pires Paula (Embrapa) for granting permission to use SRA data for sample number SRR5651504, Iridian Genomes for allowing use of their SRA data, and the Microbial Culture Collection at the National Institute for Environmental Studies, Japan for use of the sample Carteria cerasiformis NIES-425.
The authors declare no competing interests.
Peer review information
Nature Communications thanks Joseph Gillespie and the other, anonymous, reviewers for their contribution to the peer review of this work. Peer reviewer reports are available.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Davison, H.R., Pilgrim, J., Wybouw, N. et al. Genomic diversity across the Rickettsia and ‘Candidatus Megaira’ genera and proposal of genus status for the Torix group. Nat Commun 13, 2630 (2022). https://doi.org/10.1038/s41467-022-30385-6