Widespread Torix Rickettsia in New Zealand amphipods and the use of blocking primers to rescue host COI sequences

Endosymbionts and intracellular parasites are common in arthropod hosts. As a consequence, (co)amplification of untargeted bacterial sequences has been occasionally reported as a common problem in DNA barcoding. While identifying amphipod species with universal COI primers, we unexpectedly detected rickettsial endosymbionts belonging to the Torix group. To map the distribution and diversity of Rickettsia species among amphipod hosts, we conducted a nationwide molecular screening of seven families of New Zealand freshwater amphipods. In addition to uncovering a diversity of Torix Rickettsia species across multiple amphipod populations from three different families, our research indicates that: (1) detecting Torix Rickettsia with universal primers is not uncommon, (2) obtaining ‘Rickettsia COI sequences’ from many host individuals is highly likely when a population is infected, and (3) obtaining ‘host COI’ may not be possible with a conventional PCR if an individual is infected. Because Rickettsia COI is highly conserved across diverse host taxa, we were able to design blocking primers that can be used in a wide range of host species infected with Torix Rickettsia. We propose the use of blocking primers to circumvent problems caused by unwanted amplification of Rickettsia and to obtain targeted host COI sequences for DNA barcoding, population genetics, and phylogeographic studies.

Although Rickettsia species are known as common pathogens or endosymbionts in arthropod hosts, these agents have never been reported in crustaceans. Rickettsia-like organisms (RLO) have been reported in some groups of crustaceans including crabs, crayfish, lobsters, shrimps, and amphipods 24 . However, most reports of these RLOs were based on morphological similarity with Rickettsia and were rarely confirmed by molecular data. In amphipods, RLOs were reported in several species of gammarids, as well as other taxa (e.g., Crangonyx floridanus and Diporeia sp.) [25][26][27][28] . 16S rRNA (rrs) sequences of RLOs are available for Diporeia sp. and some gammarids, but none of them belong to the genus Rickettsia 28,29 .
While identifying amphipod species with DNA barcoding for cophylogenetic analyses between microsporidians and their amphipod hosts 30 , we obtained a suspicious COI sequence from Paracalliope fluviatilis, the most common freshwater amphipod species in New Zealand. According to a blast search in GenBank, this sequence obtained from a stream in the Southland region was ~ 99.5% identical to 12 sequences of the same 'amphipod' species available in GenBank from a previous study conducted in our lab 31 . These sequences were obtained from two different locations (Waikouaiti River and Waitaki River) in the Otago region. However, all sequences were highly divergent from that obtained from other populations (~ 57%) of the same host group (Paracalliope species complex). DNA from the individual amphipod was extracted from its legs (i.e., low chance of contamination due to gut contents). We obtained a clear chromatogram with no ambiguous peaks. Furthermore, these sequences were similar to other COI sequences obtained from diverse insects (Coleoptera, Diptera, Hemiptera, Hymenoptera, Odonata) in GenBank with sequence similarity ranging from 80 to 99%. However, these sequences were also similar (~ 92%) to sequences of rickettsial endosymbionts of insects and spiders that have been recently registered in GenBank. Because such highly conserved COI sequences among distantly related arthropod groups are unlikely, we assumed that these sequences were actually obtained from their rickettsial endosymbionts. We independently confirmed the presence of Rickettsia in our amphipod hosts using three genetic markers that were designed to be specific to Rickettsia.
In fact, this phenomenon of the amplification of non-targeted COI sequences with widely used DNA barcoding primers has already been reported. Řezáč et al. (2014) 32 obtained a rickettsial COI sequence from a spider species in a study with a different purpose. Ceccarelli et al. (2016) 33 obtained COI sequences of Torix Rickettsia from six individuals of a spider species while conducting DNA barcoding, and these authors formally discussed the presence of misidentified COI sequences in GenBank. However, despite these early reports, the deposition of misidentified sequences to GenBank has continued until recently. A very recent survey on BOLD reported that 0.41% of the barcode submission in BOLD are actually from Rickettsia, which is higher than that from Wolbachia (0.17%) 34 .
Because Rickettsia species are endosymbionts within host cells, DNA extracts from infected host tissue will inevitably include DNA of endosymbionts as well. If binding sites for 'universal primers' are conserved in both hosts and their endosymbionts, PCR products obtained from these mixed templates may result in mixed signals in chromatograms, or in the amplification of endosymbiont instead of host sequences 33 . Using primers that are designed to bind uniquely and specifically to host templates would reduce this problem. However, designing group-specific primers is not always possible, especially when reference sequences are scarce or not available. Also, finding conserved regions across a given taxonomic group may not be achievable. Alternatively, blocking primers can be used to prevent the amplification of unwanted or dominant sequences among DNA templates 35 . For example, this method has been successfully applied to identify prey items (by suppressing the amplification of predator DNA in gut contents), or to obtain rare mammal sequences from ancient DNA (by blocking the amplification of human DNA) 36,37 . Because COI sequences of Torix Rickettsia are highly conserved in diverse host groups, we were able to design blocking primers that are intended to specifically block the amplification of Torix Rickettsia but allow amplification of the COI region of (any) host mtDNA.
In this study, we first screened rickettsial infections in diverse amphipods collected throughout New Zealand to determine their prevalence and distribution. Secondly, we characterized the genetic diversity of the newly found Rickettsia species in relation to other Torix Rickettsia using 4 distinct markers, namely rrs, gltA, atpA, and COI, and expand current understanding of Rickettsia phylogeny. Thirdly, we demonstrate that unwanted amplification of rickettsial COI sequences during DNA barcoding is a common problem, and that such sequences have been frequently reported and misidentified in GenBank. Fourthly, we suggest that using blocking primers in addition to universal primers for PCR is an effective solution to obtain targeted host COI sequences. Finally, we discuss the implications of these pseudo-sequences in public databases, ways to reduce this problem, and possible applications of blocking primers for similar problems.  Table 1). Because pooled samples were used, accurate prevalence in each population could not be obtained. However, a relative comparison was possible among the populations in which the same number of individuals per sample and the same total number was used (i.e. populations with a total of 48 individuals, with 12 samples each containing 4 individuals) (Supplementary Table 1). With parsimonious interpretation, among 18 populations, 12 populations showed at least 10% prevalence (> 5 positive tubes/12 tubes tested). Seven populations showed at least 20% prevalence (> 10 positive tubes/12 tubes tested). And three populations showed 100% positive tubes (12/12), with prevalence thus possibly ranging from 25-100%. Although Rickettsia was detected in both the North and South Islands, its distribution was confined to the southern parts of both islands (Fig. 1). . A Bayesian tree based on rrs sequences (Fig. 2) shows that all sequences obtained from New Zealand amphipods belong to the Torix group of Rickettsia. Even when the rrs conserved marker was used, all sequences obtained from amphipods (except S39_1542), were grouped in the same clade and distinct from other sequences, although this clade was not strongly supported (PP = 0.84). Several subgroups were identified in the gltA tree (Fig. 3). Most Rickettsia from amphipods were grouped within the same clade, similar to that revealed in the rrs tree. Also, a Bayesian tree inferred from COI sequences (Fig. 4) shows that Rickettsia from amphipods and some insects obtained in New Zealand are closely related, and the clade containing them is strongly supported (PP = 0.96).

Distribution of
testing and validating blocking primers. The ratio of the amplicons of host COI to Rickettsia COI (based on their expected fragment sizes) was calculated to compare the efficiency of each primer under different conditions (Table 1). Not all fragment analyses were successful, but we were able to compare some effects. Except for Bloc_R, which always resulted in amplifying an excess of Rickettsia COI (ratio 0.06 ~ 0.57), all primers showed some blocking effects. Specifically, Bloc_F, Bloc_F2, and HCO_DPO primers showed increased blocking effects (higher ratio of hostCOI:RickettsiaCOI) when more blocking primers were added. An increased number of PCR cycles resulted in a high number of host COI fragments. However, this increased somewhat the amplification of Rickettsia COI as well. Overall, HCO_DPO showed the highest efficiency among tested primers by preventing the amplification of Rickettsia COI, even at low concentrations.

Discussion
Our nationwide molecular screening results show that the Torix group of Rickettsia is widespread in freshwater amphipod hosts in New Zealand, and is to our knowledge the first report of Rickettsia in crustacean hosts.
Because of the lack of information on Rickettsia infections in other groups of amphipods in other parts of the world, when and how these bacteria colonised and spread among New Zealand amphipods remain in question. Because freshwater amphipods have limited dispersal abilities 38 , the widespread distribution of Torix Rickettsia in New Zealand amphipods may be explained by an ancient acquisition followed by vertical transmission, or by many independent events of recent horizontal transmission from other organisms. Several lines of evidence indicate that both horizontal and vertical transmission may have played roles in spreading and maintaining these bacteria in New Zealand amphipod hosts. The monophyletic relationships of most Rickettsia from New Zealand amphipod populations inferred by rrs and gltA sequences (Figs. 2, 3) seem to support the ancient acquisition scenario. Also, genetically closely related Paracalliope populations harbored the same genotype of Rickettsia, which also strongly supports their long-lasting relationship probably maintained by vertical transmission. Meanwhile, sharing of Rickettsia genotypes between sympatric amphipod species of different families suggests host shifts among genetically distant host species within the same order. Such a complex evolutionary history involving both vertical transmission and horizontal transfers has been reported for other insect/endosymbiotic systems 39 . The Bayesian tree obtained with COI sequences provides some hints for horizontal transmission among amphipods and other arthropods (Fig. 4). Rickettsia sequences from darkling beetles (Pimelia sp.) obtained in Europe were highly similar to the sequences identified in New Zealand amphipods (96 ~ 98% similarity) 40 . Moreover, Rickettsia sequences obtained from several unspecified arthropod species (Mandibulata sp., Endopterygota sp., and Formicidae sp.) in New Zealand (although these were originally identified as invertebrate COI sequences) 41 are closely related to those of New Zealand amphipods (96 ~ 99% similarity), providing strong evidence of recent horizontal transmission among them. Unfortunately, details regarding the host specimens and local origins of these sequences in New Zealand are not available. Direct detection of Rickettsia from these arthropod species and multi-gene analyses will be necessary to elucidate their transmission routes. Interestingly, with a larger dataset with multigene data, Pilgrim et al. (2020) 34 have recently inferred frequent horizontal transmissions of Torix Rickettsia among distantly related hosts. This supports the recent horizontal transmission of Rickettsia among amphipods and insects in the shared habitat, warranting further investigation. Our findings support the early observation that the Torix group of Rickettsia may be highly associated with aquatic and damp environments 16 , which was based on the detection of this group in leeches, amoeba, Diptera, and Coleoptera 14,19,22,42 . Pilgrim et al. (2017) provided support for this view by detecting Rickettsia from 38% of Culicoides species tested and hypothesized that Torix Rickettsia may be dominant in insects with aquatic larval stages, and the 'aquatic hotspot' hypothesis has recently been strongly supported by a comparison of the incidence of Torix Rickettsia between terrestrial and aquatic hosts 34 . Horizontal transmission of Torix Rickettsia among genetically distantly related but spatially co-occurring species may have occurred frequently 16 . The high prevalence of Torix Rickettsia and their stable association with their hosts suggest negligible pathogenic effects of this group 19,22,42,43 . Some Torix Rickettsia may even be beneficial for their hosts. For example, infected leeches can have larger body sizes than uninfected individuals, although the possibility that larger individuals are more likely to acquire Rickettsia via horizontal transmission cannot be ruled out 14,20 . Ecological impacts of Torix Rickettsia on their hosts, and direct evidence of horizontal transmission among aquatic host groups, could be better answered with targeted community-level studies. www.nature.com/scientificreports/ With the advancement of molecular techniques, our knowledge of the diversity of the sister groups of Rickettsia is also increasing and changing rapidly. Earlier studies focused mainly on the pathogenic and medically important species in arthropod hosts 44,45 . Until 2005, only two genera, Rickettsia and Orientia, were known within the family Rickettsiaceae, which now contains seven more genus-level taxa 46,47 . All these new genera are exclusively found in aquatic environments, mostly within ciliate hosts. It seems that adaptation to the use of arthropod hosts occurred several times independently within the family Rickettsiaceae 46 . In addition, the phylogenetic status and  The Torix group is largely different from the other groups of Rickettsia in many respects, including host range and habitat. The Torix group includes not only endosymbionts of diverse aquatic invertebrates (that are more complex than ciliates), but also diverse terrestrial arthropod hosts. Also, the Torix group is genetically distinct from other groups of Rickettsia, which all are sister to Torix Rickettsia. Specifically, the genetic divergences between the Torix and the Bellii groups are 96% in rrs, 78% in gltA, and 76% in COI sequences. The delimitation criteria we used for the Torix group in this study were 98.1% in rrs, 87.6% in gltA, and 89% in COI (broadly 80%; see Fig. 4). Two genome sequences of Torix Rickettsia recently became available 23,43 . These genomes have the typical characteristics of Rickettsia (e.g. reduced genome size, and biosynthetic and catabolic capacity) but also have unique characteristics different from the other groups of Rickettsia (e.g. the presence of non-oxidative PPP, methionine salvage pathway, and glycolysis). It would be interesting to see how these two sister lineages, one mainly pathogenic and the other nonpathogenic, evolved and diverged from their common ancestor.
Interestingly, COI sequences revealed a diversity of Torix Rickettsia, as much as other popular markers for Rickettsiales such as rrs and gltA, even though COI has rarely been used as a marker of choice. Among 17 studies that have generated COI sequences of Rickettsia, only 3 were specifically intended to obtain COI sequences from Rickettsia 23,49,50 . The remaining 14 studies obtained Rickettsia COI sequences as a byproduct of other research objectives (i.e. host identification, population genetic studies, or DNA barcoding) with PCR using universal primers. rrs is a widely used marker but may be too conserved to resolve phylogenetic relationships among closely related species, while the gltA gene shows more variability. Only 6 studies (including the present one) produced both rrs and gltA sequences 22,23,43,51,52 . gltA sequences are not available for most Torix Rickettsia including endosymbionts of leeches and amoeba, as the reports of Torix Rickettsia from these groups precede the first use of gltA sequences to study Torix Rickettsia 19,20,42,53 . Conversely, only gltA sequences are available for some species found in spiders and dipterans, which made it difficult to resolve the phylogenetic position of these rickettsial endosymbionts along with other Rickettsia 14,53 . 'Limoniae' and 'Leech' groups were used within Torix Rickettsia in some studies based on the gltA gene and concatenated sequences of gltA and rrs genes, although the 'Leech' group was found not to be monophyletic 22,54 . Our gltA tree showed two main lineages which correspond to the clades found in previous studies. Similarly, two main linages were identified in the Bayesian tree inferred from COI sequences. However, whether the clade containing all endosymbionts from spiders represents the 'Leech' group could not be confirmed without direct multi-gene data from the same group. It is likely that there are many more subgroups, given the limited number of targeted studies available to date.
Endosymbionts and vertically transmitted intracellular parasites are common in arthropod hosts 55,56 . In the context of the growing recognition of the 'Holobiont' concept 57,58 , obtaining bacterial sequences from DNA extracts from host tissue is not surprising. Most bacterial 'contaminations' are filtered out during processing of metabarcoding data 59,60 . The frequent recovery of COI sequences from Torix Rickettsia can be partly explained by their nucleotide sequence similarity with mitochondria. The Proto-mitochondrion (the hypothetical common ancestor of all mitochondria) is often recovered as a sister to the Rickettsiales or within the Rickettsiales 61 . The alignment of COI sequences among several lineages of Rickettsia shows the high similarity between priming sites and the sequences used for universal primers (Fig. 5). The priming site for the forward primer is 80% (20/25 nucleotides) identical to the LCO1490 sequences, and the priming site for the reverse primer is 84.6% (22/26 nucleotide) identical to the HCO2198 sequences. In addition, priming sites for universal primers are not conserved in many groups, which necessitates the need for group-specific or degenerate markers 62,63 . However, Table 1. PCR conditions and results of fragment analyses. PCR products obtained under different PCR conditions (different primers with different concentrations, annealing temperature, different DNA templates, number of PCR cycles) were run on capillary electrophoresis. The number of amplicons of host COI is followed by the number of amplicons of Rickettsia COI separated by (/), and their ratio is shown in parenthesis. The effects of blocking primers are highlighted in the grey boxes. Universal to blocking primer ratios that were higher than 0.8 are highlighted in red. NA: fragment analysis was conducted but the result was not available. www.nature.com/scientificreports/ this does not explain the more frequent reports of Torix COI in GenBank, because priming sites are also highly conserved in other groups of Rickettsia (Fig. 5). Pilgrim et al. (2020) 34 proposed that the lack of SNP near the 3′ end of the priming site of Torix Rickettsia may be responsible for this bias. Additionally, we hypothesize that overall high prevalence of Torix Rickettsia compared to other groups of Rickettsia in an infected host population can partly explain the bias, even though a formal comparison of prevalence between Torix and non-Torix groups was not made (but see Weinert et al., 2015 15 for prevalence of diverse groups of Rickettsia in host populations). Therefore, several individuals from a given population may all be infected and could yield rickettsial COI, as illustrated in some previous studies 31,33 , and in the current study. These problems can be managed, as they are with Wolbachia 11 . As mentioned earlier, Torix COI is highly conserved across diverse hosts. Therefore, comparing newly obtained (and suspicious) COI sequences with known Torix Rickettsia COI sequences can be easily done to distinguish Torix Rickettsia. Comparing sequences from the same taxon or genetically closely related groups could be useful. Checking for the presence of stop codons could largely decrease this problem, as for numts 10 . Bacterial sequences will show stop codons with the translation table for invertebrates, but will be in an open reading frame with the translation table for Bacteria. In addition to the high prevalence of Torix Rickettsia in many populations, high copy numbers of Rickettsia in host cells also make it difficult to obtain genuine host COI sequences, once a population is infected. In this case, applying blocking primers is a practical solution. Unfortunately, using blocking primers for Rickettsia does not always guarantee the amplification of host COI because other symbionts or parasites might still be amplified. Nevertheless, blocking primers can be widely used for any host groups that are infected by Rickettsia, and for both next-generation-sequencing as well as Sanger sequencing. These sequences should not be confounded with those of hosts, yet these 'unwanted sequences' or 'contaminations' can provide useful information about their endosymbionts and parasites. Our current study provides an example, as we confirm the presence of Torix Rickettsia after discovering contaminated sequences from a previous study 31 . Similarly, the COI sequence obtained from a damselfly suggested the presence of Rickettsia in this host group (order Odonata); this has recently been confirmed by a study with a targeted screening 64 . Targeted studies are likely to uncover a huge but under-detected diversity of Torix Rickettsia, and with more data, we will be able to answer questions regarding transmission, host switching, and the evolution of pathogenicity. Furthermore, detailed research on a finer scale is needed to elucidate the impact of these widespread endosymbionts on their diverse hosts. Figure 5. COI sequence alignments showing priming sites for both forward and reverse primers. Conserved regions to which universal primers bind are highlighted with blue squares. Positions at which nucleotides are the same as in universal primers are highlighted with pink texts in the primer sequences. COI sequences are highly divergent among amphipods whereas COI sequences from Torix Rickettsia from diverse host groups are highly conserved, which allowed the blocking primers (binding regions are highlighted with pink squares).  66 . We obtained some rickettsial COI sequences as a byproduct during host identification of amphipods with universal primers. We included these COI sequences along with nucleotide sequences from other genes for further analyses.
BLAST search. Blast search was done on GenBank with the rrs, gltA, atpA, and COI sequences obtained in this study. Based on the result of BLAST searches, all sequences that were considered as the Torix group (sequences with similarity to the query sequence higher than that between the query sequence and the Bellii group Rickettsia) were downloaded from GenBank (see Supplementary Tables 3-5) for further phylogenetic tree reconstructions. rrs and gltA sequences from other Rickettsia groups were also obtained and included for tree reconstruction (Supplementary Table 6). In addition, rrs sequences from recently discovered close relatives to Rickettsia (Candidatus Trichorickettsia, Candidatus Gigarickettsia, and Candidatus Megaira), and Orientia tsutsugamushi were included as outgroups.
phylogenetic analyses. For each gene set, all sequences were aligned in Geneious Prime with the MUS-CLE algorithm. Ambiguous sites were then eliminated in Gblocks with the least restrictive setting 67 . The bestfitting model of nucleotide evolution for each dataset was determined based on the corrected Aikake information criterion (AICc) using jModelTest v2.1.6 68 , which was conducted through the CIPRES Science Gateway v3.3 69 . For all datasets, the General Time Reversible (GTR) model of nucleotide substitution along with Gamma distributed rate variation across sites (G) and the proportion of invariable sites (I) were chosen as the best model. Bayesian trees were inferred in MrBayes 3.2.7a 70 . For each dataset, two independent runs, which consisted of four chains each, were simultaneously conducted for 10,000,000 generations with a sampling frequency of 1000.  36 . In order to design blocking primers, COI sequences of the Torix group and some species belonging to other groups of Rickettsia, and COI sequences of New Zealand amphipods were aligned in Geneious Prime (Fig. 5). We designed four different annealing inhibiting blocking oligos which were intended to compete with universal primers. All met the following criteria: First, the blocking primers should overlap with one of the universal primers. Second, the blocking primers should specifically bind to the unwanted DNA templates (i.e. Rickettsia) but not to our target DNA templates (i.e. amphipod hosts). Third, 3′-end was modified so that it does not prime amplification (here, all with C3 spacer). Initially, two primers were designed: Bloc_F and Bloc_R (Table 2). However, GC contents of these primers were too low (27.5% and 21%, respectively), which resulted in a low expected melting temperature (Tm) of 43.2 °C and 42.5 ºC, respectively. Ideally, Tm of a blocking primer should be higher than that of the competing primers 36 . We, therefore, designed a longer primer, BlocF_2, to increase Tm to 51.7 ºC (Table 2). A fourth primer was designed with a dual priming oligonucleotide (DPO): HCO_DPO ( Table 2). A DPO can be used when it is impossible to find an appropriate binding site for a blocking primer adjacent to a binding site of a universal primer 35 . A DPO primer consists of two separate segments connected with five deoxyinosines, and with C3 spacer modification at the 3′end. The total length of a typical DPO primer is long but it does not suffer from high Tm because a deoxyinosine linker, which assumes a bubble-like structure, allows the two segments to act independently 71 . All synthesized primers were purified with polyacrylamide gel purification (PAGE) to increase binding specificity by removing under-synthesized oligos.
Validation of blocking primers. We applied fragment analysis to test and compare the effectiveness of our blocking primers 35 . Fragment analysis of fluorescently labeled PCR products on capillary electrophoresis can separate fragments in different sizes and can be used as a semi-quantitative method. When amplified with the universal LCO1490 and HCO2198 primers, the expected lengths of PCR products were different for amphipod hosts and Rickettsia COI, because Rickettsial COI is 6 bp longer. The FAM dye was attached to the 5′ end of the LCO1490 primer. This fluorescently labeled forward primer was added to the PCR mixture instead of the normal (unlabeled) LCO1490 primer. Various factors can affect PCR success with blocking primers: Tm of primers, the concentration of primers (relative ratio between blocking primer and regular primer), the amount of the DNA templates in a PCR mixture (concentration of DNA), and the number of PCR cycles 35 . To optimize PCR conditions, PCR reactions were conducted under several different PCR conditions (Table 1). Fragment analyses were carried out with a 1,200 LIZ size marker on an ABI 3730xl System (Applied Biosystems) at Macrogen (Korea). Results were analyzed with Peak Scanner Software 1.0 (Applied Biosystems; https ://www.therm ofish er.com/).