Introduction

Heritable symbiotic associations such as that between Drosophila and Wolbachia pipientis have widespread impact on host ecology and evolution. These heritable endosymbiotic relationships are recognized as key drivers of evolution, but the intraspecific variation that effects their short-term evolution is not well explored. Wolbachia are \(\alpha \)-proteobacterial endosymbionts found in up to 40% of all arthropod species1,2,3. Wolbachia are maternally transmitted and spread through manipulating the reproductive strategies of their host, using mechanisms such as feminization, male-killing, or cytoplasmic incompatibility. The most common of these is cytoplasmic incompatibility, where mating between males and females of the same species results in embryonic mortality if they have different Wolbachia infection status4,5,6,7,8. Wolbachia may also confer certain protections upon their host, such as increased resistance to certain viruses, or increased survival when exposed to certain environmental stressors9,10,11,12,13,14,15. Wolbachia is one of the most abundant obligate intracellular parasites, given that 85% of animal species are insects. This has profound meaning for evolutionary processes such as sexual selection and speciation16,17.

Wolbachia strain wRi is known to have spread recently in the sister species to the model organism Drosophila melanogaster, D. simulans 4,5,8,18. Using cytoplasmic incompatibility as a proxy for infection status it was inferred to be at ~95–100% frequency in Southern California populations at the time its original sampling in the 1980’s19,20. It likely invaded California less than 25 years before it was first detected in 198421. It is now thought to have been horizontally transmitted to D. simulans from D. ananassae, though the same strain is also found in D. suzukii 22. However, the population genetics of Wolbachia in D. simulans have not been investigated. The maternal transmission of Wolbachia means that as the microorganism spreads all maternally inherited organelles spread along with it. Most notably mtDNA will be forced through a bottleneck, lowering the diversity of mtDNA in infected populations18,23,24. This will cause mtDNA and Wolbachia to be more closely associated than nuclear genes, and this coupling has been demonstrated previously in D. simulans 18,23,25. In fact, D. simulans is known to have three major mitochondrial haplotypes (siI, siII, and siIII) and two subtypes (siIIA and siIIB) that harbor very little variation and that appear to be nonrandomly associated with Wolbachia strains26,27,28. These mitochondrial haplotypes are largely allopatric, except for the presence of both siII and siIII in Madagascar and La Reunion29.

In D. melanogaster variation in Wolbachia has been investigated primarily in the DGRP using genomic data, though this has not been done in other species30. In D. melanogaster they found long lived associations between mitochondrial and Wolbachia haplotypes and strong geographic structuring among cytotypes30,31,32. This study also observed that Wolbachia titre varied among fly populations as the result of intraspecific nuclear genetic variation30. However, the assumption that it was due to intraspecific nuclear background was based on the presence of a constant environment and no polymorphisms were identified that could be affecting this phenotype. Very little is known about how Wolbachia interact with their hosts, though recent work has uncovered evidence that deubiquitylating enzymes produced by Wolbachia and secreted into the host cytoplasm mediate cytoplasmic incompatibility33. Wolbachia DNA is also frequently inserted into the host genome, though this has not occurred with wRi in D. simulans 21. Genes involved in the formation of germline stem cells such as benign gonial cell neoplasm and bag-of-marbles are considered candidates for interacting with Wolbachia, and have been found to have unusual population genetic patterns in D. melanogaster 34,35. bag-of-marbles has been suggested to interact with Wolbachia due to fertility rescue in hypomorphs, but the interaction of this gene with Wolbachia in natural populations is not clear22,34,35,36. Notably, Wolbachia localizes in tissues differently depending upon the strain and species so the interactions between the host and Wolbachia are likely to also be different37,38.

Wolbachia infections must be maintained in host populations through transovarial transmission, wherein Wolbachia is present in the germline at sufficient copy number to ensure transmission but not to cause host pathology39. Wolbachia titre has been shown to have important phenotypic effects on the host11,40,41,42,43,44,45,46,47,48. However, control of Wolbachia replication is not well understood, nor is the dependence of this control on host background versus bacterial genotype11,49,50,51. Differences in Wolbachia titre when it is transinfected between species suggests a role of host background in controlling copy number, population genomics in D. melanogaster suggest an effect of host background, and there does seem to be host-specific patterns of tissue colonization52,53,54. However, multiple Wolbachia genotypes can also behave differently in the same genetic background suggesting contributions from the bacterial genome50,55. It is also possible to select for greater Wolbachia densities, though the heritability of this is unclear56,57.

For the first time, I investigate the population dynamics of Wolbachia and mtDNA in a large panel of D. simulans. I determine infection status of Wolbachia in the panel of D. simulans genotypes. I look for signatures of selection in both genomes using summary statistics Tajima’s D and π and find that while Wolbachia patterns of variation are not unusual given its demographic history the reduction in mtDNA diversity is suggestive of a recent bottleneck due either to selection or changes in population size. I compare these results with D. melanogaster, as this is the only other system in which the population genetics of Wolbachia have been investigated. I also measure linkage disequilibrium between mtDNA and Wolbachia as a proxy for co-inheritance. Using whole genome sequences, I investigate the phylogeny of both Wolbachia and mtDNA and find that in this population they are essentially unresolved. I investigate variation in the copy number of both Wolbachia and mtDNA in this population using relative estimates derived from illumina sequencing coverage compared to nuclear coverage. I find considerable copy number variation in this population, and an association analysis using this as a phenotype implicates several genomic regions potentially involved in mediating this phenotype. This includes a region containing multiple genes involved CAAX-box protein prenylation, a process that is important for mediating the relationship between host and pathogen in other systems58,59,60.

Methods

Drosophila strains

Strains are as described in61. Briefly, the 167 D. simulans lines were collected in the Zuma organic orchard in Zuma beach, California in February of 2012 from a single pile of fermenting strawberries. Single mated females were collected and inbred by 15 generations of full sib mating of their progeny. Drosophila were raised at a constant temperature of 20 °C with 12-hour light/dark cycles. They were raised on a standard glucose/yeast media, and each library was constructed from adult females of similar age (less than one week).

Data sources and processing

The sequencing reads were downloaded from the NCBI Short Read Archive from project SRP075682.

Reads were mapped using BWA mem (v. 0.7.5), and processed with samtools (v. 0.1.19) using default parameters62,63. The Wolbachia reference is the wRi strain previously identified in Southern California (Accession number NC_012416)21. The mtDNA reference is from D. simulans w 501, which is haplogroup siII as expected for D. simulans from California (Accession number KC244284)64. PCR duplicates were removed using Picard MarkDuplicates (v. 2.9.4) and GATK (v. 3.7) was used for indel realignment and SNP calling using default parameters (http://picard.sourceforge.net)65. SNPs were called jointly for all genotypes using Haplotypecaller65. Individual consensus fasta sequences were produced using SelectVariants to create individual vcf files and FastaAlternateReferenceMaker. Vcf files were filtered for indels and non-biallelic SNPs using VCFtools (v. 0.1.13)66. The files were also filtered for SNPs with more than 10% missing data. The Wolbachia genome was filtered for regions of unusual coverage or SNP density, for example two regions of the Wolbachia genome harbored ~40 SNPs within two kb, far above background levels of variation (Supp. Fig. 1). These two regions coincided with regions of unusually high coverage suggesting they are repeated elements.

Prediction of Wolbachia infection status

Wolbachia infection status was determined by calculating the mean depth of coverage of the assembly and the breadth of coverage of the consensus sequence using bedtools67. Depth of coverage refers to the average read depth across the Wolbachia genome, while breadth of coverage refers to the number of bases covered by at least two reads. Depth of coverage at each nucleotide was estimated using the genomecov function, while breadth was estimated using the coverage function. Predictions of Wolbachia infection status using illumina data have previously been shown to have 98.8% concordance with PCR based predication of infection status31.

Nucleotide diversity

Levels of polymorphism for mtDNA and Wolbachia were estimated as π in 10 kb windows using VCFtools (v0.1.14)68. To investigate whether the frequency spectrum conformed to the standard neutral model of molecular evolution I also calculated Tajima’s D in 10 kb windows using VCFtools. To assess the significance of deviations in Tajima’s D and π 10,000 simulations were performed using msms conditioned on the number of variable sites and with no recombination69.

Linkage disequilibrium

Linkage between Wolbachia and mtDNA SNPs could potentially be a predictor of co-inheritance of mtDNA and Wolbachia. Linkage was estimated using VCFtools (v0.1.14) using inter-chrom-geno-r2 to estimate r2 between each SNP in the two genomes66.

Estimation of mtDNA and Wolbachia copy number

In insects, the phenotypic effect of Wolbachia will vary depending upon copy number in the host cells9,31. Given that there are two copies of autosomal DNA in a cell, I infer mtDNA and Wolbachia copy number based on the ratio between mtDNA and autosomal DNA. This is intended to provide a relative estimate of copy number rather than an absolute measure. Relative copy number estimated in this way obscures intra-individual variation and variation between tissues, though the authors note that all flies used in constructing the libraries were females of approximately the same age. Wolbachia contains several regions which were excluded due to unusually high coverage across all samples (more than 3x the mean coverage). Average coverage of autosomal DNA was calculated from randomly chosen and equivalently sized nuclear regions for each mtDNA (Scf_3 L:8000000..8014945) and Wolbachia (Scf_2 L:11000000..11445873). The average coverage of each nuclear region, respectively, was then used to normalize estimates of copy number for each genotype. Previously the results of measuring Wolbachia copy number in the same samples using both qPCR and estimates from illumina read depth had a Pearson’s correlation coefficient of 0.79, thus this is a robust approach to measuring Wolbachia titre30.

Phylogenomic analysis

To understand the relationship between Wolbachia infection and mtDNA I reconstructed the genealogical history of each within the sample population. Multiple alignments were generated for both mtDNA and Wolbachia by concatenating fasta consensus sequence files for each genotype. All indels and non-biallelic SNPs were excluded from the dataset prior to generating the consensus fasta for each genotype. RAxML version 8.10.2 was used to reconstruct phylogenies70. Maximum likelihood tree searches were conducted using a general time reversible (GTR) model of nucleotide substitution with CAT rate heterogeneity and all model parameters estimated by RAxML71. Trees were inferred using the rapid bootstrap algorithm and simultaneous estimation of trees and bootstrapping, with automatic estimation of the necessary number of bootstrap replicates.

Association Analysis

The association analysis focused on a relationship between nuclear polymorphisms and Wolbachia and mtDNA copy number. To reduce the need for correction due to multiple testing and focus on regions that may have been affected by selection due to the recent invasion of Wolbachia I used a subset of the nuclear genome identified previously as exhibiting haplotype structure suggestive of recent selection61,72. These regions are unusually long haplotype blocks, thus many of the SNPs within each block are not independent, reducing the need for correction due to multiple testing. Heterozygous bases were coded as missing, and all loci with more than 10% missing data were excluded from the analysis, as well as SNPs with a minor allele frequency of less than 2%, meaning they were present in the population in at least 3 copies. mtDNA and Wolbachia copy number were used for a multivariate analysis of association using plink.multivariate73. To investigate the possibility that Wolbachia copy number is affected by polymorphisms in mtDNA, and vice versa, a single trait analysis was performed using plink v. 1.0774.

Data Availability

The dataset analyzed in the present study is available at the NCBI Short Read Archive from project SRP075682.

Results

Sequencing Data

The autosomal data included in this analysis was reported in61. There was very little variation in both Wolbachia and mtDNA in this population. This included 78 SNPs and indels in the Wolbachia genome and 90 in mtDNA. Reduced diversity has been reported previously in D. simulans mtDNA23,24. The authors note that previous work has established that there is no unusual relatedness in the nuclear genome of this population61.

Infection status

In D. melanogaster lines were scored as infected if they had a breadth of coverage greater than 90% and a mean depth greater than one31. However, that dataset had a clearly bimodal distribution between infected and uninfected lines, where uninfected lines had breadth of coverage less than 10% while infected lines had a breadth of coverage of greater than 90%. As such that this demarcation was a natural interpretation of the data31. In D. simulans, all lines had ~99% breadth of coverage aside from a single line with both a lower overall depth of coverage and 80% breadth (Fig. 1). For this reason, all lines were scored as infected. 100% infection is not unusually high for D. simulans.

Figure 1
figure 1

Wolbachia infection status and relationship to mtDNA copy number. (A) Relationship between depth and breadth of sequencing coverage for Wolbachia assemblies in the D. simulans panel. Depth of coverage is shown in log10 unites and is calculated as the number of reads present at each nucleotide in the reference averaged over every site. Breadth of coverage is the proportion of covered nucleotides in the consensus sequence relative to the reference. (B) Relationship between relative mtDNA copy number and Wolbachia copy number. Both were normalized relative to nuclear coverage. Although separate regions were used to normalize mtDNA and Wolbachia, as they are different sizes, average values were very similar within genotypes. The relationship between mtDNA and Wolbachia copy number is positive (p < 2.4 × 10−7).

Nucleotide Diversity

Estimates of π in Wolbachia ranged from 5.98 × 10−7 to 1 × 10−3, with an average of 1.42 × 10−5, within the range of estimates from Wolbachia in D. melanogaster (7.9 × 10−6 − 2.8 × 10−5)30. The mean of π in simulated populations of Wolbachia is 1.9 × 10−3 suggesting that variation is somewhat reduced in wRi. π in mtDNA is 1 × 10−4 which again is similar to estimates from D. melanogaster (4.34 × 10−4 − 1.51 × 10−3)30.

Overall Tajima’s D was estimated to be −2.4 for D. simulans mtDNA (Fig. 2). Significance of this estimate was assessed using 10,000 simulations in msms conditioned on the number of segregating sites and no recombination, and it is significant at p < 0.05. Tajima’s D in Wolbachia is not significantly different from expectations under neutrality based on 10,000 simulations. Thus, while a selective sweep or population bottleneck seems to have strongly effected mtDNA in D. simulans, the same is not true of the Wolbachia population (Fig. 2). This is very different from D. melanogaster where Wolbachia and mtDNA had similar patterns of nucleotide diversity31. wRi is predicted to have spread in D. simulans much more recently than Wolbachia strains in D. melanogaster.

Figure 2
figure 2

Wolbachia and mtDNA Tajima’s D. 10,000 simulations were performed for Wolbachia and D. simulans each conditioned upon the number of polymorphisms. The actual values in D. simulans mtDNA are outside the 95% confidence interval of the simulations, while Wolbachia is not. There is considerable variation in Tajima’s D across the Wolbachia genome while mtDNA is much smaller and invariant in its values of Tajima’s D.

This is also much more negative than previously reported for mtDNA in D. simulans 23. It is very different from the general patterns of Tajima’s D in the nuclear genome, where average Tajima’s D is 1 and the majority of the genome has a positive Tajima’s D. Simulations in previous work suggest that the pervasively positive values in the nuclear genome may be due to a population contraction, which again indicates that the population dynamics affecting D. simulans nuclear and mtDNA genomes are very different23,61.

Linkage disequilibrium

There was no significant linkage disequilibrium between the genomes of Wolbachia and D. simulans mtDNA. Average LD between Wolbachia and mtDNA SNPs was 2.06 × 10−3. This may be because the infection of D. simulans was too recent for variation to accumulate along particular lineages, and also suggests that D. simulans was infected by a single invasion.

Estimation of mtDNA and Wolbachia copy number

Wolbachia and mtDNA copy number have not previously been measured together in D. simulans. There was considerable heterogeneity in both Wolbachia and mtDNA copy number (Fig. 1). Mean (standard deviation) copy number of Wolbachia is 5.56 (2.45). This is similar to one estimate in D. melanogaster, where mean copy number is 5.57 (3.95) though the standard deviation is lower in D. simulans 31. The reported mean was lower in other populations of D. melanogaster, though still within the same range (2–4.5)30. Similarly mean mtDNA copy number is 33.85 (15.5) in D. simulans and 32.9 (44.5) in one estimate for D. melanogaster 31. This is again not an absolute measure, but relative to nuclear genomic coverage. The lower standard deviation could be due to more precise staging of the age of D. simulans, less background variation effecting copy number (the D. melanogaster sample was from multiple populations), or other unknown mechanisms. There was a positive relationship between mtDNA and Wolbachia copy number (Fig. 1) (p < 2.4 × 10−7). While the functional reasons for or consequences of this are unclear, because they are correlated they will be used in a multivariate analysis of association rather than as separate analyses.

Phylogenomic analysis

To understand the relationship between Wolbachia infection status and mtDNA sequence variation I reconstructed the phylogenetic history of the complete Wolbachia and mtDNA genome using the entire set of 167 strains (Figs 3 and 4). What I found is consistent with previous work on the spread of cytoplasmic incompatibility in D. simulans, as both phylogenies are essentially unresolved. This is not unexpected for mtDNA given previous work in the species which found little within-haplotype variation among the three major mtDNA haplotypes in D. simulans 23,27. Furthermore, of the 167 sequences 88 are identical to at least one other sequence in the sample. While the Wolbachia phylogenetic tree gives the impression of having more resolution than mtDNA, this is likely due to the larger genome, as the branches have similarly low support. Of the 167 strains included in the tree 18 are identical to one or more Wolbachia genomes. Both trees are essentially star phylogenies with the majority of bootstrap support values being less than 30. Bootstrap support of greater than 70, for two branches in the mtDNA tree and five in the Wolbachia tree, is shown (Figs 3 and 4). If uninfected individuals had been included in the dataset perhaps it would be possible to test for congruence between the two phylogenies, however the essentially unresolved trees make it clear that both Wolbachia and mtDNA swept the population recently.

Figure 3
figure 3

Maximum likelihood genealogy of the D. simulans Wolbachia pathogen. All strains were infected with Wolbachia and are included in this genealogy. The underlying data consist of an ungapped multiple alignment of 168 sequences of the entire Wolbachia genome. The unrooted tree was midpoint rooted for visualization and branches with >70% RAxML bootstrap support values are shown in bold. Scale bars for branch lengths are in term of mutations per site. The majority of branches are essentially unsupported by bootstrapping.

Figure 4
figure 4

Maximum likelihood genealogy of the D. simulans mtDNA genome. The underlying data consist of an ungapped multiple alignment of 168 sequences of the entire mtDNA genome. The unrooted tree was midpoint rooted for visualization and branches with >70% RAxML bootstrap support values are shown in bold. Scale bars for branch lengths are in term of mutations per site. The tree is largely unresolved, suggesting recent spread of this mtDNA haplotype through the population.

Association Analysis

I performed the first association analysis between Wolbachia and mtDNA titre and the nuclear genome. Association analysis was performed using plink.multivariate by regressing the line means for mtDNA and Wolbachia copy number on each SNP contained within the previously identified in a scan for selection61. This scan for selection focused on identifying haplotype blocks in LD. This considerably reduces the number of SNPs tested for association, in addition to the fact that the SNPs are in haplotype blocks and are therefore not independent tests61,72. This reduces the need for correction due to multiple testing. I used a p-value cut-off of p < 9 × 10−6 and identified 16 SNPs associated with Wolbachia and mtDNA copy number (Supp. Table 1). Of these 16 SNPs 13 are located in the same region on chromosome 2 R (Scf_2 R: 13550916–13569038). Given the concentration of significant SNPs in a single region, this is also the region I will focus on the most in the following discussion. The region containing 13 SNPs contains nine genes, four of which are involved in CAAX-box protein processing, ste24a-c and a recent duplicate of ste24c CG30461. CAAX-box protein processing is a part of a series of posttranslational protein modifications collectively called protein prenylation which are required for fully functional proteins to be targeted to cell membranes or organelles. It has been shown that pathogenic bacteria can exploit the host cell’s prenylation machinery, though it is unclear if this occurs in Wolbachia 58.

The other five genes are AsnRs-m, which is largely unannotated but is thought to a mitochondrial aminoacyl-tRNA synthetase75. NIPP1Dm is involved in axon guidance and negative regulation of protein phosphorylation76,77. CG6805 is generally unannotated but is inferred to be involved in dephosporylation75. Cbp53E regulates neural development78. Lastly, Ehbp1 is a developmental gene implicated in regulation of the Notch pathway and membrane organization79.

Of the other three SNPs identified in this association analysis two are located at Scf_2 R:5814103 and Scf_2 R:5811043, while the third is located at Scf_3 L:2055556 (Supp. Table 1). Scf_2 R:5811043 and Scf_2 R:5814103 are located in Su(var)2–10 and Phax, respectively. These are neighboring genes, though there is a third gene within 10 kb, Mys45A. Su(var)2–10 is involved in development and chromosome organization, but it has also been implicated in the regulation of the innate immune response and defense against gram-negative bacteria80. Su(var)2–10 is of particular interest given that Wolbachia are gram-negative bacteria, however the potential role of Su(var)2–10 in immune response is not clear. Phax is not well annotated but is inferred to be involved in snRNA export from the nucleus78. Mys45A is potentially involved in actin cytoskeleton organization78. In D. melanogaster Wolbachia uses host actin for maternal transmission, though this has not been verified in D. simulans 81. The last SNP, at Scf_3 L:2055556, is in Connectin, a cell adhesion protein also involved in axon guidance82.

The identification of these SNPs in association with mtDNA and Wolbachia copy number does not imply a functional relationship. Nonetheless, I chose to investigate whether any of these substitutions had an effect on the coding sequence of any of genes in the region. Of the three SNPs found outside the region containing the CAAX-box proteins all were either in introns or regulatory regions. Of the 13 SNPs identified between Scf_2 R: 13550916–13569038 eight are in introns or untranslated regions, including one in the long intron of Cb53E, three in the introns or noncoding transcript of CG6805, and two in the introns of Ephb. Of the remaining five SNPs four are in coding regions but silent, causing no change in the amino acid sequence of the protein. This includes silent mutations in the exons of ste24c and two silent mutations in the exons of Epbh. One SNP located in an exon of ste24a, at 13558515, is an amino acid substitution from a Leucine to a Valine. This is not an uncommon amino acid substitution83,84, though it can be associated with phenotypes85,86. Mutations in introns and untranslated regions could also be having an effect on gene expression or processing, as could other linked SNPs in the region that were not included in the analysis.

Association between Wolbachia and mtDNA

Association analysis was performed using plink by regressing the line means for mtDNA copy number onto the Wolbachia genome and vice versa74. There was no association between Wolbachia SNPs and mtDNA copy number, but the opposite was not true. One SNP in the D. simulans mtDNA affected Wolbachia copy number at p < 3.18 × 10−6 (Supp. Table 1). It is located in the D. simulans homolog of D. melanogaster srRNA which has been implicated in pole cell formation87. Wolbachia is incorporated into the pole cells, the precursor to the germline, in order to be transmitted88.

Discussion

Using high through-put sequencing of a large panel of D. simulans I have reconstructed the genome sequences of mtDNA and Wolbachia. I use these genome sequences to investigate the recent history of Wolbachia and mtDNA in this population, as well as to estimate titre of both Wolbachia and mtDNA. The history of Wolbachia in this population is reflected in the essentially star-like phylogeny of both mtDNA and Wolbachia, indicating recent spread and co-inheritance. Lack of variation at mtDNA and Wolbachia suggests a single spread of wRi in this population as well as strict vertical transmission in the maternal cytoplasm. Variation in Wolbachia is within the range expected under a neutral model, however that was not the case for mtDNA which suggests either a selection sweep or a population bottleneck. Previous studies found similar population genetic patterns at Wolbachia and mtDNA in D. melanogaster, and thus could not distinguish whether selection on Wolbachia was driving similar patterns in mtDNA or vice versa30. The much stronger pattern of negative Tajima’s D in the mtDNA suggests that in D. simulans selection is in fact mitochondrial. There was no linkage disequilibrium between Wolbachia and mtDNA variants, however this is most likely due to fixation of a single mitochondrial haplotype without considerable subsequent mutation.

Currently little is known about how Wolbachia interacts with its host36,37,38,81,89. Understanding these interactions, including regulation of Wolbachia titre, will be key to understanding the evolution of Wolbachia and its hosts. By normalizing Wolbachia and mtDNA copy number using coverage of the nuclear genome I am able to obtain estimates of its abundance. Much as in previous work, mtDNA copy number was higher than Wolbachia copy number, though both varied across strains31. As all of my data was produced from adult females, at the same time, using the same techniques, it is unlikely that this is due to differences in methodology among samples31. Estimates of copy number were very similar to previous work in D. melanogaster, performed with qPCR, and there has been shown to be a high correlation between qPCR and illumina estimates of copy number30,31. These are not absolute measures, rather they are relative to one another and to nuclear copy number, and they provide robust estimates of Wolbachia titre within the population. As the Wolbachia phylogenetic tree is essentially unresolved in this population but there is considerable variation in Wolbachia titre, it is possible that some host factors may be affecting variation in Wolbachia titre.

The history of mtDNA and the nuclear genome is quite divergent in this population. The nuclear genome has an average Tajima’s D of 1 and 5 polymorphisms for every 100 bp61. Simulations suggest that this is due to a combination of population contraction and selection, most likely from standing variation, though many types of sweeps can produce similar signatures61. In contrast the mtDNA genome contains an abundance of low frequency variation, and in fact many of the mtDNA genomes sampled in this population are identical. This is consistent with the recent spread, single origin, and maternal transmission, of wRi in D. simulans. This is consistent with previous work which found low levels of mtDNA variation in D. simulans within a haplotype24,90. This is also consistent with work on Wolbachia which documented the spread of wRi in D. simulans in the 1980’s4,5,8,18,19,20.

While it has been proposed elsewhere, the author is not aware of another association analysis of Wolbachia and mtDNA copy number31. Wolbachia copy number is known to be affected by host background, but the genes or mechanisms involved are not known53,54,56. The fact that four of the nine genes found in the primary region detected in the association analysis are involved in CAAX-box protein processing is of particular interest, given the history of this type of gene and intracellular pathogens. CAAX-box protein processing is one step in the post-translational protein prenylation that is required for fully functional proteins to be sent to the plasma or nuclear membranes. Prenylated proteins include Ras, Rac, and Rho. However, it has been shown that pathogenic bacteria can exploit the host cell’s prenylation machinery58. For example, Salmonella-induced filament A is a protein from Salmonella typhimurium, a gram-negative facultative intracellular bacterium. Salmonella-induced filament A has a CAAX motif required for prenylation to occur, it was shown to be processed by host prenylation machinery, and it is necessary for survival of the bacterium59,91,92. Legionella pneumophila Ankyrin B protein exploits the host prenylation machinery in order to anchor Ankyrin B protein to the membrane of the pathogenic vacuole60. Proliferation of Legionella pneumophila requires Ankyrin B, as does the manifestation of Legionnaires disease. Ankyrin repeat domains are most commonly found in eukaryotes and viruses, though they are rarely found in bacteria and Archaea93. In bacteria they are found in a few obligate or facultative intracellular Proteobacteria58. Wolbachia has an unusually high number of Ankyrin repeat domains with rapid evolution93. Ankyrin proteins play a major role in host-pathogen interactions and the evolution of infections94,95. There is no way to know from the current analysis if the Ankyrin repeat genes are exploiting the host prenylation system but it is an intriguing area for future investigation. The results of this association analysis suggest that some interaction between the pathogen and its host is targeting the protein prenylation machinery.

There was also an association between a polymorphism in srRNA, which has been implicated in pole cell formation87, and Wolbachia copy number. Mitochondrial small ribosomal RNAs are found in the polar granules that contain deposits of maternal transcription factors, and are thought to be a part of the translational machinery96,97,98. Concentration of Wolbachia in the posterior of the embryo, where pole cells are forming, is correlated with degree of cytoplasmic incompatibility99. D. simulans has been shown to have nearly complete cytoplasmic incompatibility, though it is possible there are mutations sorting at low frequency that affect this or that mitigate negative phenotypic consequences of high Wolbachia titre. It has also been demonstrated that gurken is important for Wolbachia titre in the germline in D. melanogaster, and it is involved in pole cell formation beginning at an earlier stage than srRNA suggesting there could be an interaction between the two factors87,89. D. simulans wRi has a different distribution in the cytoplasm from other strains of Wolbachia, as it tends to evenly distribute throughout the embryo while other strains are either concentrated at the posterior, or at the anterior of the embryo away from the pole cells99. Future work in related species may show that these different distributions also mitigate different interactions between host and symbiont, including being effected by different genes and processes within the host.