Random Genetic Drift and Selective Pressures Shaping the Blattabacterium Genome

Estimates suggest that at least half of all extant insect genera harbor obligate bacterial mutualists. Whereas an endosymbiotic relationship imparts many benefits upon host and symbiont alike, the intracellular lifestyle has profound effects on the bacterial genome. The obligate endosymbiont genome is a product of opposing forces: genes important to host survival are maintained through physiological constraint, contrasted by the fixation of deleterious mutations and genome erosion through random genetic drift. The obligate cockroach endosymbiont, Blattabacterium – providing nutritional augmentation to its host in the form of amino acid synthesis – displays radical genome alterations when compared to its most recent free-living relative Flavobacterium. To date, eight Blattabacterium genomes have been published, affording an unparalleled opportunity to examine the direction and magnitude of selective forces acting upon this group of symbionts. Here, we find that the Blattabacterium genome is experiencing a 10-fold increase in selection rate compared to Flavobacteria. Additionally, the proportion of selection events is largely negative in direction, with only a handful of loci exhibiting signatures of positive selection. These findings suggest that the Blattabacterium genome will continue to erode, potentially resulting in an endosymbiont with an even further reduced genome, as seen in other insect groups such as Hemiptera.


Materials and Methods
Sequence data. Homologous genes for eight Blattabacterium and five Flavobacterium species were manually compiled from genomes available in GenBank (Table 1). With Blattabacterium sp. Cryptocercus punctulatus as the model genome (as it is the smallest Blattabacterium genome and thus frames the "core" gene set of Blattabacterium and Flavobacterium) we manually BLASTed each loci against all other Blattabacterium genomes, as well as against existing Flavobacterium genomes. Resulting BLAST hits were then manually compiled into single homologue files in nucleotide fasta format. We applied Clusters of Orthologous Groups (COGs) to categorize the function of genes in our dataset. Given that genome-wide COG composition is very similar among Blattabacterium 30 , we assessed composition genome-wide as well as in the subset of homologous genes found in all taxa using the Bacterial Annotation System (BASys 71 ). Trimming, Alignment of Homologs, and Phylogeny Building. All scripts developed for this analysis (pre-processing, alignment, phylogenetics, and tests for selection) can be found at https://github.com/k8hertweck/Blattabacteria. Phylogenetic reconstruction of the evolutionary relationships of the eight Blattabacterium and five Flavobacterium species (with eight Escherichia coli strains as outgroup) was carried out using PhyloPhlan 72 under default parameters and whole genomes obtained from GenBank (Table 1). For each set of homologous genes, the last three base pairs (e.g., the stop codon) of each sequence were removed using Prinseq 73 and each homolog group was then aligned using TranslatorX 71 . Gaps present in more than 10% of an alignment were removed using trimAl and alignments summarized using readAl 74 . The best fitting model of molecular evolution for gene alignments, as assessed by both AIC and BIC in jModelTest2 75 under default parameters, was GTR + G. A maximum likelihood tree for each homologous gene was calculated using this model in PhyML (www.atgc-montpellier.fr/phyml) 76 and assessed using 100 bootstrap replicates (alternative models of evolution did not significantly affect tree topology, data not shown). The maximum likelihood tree for each gene (except mia, which possessed two gene copies for some taxa) was used to create a reconciled species tree using ASTRAL v4.7.8 (github.com/smirarab/ASTRAL/) 77  Selection Analysis. Selection analysis was performed using the HyPhy v2.2.1 (github.com/veg/hyphy) 78 suite of programs. For this analysis, three different selection tests were used: HyPhy's BUSTED 79 , Quick Selection Detection (implementing MEME [Mixed Effects Model of Evolution]) 80 , and Branch Site REL 81 . Each of these programs used the same sets of input data; namely the HyPhy alignment combined with the PhyML tree for each gene. Parameters used for these tests may be found here: github.com/k8hertweck/Blattabacteria/blob/master/ blattabacteriaBUSTED.bf, github.com/k8hertweck/Blattabacteria/blob/master/blattabacteriaQSD.bf, github. com/k8hertweck/Blattabacteria/blob/master/blattabacteriaBranchSiteREL.bf.
Summary statistics for all selection analyses performed were produced using an in-house script that may be found here: github.com/k8hertweck/Blattabacteria/blob/master/blattabacteriaSelectionSummary.sh. Summary statistics and input and output files for selection analysis may be found at: github.com/k8hertweck/Blattabacteria/ tree/master/analysis. Statistics of particular import are those referencing branches under positive selection. This information was drawn from the BUSTED and Branch Sire REL output. Additional statistics were obtained from the output of blattabacteriaGeneSummary.sh (see Methods section 'Trimming, Alignment of Homologs, and Phylogeny Building).

Statistical Analyses.
For analyzing the number of positive and negative sites of selection per gene length, a Linear Model was implemented using R version 3.4.4 82 . Within these models, number of positive and negative selection sites were log transformed. A handful of outliers were noted but retained, as their inclusion had a non-significant impact upon the resulting models. Number of positive and negative selection events vs. individual COG size were carried out using Generalized Linear Models with Poisson distribution with number of selection sites as the response variable and total number of nucleotides in the genome associated with a specific COG as the explanatory variable. As COG and gene length analyses used different datasets -number of selection events per total number of nucleotides of all genes associated with a given COG and number of selection events by gene length, respectively -we found that differing models better fit each type of analysis.

Results and Discussion
Selection by Gene Length and COG Groups. COG analysis indicates an uneven distribution of functional groups within the 304 genes selected for this analysis (Fig. 1). This figure illustrates the functional 'core' genes shared by all thirteen genomes analyzed. The majority of these genes are ribosomal in function. Perhaps unsurprisingly, the number of positive selection events (F-value: 40.872, Df: 1, p-value: 6.16e-10, adjusted R-squared: 0.12) as well as negative selection events (F-value: 189.15, Df: 1, p-value: 2.2e-16, adjusted R-squared: 0.38) both showed strong positive correlation with gene length (Fig. 2a,c, respectively). This finding is consistent with the conclusions of previous studies, where natural selection is also correlated with gene length 83 . Building upon this on a functional level, however, we also noted that signatures of both positive and negative selection (response variable) correlated strongly with the total number of nucleotides assigned to a specific COG (explanatory variable) across the Blattabacterium genome (Positive selection events: Chi-square p-value: 2.2e-16; Negative selection events: Chi-square p-value: 2.2e-16; Fig. 2b,d, respectively). Blattabacterium Selection Analysis. Initial analysis of Blattabacterium homolog sets was carried out across all eight of the fully sequenced strains, using a significance level of p ≤ 0.05 for homology. At this significance level, Blattabacterium displays a strong negative mutational bias, with a ratio of sites under negative selection to sites under positive selection of 11:1 across 304 genes. While most loci within Blattabacterium displayed a bias towards negative selection, a few did exhibit signatures of positive selection (Table 2a). That the vast majority of genes within the Blattabacterium genome are experiencing neutral (Table 2b) or negative (not shown in table) selection suggests conserved selective pressures and genome architectures within established endosymbiont lineages 28,69 . Accordingly, only a small number of loci were found to show no signs of selection at all (Table 2c).
In recent years, a growing body of work seeks to place an increased emphasis on the role of selection in molecular evolution [84][85][86] . While no predominant explanatory theory for molecular evolution has yet emerged to replace the largely disproven neutral theory, a re-evaluation of the classic, primarily neutral/drift-centric hypotheses for genome evolution in Blattabacterium is necessitated. With the data presented here -and in the light of previous studies into the genome evolution of Blattabacteria -we suggest that the Blattabacterium genome is shaped by a combination of random genetic drift, environmental selection, and physiological constraint on genetic variation. The Blattabacterium lifestyle is characterized by significant and repeated population bottlenecks with each host generation as bacterial cells are transmitted vertically from mother to offspring 17,18 , a drastically reduced genome [25][26][27][28][29][30]39 , and elevated rates of mutation. Previous studies into obligate bacterial endosymbiont evolution suggest that the reduction in effective population size through generational bottlenecks and lack of genetic recombination resulting from Muller's Ratchet elevates the rate of fixation of slightly deleterious mutations through random genetic drift [33][34][35][41][42][43][44] . However, populations that experience a population bottleneck recover much of the lost genetic variation through rapid population growth 45 . While it seems likely that this is the case for free-living and endosymbiotic bacteria as well, the strength of the bottleneck affects the loss of genetic variability much more so than subsequent rates of population growth 45 . Within Blattabacterium and many other bacterial endosymbionts, these bottlenecks are not trivial, and are frequently recurring throughout the insect host's lifespan [1][2][3]7,8,16 ; an environment that is completely atypical for most free-living populations. Thus, examined alone, population bottlenecks strongly reduce the genetic variation of Blattabacterium. Additionally, Blattabacterium -like other intracellular bacterial endosymbionts -reproduces asexually and lacks genetic recombination [ 51-56 reviewed in ref. 57 ]; two mechanisms otherwise crucial for the recovery of genetic variation. This combination of factors -lack of genetic recombination and repeated population bottlenecks -does seem to suggest that Blattabacterium and other obligate symbionts are less capable of recovering lost genetic variance after population bottlenecks than free-living bacteria.
However, bacterial endosymbionts also experience much higher mutation rates than their free-living relatives 63,69,70 . Indeed, mutations are synonymous with increased genetic variation, and we show here that Blattabacterium experiences highly elevated rates of mutation compared to free-living bacterial populations. It is highly unlikely that the elevated mutation rates seen in Blattabacterium are adaptive or somehow function in recovering lost genetic variation, as mutations in Blattabacterium show a strong bias towards deletions rather than insertions; a pattern that is in agreement with previous studies as well as with Muller's Ratchet 44,50 . Additionally, it is suggested that reduced strength of selection on many genes in the endosymbiont genome increases the number of nucleotide sites that may be altered without consequences in fitness, strengthening the impact of deletion biases 44 . Bacterial genomes are primarily functional DNA, and the drastic genome reduction observed within Blattabacterium and has come at the cost of physiological functionality. Intriguingly however, this drastic loss in functionality does not yet appear to have strong negative impacts on Blattabacterium survival or host fitness.    Indeed, many physiological tasks are now taken over by the cockroach host, rendering many Blattabacterium genes superfluous within the relatively safe and predictable symbiotic environment [25][26][27][28][29][30]36 . As in other obligate endosymbionts, many if not most of these genes come under relaxed selection, as their function is critical to neither Blattabacterium's survival nor the symbiotic physiological requirements of its cockroach host 32,35 .
Whether or not elevated mutation rates in physiologically-important genes functions to actively reduce genome size and thus streamline bacterial reproduction, or are the result of random genetic drift is unknown; though that many genes lost by Blattabacterium since transitioning to an intracellular lifestyle coded for otherwise critical functionality -including the loss of many genes involved in DNA maintenance and repair [ 19,28,30,59 , reviewed in ref. 57 ] -suggests that many losses are either only mildly non-adaptive or compensated for by the host and thus do not result in immediate impairment of symbiont or host. However, genome reduction is accompanied by a reduction and cell size and a substantial reduction in energy and nutrients requirements, providing an adaptive payoff for the active removal of non-essential genes. Indeed, a number of prokaryotic Prochlorococcus species display adaptive and rapid genome shrinkage, with genomic patterns similar to those observed in obligate symbionts including reduced G + C content, elevated rates of mutation, and the loss of DNA-repair genes 87 . However, despite these similarities, genome reduction in Prochlorococcus is characterized by largely neutral selection, as large population sizes impose low genetic drift and strong purifying selection 87 . Naturally, if genome reduction in Blattabacterium and other bacterial endosymbionts was being driven by adaptive forces and not random genetic drift, then we might expect patterns of selection similar to those in the free-living Prochlorococcus. Instead, we find here that the overwhelming majority of mutations in the Blattabacterium genome are negative in direction, strongly suggesting that genome reduction is not driven by selective processes, but rather by random genetic drift; as has been suggested for numerous other obligate bacterial endosymbionts 32,35 .
Specific genes within the endosymbiont genome are expected to vary among endosymbiont lineages as a function of the metabolic and physiological requirements of the host species. As such, these species-specific genes vital to bacterial survival and/or host fecundity experienced elevated selective pressures for their persistence within the Blattabacterium genome. We suspect that many genes in Blattabacterium involved in functions critical to this bacterial-host symbiosis display neutral or positive signatures of selection. Thus, while random genetic drift appears to play a strong role in shaping the Blattabacterium genome, physiological constraint acts to maintain Blattabacterium's functionality as a primary nutritional endosymbiont across the cockroach lineage. Accordingly, the Blattabacterium genome architecture and composition is the result of the interplay between random genetic drift and the fixation of slightly deleterious mutations on one hand and physiological constraint promoting maintenance of cockroach-required metabolic functionality on the other.
When compared to the signatures of selection and patterns of evolution noted within other obligate bacterial symbionts, Blattabacterium shows striking similarity. While the ratio of negative to positive selection sites of 11:1 is specific to Blattabacterium-Flavobacterium comparisons, similar patterns of strong negative selection have been observed in other insect endosymbiont genomes 69 . Unsurprisingly then, our results conform to the findings of Brynnel et al. 33 , who also measured that the tuf gene of Buchnera is evolving more than 10 times as quickly than the same gene in the free living E. coli and S. typhimurium. Additionally, Blattabacterium -like Wigglesworthia and Buchnera -does show some evidence for maintaining those functions that are highly important to its insect host 33,63,66,70 . Indeed, the combined effects of Muller's Ratchet appears to be ubiquitous within obligate insect bacterial symbionts: the Buchnera chaperonin groEL displays a 5-fold increase in non-synonymous mutations, and a 10-fold increase in synonymous mutations, when compared to E. coli 63 . Mutational pressure alone likely does not account for the magnitude of these d N /d S rate elevations. Within Buchnera, it has been suggested that this elevation of fixation occurs through random genetic drift resulting from the continual reduction of effective endosymbiont population size with each transmission from host parent to host offspring 33,34,88 . Given that this same elevation of polymorphisms is observed within Blattabacterium -and that Blattabacterium also undergoes similar population bottlenecks with each host generation -it is likely that similar mechanisms are shaping these two independent lineages. This also parallels the findings of Brynnel et al. 33 , whom suggested that the rate of synonymous codon substitution within Buchnera can be as much as 40 times higher than its free-living relatives.

Blattabacterium -Flavobacterium Selection Comparison.
Blattabacterium displays elevated levels of both positive and negative selection events at a significance level of p ≤ 0.05 when compared to free-living Flavobacterium, indicating a genome-wide increase in mutation rates across the examined genes. In order to ensure that these patterns are not the result of sequences displaying radically different divergence times, we performed a phylogenetic analysis (Fig. 3) to elucidate the sequence similarity within each examined group. Phylogenetic analysis of both the Blattabacterium group (Table 3) and Flavobacterium group (Table 4) indicate similar levels of phylogenetic divergence between the individuals of each 22,89,90 . events within either Blattabacterium or Flavobacterium genomes. 'Position' indicates that genes starting position within the Mastotermes darwineinsis genome, the model Blattabacterium genome used here. Letters refer to COG functional categories as follows. C -Energy production and conversion; D -Cell division and chromosome partitioning; E -Amino acid transport and metabolism; F -Nucleotide transport and metabolism; G -Carbohydrate transport and metabolism; H -Coenzyme metabolism; I -Lipid metabolism; J -Translation, ribosomal structure and biogenesis; K -Transcription; L -DNA replication, recombination and repair; M -Cell envelope biogenesis, outer membrane; N -Cell motility; O -Posttranslational modification, protein turnover, chaperones; P -Inorganic ion transport and metabolism; Q -Secondary metabolites biosynthesis, transport, and catabolism; R -General function prediction only; S -COG of unknown function; T -Signal transduction mechanisms.
Scientific REPORtS | (2018) 8:13427 | DOI:10.1038/s41598-018-31796-6 In addition, each group displays comparable percentages of identical sites (Blattabacterium: 89.4%, Flavobacterium: 87.8%) as well as similar pairwise percent identities (Blattabacterium: 95.7%, Flavobacterium: 93.3%) when aligning the ribosomal 16S rRNA gene. Thus, extant Blattabacterium display signs of elevated rates of genome evolution in the form of increased levels of selection events. The increase in the number of sites experiencing negative or positive selection when compared to the free-living Flavobacterium suggests elevated levels of functional protein evolution in the endosymbionts. Only a limited number of loci display similar selection profiles between Blattabacterium and Flavobacterium (Table 2d).
Results of MEME selection analysis indicate that all genes analyzed show at least some evidence of negative selection. Sites under negative selection comprise approximately 86 percent of examined loci. However, four  loci, at one site each, show evidence for positive selection ( Table 5). Three of the four loci showing evidence for positive selection are involved in DNA or RNA modification: 2-methylthioadenine synthetase, Holliday Junction resolvase, and 50S ribosomal protein L25 subunit. Within E. coli and Salmonella typhimurium, variations of the protein 2-methylthioadenosine have been shown to stabilize codon-anticodon interactions through the restriction of first codon position wobble during tRNA aminoacylation 91,92 . This functionality prevents the misreading of the genetic code, thus reducing the likelihood of mutation. Additionally, Holliday Junction resolvase-like proteins have been shown to play key roles in DNA recombination and repair [93][94][95] . Finally, genes responsible for the production of ribosomes within a cell are crucial for the proper translation of proteins from mRNA 96,97 .
Modifications to genes responsible for the production of ribosomal proteins will likely impact the efficiency and/ or accuracy of protein translation and assembly. Given the broad reduction in functionality of the Blattabacterium genome, and the loss of many ancestral DNA and RNA maintenance and repair genes ( Fig. 1) 8,98,99 , it is in some ways not surprising that all currently-described Blattabacterium strains display similar selection pressures on those remaining genes responsible for the maintenance of genetic material. However, of notable absence from our list of genes showing signatures of positive selection is the molecular chaperone and maintenance gene GroEL. These sequences are part of the larger GroL locus in modern Blattabacterium genomes, regions of which were found previously in Blattabacterium to be under positive selection 99 . This inconsistency likely arises from the outgroups used in each study. We utilized Blattabacteria's closest free-living relative, Flavobacterium 16 In contrast to the previous genes, however, which are involved in the maintenance of genetic material, the remaining locus found to show signatures of positive selection, atpG, codes for ATP synthase F1 subunit gamma. ATP synthase-family subunit proteins typically combine to form an ATP synthase complex, which is responsible for energy production in the form of ATP within the cell 101,102 . One of the primary functions of Blattabacterium within its host is amino acid synthesis. Amino acid production is a very endergonic process, requiring large amounts of energy in the form of ATP in order to effectively carry out biosynthesis 103 . Therefore, beneficial modifications to genes coding for an ATP synthase subunit that result in the more efficient functioning of ATP synthase as a complete complex are more likely to be favored within the Blattabacterium genome. In keeping with the previous findings that all Blattabacterium strains examined to date are alike in their function to provide essential and nonessential amino acids to their cockroach hosts, here we demonstrate that Blattabacterium also share signatures of positive selection within genes responsible for the production of the ATP synthase F1 subunit.

Conclusions
Our findings indicate that the Blattabacterium genome is experiencing elevated rates of both positive and negative selection when compared to its free-living relative Flavobacterium, approaching a 10-fold increase in selection rate at the significance level p ≤ 0.05 across 304 individual genes. In combination with previous studies elucidating the evolutionary patterns in other insect endosymbionts, we conclude that the Blattabacterium genome is shaped by similar evolutionary mechanisms. Previous studies have outlined the current state of the Blattabacterium   genome, which is drastically reduced from its ancestral state and possesses a very strong bias towards A + T nucleotide base pairs. Analysis of these trends indicate that Blattabacterium are experiencing an accumulation of slightly deleterious mutations through the continued effects of random genetic drift resulting from consecutive population bottlenecks throughout Blattabacterium's evolutionary history, with physiological constraint acting to maintain genes important to bacterial survival and host fecundity. Additionally, Blattabacterium has lost many of the genes involved in DNA repair, likely through similar mechanisms discussed here, thus exacerbating this evolutionary bias towards slightly deleterious mutations. That these mutations cannot be repaired increases functional protein evolution rates within this endosymbiont. The patterns discussed here are highly similar to those evolutionary and genomic trends observed in other intracellular insect endosymbionts 34,45,61,90 . Additionally, our analyses also provide insight into the direction of selection of loci within the genome. A vast majority of loci in all Blattabacterium genomes analyzed here show signs of negative selection. Only a small fraction of loci (miaB, Holliday Junction, rplY, atpG) show signs of positive selection. These observations are in accordance with our previous understanding of the evolutionary history of Blattabacterium, as well as its function within its cockroach host as a nutritional endosymbiont aiding in the recycling of nitrogenous waste and the production of both essential and nonessential amino acids. The analysis presented here could be augmented through a robust analysis of genome reduction within Blattabacterium. Using a parsimony approach, the ancestral genome of another primary insect endosymbiont, Buchnera-Ap, was reconstructed by Moran and Mira 32 . The results of Moran and Mira's analysis indicated that much of the ancestral Bucnhera genome was lost during a relatively small number of large deletion events shortly after this bacteria's transition to an intracellular lifestyle. While it is likely that that the Blattabacterium genome was reduced through similar mechanisms, a similar reconstruction within this group would offer us a more complete picture of the evolutionary origins of this unique cockroach endosymbiont.

Data Accessibility
All data used herein was procured from public NCBI databases; see Table 1.