Contrasting host–pathogen interactions and genome evolution in two generalist and specialist microsporidian pathogens of mosquitoes

Obligate intracellular pathogens depend on their host for growth yet must also evade detection by host defenses. Here we investigate host adaptation in two Microsporidia, the specialist Edhazardia aedis and the generalist Vavraia culicis, pathogens of disease vector mosquitoes. Genomic analysis and deep RNA-Seq across infection time courses reveal fundamental differences between these pathogens. E. aedis retains enhanced cell surface modification and signalling capacity, upregulating protein trafficking and secretion dynamically during infection. V. culicis is less dependent on its host for basic metabolites and retains a subset of spliceosomal components, with a transcriptome broadly focused on growth and replication. Transcriptional profiling of mosquito immune responses reveals that response to infection by E. aedis differs dramatically depending on the mode of infection, and that antimicrobial defensins may play a general role in mosquito defense against Microsporidia. This analysis illuminates fundamentally different evolutionary paths and host interplay of specialist and generalist pathogens.


Supplementary Figure. Conservation of core eukaryotic gene (CEG) set across Microsporidia genomes. The percent coverage of genes with significant Blast similarity is shown for alignments above and below the recommended 70% coverage threshold, which can indicate partial gene structures.
Supplementary Figure 2. Phylogeny of Microsporidia based on 217 single copy core genes. The phylogeny was estimated using RAxML 1 and a PROTCATLG model of evolution. Bootstrap support (BS) and gene support frequency (all GSF) are shown above each node. In an attempt to increase GSF, we re-estimated the phylogeny with the 71 genes with an average bootstrap value of 80 or higher. The resulting phylogeny was identical in topology and bootstrap support to the original tree, and showed increased GSF (80 GSF) across most nodes. Genome size (actual and estimated), presence or absence of RNAi machinery, and ortholog distribution are also shown (core, in all genomes; auxiliary, in two or more genomes; unique, in one genome).

Supplementary Note 1: AT-rich expansion and codon/amino acid usage bias in E. aedis
Both genes and repetitive sequences are distributed across the E. aedis assembly, with no large regions devoid of genes or repeats. The lengths of intergenic regions in E. aedis follow a bimodal distribution, in which most pairs of genes are separated by large intergenic regions but some have remained closely linked (Supplementary Fig. 3).
A comparison of relative codon usage for each amino acid between E. aedis and V. culicis shows that, in E. aedis, AT-rich codon frequencies are increased and GC-rich codon frequencies are decreased (Supplementary Fig. 4, Supplementary Fig. 5). This AT-bias has also affected the frequencies of amino acids found in proteins. In E. aedis proteins, amino acids with the highest average GC content of their codons are under-represented, while amino acids with the lowest average GC content of their codons are over-represented, even when only conserved genes are considered (Supplementary Fig. 5). A similar bias in amino acid usage has been previously reported for other AT-rich genomes 3.
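The two quantities discussed above, relative usage of each codon among its synonymous codons and the average GC content of the codons for each amino acid, can be illustrated with a minimal Python sketch. It assumes the standard genetic code and in-frame coding sequences; the function names and the toy sequence are hypothetical and are not taken from the analysis pipeline used here.

```python
from collections import Counter, defaultdict
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code (assumed), codons enumerated in TCAG order.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def relative_codon_usage(cds_list):
    """Fraction of each observed codon among the synonymous codons of its amino acid."""
    counts = Counter()
    for cds in cds_list:
        seq = cds.upper()
        # Walk the sequence in frame, ignoring any trailing partial codon.
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            # Skip stop codons and codons with ambiguous bases.
            if CODON_TABLE.get(codon, "*") != "*":
                counts[codon] += 1
    per_aa = defaultdict(int)
    for codon, n in counts.items():
        per_aa[CODON_TABLE[codon]] += n
    return {codon: n / per_aa[CODON_TABLE[codon]] for codon, n in counts.items()}


def mean_codon_gc(aa):
    """Average GC fraction over all codons that encode the amino acid `aa`."""
    codons = [c for c, a in CODON_TABLE.items() if a == aa]
    return sum((c.count("G") + c.count("C")) / 3 for c in codons) / len(codons)
```

For example, `relative_codon_usage(["AAAAAAAAG"])` assigns two thirds of lysine codons to AAA and one third to AAG, and `mean_codon_gc("G")` is higher than `mean_codon_gc("K")`, so glycine would be expected to be depleted relative to lysine under a strong AT bias.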

Supplementary Note 2: Phylogenetic position of E. aedis
While all nodes in this phylogeny displayed high bootstrap support, analysis of individual gene trees uncovered discrepant branching order around some tree nodes. We therefore calculated gene support frequencies (GSF) 4, i.e. the percentage of individual core gene trees that supported each node in the tree, using RAxML to estimate trees for individual single copy core genes under the same model as the concatenated gene tree. This revealed a low GSF at the base of the branch to E. aedis. Removing genes with low phylogenetic signal from the analysis (gene trees with bootstrap support < 80) and re-estimating the phylogeny with the 71 well supported genes, as suggested in ref. 4, did not improve support for the placement of E. aedis (Supplementary Fig. 2). This calls into question the phylogenetic placement of this species and suggests that more taxa may be needed to fully resolve the basal branching order. The genome-wide low GC content likely contributes to the difficulty in robustly placing E. aedis on the microsporidian phylogeny.
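As an illustration of the GSF calculation, the following Python sketch counts, for each internal branch of a reference tree, the percentage of gene trees that contain the same bipartition of taxa. It assumes that trees are already available as nested tuples of taxon names and that every gene tree has the same taxon set as the reference tree; the data structure and function names are hypothetical, and tree estimation itself (done with RAxML in this study) is not shown.

```python
def clades(tree):
    """Return (leaf set, leaf sets of all internal nodes) for a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree]), []
    leaves, found = frozenset(), []
    for child in tree:
        child_leaves, child_found = clades(child)
        leaves |= child_leaves
        found += child_found
    found.append(leaves)
    return leaves, found


def splits(tree):
    """Non-trivial bipartitions, each stored as the side lacking a reference taxon."""
    all_taxa, found = clades(tree)
    ref = min(all_taxa)  # arbitrary but consistent reference taxon
    out = set()
    for clade in found:
        side = all_taxa - clade if ref in clade else clade
        # Keep only splits with at least two taxa on each side.
        if 1 < len(side) < len(all_taxa) - 1:
            out.add(side)
    return out


def gene_support_frequency(species_tree, gene_trees):
    """Percent of gene trees that contain each split of the species tree."""
    gene_splits = [splits(t) for t in gene_trees]
    return {
        s: 100.0 * sum(s in g for g in gene_splits) / len(gene_trees)
        for s in splits(species_tree)
    }
```

With a five-taxon reference tree `(("A", "B"), ("C", ("D", "E")))` and four gene trees, of which one matches the reference, two group C with D, and one groups A with C, the split separating D and E from the rest receives a GSF of 50 and the split separating C, D and E from A and B receives a GSF of 75.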

Supplementary Note 3: Genome compaction
Another strategy for genome compaction is the reduction of protein length, as previously shown for genes in Encephalitozoon cuniculi, Enterocytozoon bieneusi, and Octosporea bayeri compared to fungal orthologs 5,6. Here, analysis of core eukaryotic genes (CEGs, 7) revealed more broadly that Microsporidia CEGs overall have shorter coding sequences than their fungal orthologs. We compared 91 CEGs found in most of our microsporidian and fungal genomes and found that fungal CEGs are significantly longer than microsporidian CEGs (p < 1.68e-13; Wilcoxon signed-rank test). The average difference is 66.1 ± 72.6 amino acids, and a histogram of these differences is shown in Supplementary Fig. 6. This suggests that there are evolutionary constraints not only on microsporidian gene content and intron structure, but on coding sequence length as well.
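A paired comparison of this kind can be sketched in Python as follows. The sketch implements a two-sided Wilcoxon signed-rank test using the normal approximation without a tie correction, which is an assumption and may differ from the exact procedure used for the p-value reported above; the function name and the toy protein lengths are hypothetical.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test on paired samples.

    Zero differences are dropped and the p-value uses the normal
    approximation with no tie correction (a simplifying assumption).
    Returns (sum of ranks of positive differences, p-value).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    # Rank absolute differences, averaging ranks across ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mu) / sigma
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return w_plus, p_value
```

Calling `wilcoxon_signed_rank(fungal_lengths, microsporidian_lengths)` on two equally ordered lists of ortholog lengths returns the rank sum of positive differences and an approximate p-value. For a small number of pairs an exact test would be more appropriate than the normal approximation used in this sketch.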

Supplementary Note 4: Pathway loss in E. aedis and V. culicis
Genome compaction has also occurred by loss of metabolic pathways, though different species have lost or retained different pathways. The V. culicis genome has lost five genes of the GPI anchor biosynthesis pathway, leaving only GPI8, GPI13, and GWT1; in addition, the GPI anchor synthesis gene GPI10 has been lost in all Microsporidia except E. aedis and Nematocida. It is unclear whether this reduced complement of GPI anchor synthesis genes allows V. culicis to manufacture GPI anchors, or whether these genes have acquired some functional redundancy in another pathway. The V. culicis genome also encodes a reduced sphingolipid biosynthesis pathway relative to E. aedis (Fig. 2A). The V. culicis genome encodes a single ceramide synthase as the final step of this pathway, while the E. aedis genome encodes two ceramide synthases (paralogs of each other) and also the next enzyme in the pathway, SCS7 desaturase (along with its CYB5 cofactor), which allows for the production of more complex ceramides. The V. culicis genome does encode phosphatidylserine decarboxylase, which is involved in the synthesis of aminophospholipids, whereas the E. aedis genome lacks this pathway. The phylogenetic distribution of the isoprenoid pathway is nearly the opposite of that of phosphatidylserine decarboxylase (Fig. 2A), and these two pathways may provide some functional redundancy. Outside of metabolism, the E. aedis genome encodes three genes (SSU72, ESS1, and SPO14) involved in Ser5 and Ser7 phosphorylation of RNA polymerase II that are all missing from the V. culicis genome.
[...] Table 1).
RPG introns, typically found near the 5' end of genes, included orthologs of the RPGs identified as spliced in the T. hominis genome 8 plus the L1 RPG. Spliced genes with transmembrane domains had introns in the middle portion of the gene, and six of the genes contained two introns each, separated by an 11-base exon. Splicing of these introns was variable; there were RNA-Seq-based transcripts with no introns spliced out, with both introns spliced out, and with either the first or the second intron spliced out. Intron retention produces a heavily truncated protein, suggesting that alternative splicing does not produce variation in cell surface proteins. As in En. cuniculi 9, the majority of V. culicis genes in all three functional classes were inefficiently spliced (median efficiency = 25.3%, Supplementary Table 1); the differential splicing of surface proteins may simply result from this inefficiency.

Supplementary Note 6: Phylogenetic profiling of proteins associated with splicing
To search for genes encoding other potential pre-mRNA splicing factors in Microsporidia, we searched for genes whose phylogenetic profile matched the set of species with verified splicing. Genes matching this profile are found in at least 5 of the 9 Microsporidia with splicing but are absent from all species without splicing (Supplementary Data 2). Of the 29 gene clusters that matched this pattern, 13 were broadly conserved in other fungi while 16 were largely specific to Microsporidia. Eleven of the genes also conserved among fungi are known components of the spliceosome, including Sm proteins (D1, D2, D3, and F), U2 snRNP proteins (Cus1, Hsh155, Prp9, and Prp11), a U1 snRNP protein (Luc7), and other splicing factors (Bud31, Yju2). Of the 16 other clusters enriched in or specific to Microsporidia, two are involved in polyadenylation. In addition, given that the majority of genes shared with Fungi are known to be involved in splicing, some fraction of the Microsporidia-specific set, including 10 clusters without assigned function, could represent proteins involved in Microsporidia-specific splicing functions.
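The profile filter described above (present in at least 5 of the 9 species with splicing, absent from every species without splicing) can be expressed as a short Python sketch. The species labels, cluster identifiers, and function names below are placeholders for illustration and do not correspond to the actual ortholog clusters in Supplementary Data 2.

```python
def matches_splicing_profile(present_in, splicing, non_splicing, min_present=5):
    """True if a cluster occurs in at least `min_present` splicing species
    and in none of the non-splicing species."""
    in_splicing = len(present_in & splicing)
    in_non_splicing = len(present_in & non_splicing)
    return in_splicing >= min_present and in_non_splicing == 0


def profile_clusters(clusters, splicing, non_splicing, min_present=5):
    """Return the sorted IDs of clusters whose species set matches the profile."""
    return sorted(
        cluster_id
        for cluster_id, species in clusters.items()
        if matches_splicing_profile(species, splicing, non_splicing, min_present)
    )


# Placeholder species labels: S1..S9 splice, N1..N3 do not.
SPLICING = {f"S{i}" for i in range(1, 10)}
NON_SPLICING = {"N1", "N2", "N3"}

EXAMPLE_CLUSTERS = {
    "cluster_a": {"S1", "S2", "S3", "S4", "S5"},              # matches the profile
    "cluster_b": {"S1", "S2", "S3", "S4", "S5", "S6", "N1"},  # found in a non-splicing species
    "cluster_c": {"S1", "S2"},                                # in too few splicing species
}
```

Here `profile_clusters(EXAMPLE_CLUSTERS, SPLICING, NON_SPLICING)` keeps only `cluster_a`. Allowing absence from up to four splicing species, as the threshold of 5 of 9 does, tolerates lineage-specific loss and incomplete assemblies at the cost of admitting some clusters unrelated to splicing.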

Supplementary Note 7: Gene-level examination of pathway enrichment in intracellular stages of E. aedis
We conducted a gene-level examination of pathways identified as being enriched in intracellular stages of E. aedis. This revealed that GPI anchor construction and isoprenoid synthesis components were nearly universally upregulated in intracellular stages (Supplementary Fig. 7).
Furthermore, all five COPI subunits identified in E. aedis (COP1, RET3, SEC21, SEC26, SEC27) were significantly upregulated in intracellular stages. COPI normally plays a role in Golgi-to-ER transport, and potentially intra-Golgi transport, although a study of A. locustae showed that COPI and COPII vesicles were not formed 10. However, among Microsporidia, A. locustae uniquely lacks much of the COPII machinery based on our ortholog analysis; this leaves open the possibility that COPI and COPII function differently in A. locustae than in other Microsporidia. In addition to potential roles in trafficking proteins for secretion, upregulation of COPI machinery may be related to spore production, as the Golgi and ER play a central role in microsporidian spore morphogenesis, and/or to cell structure, as microsporidian spores lack Golgi 11.