Insights into Adaptations to a Near-Obligate Nematode Endoparasitic Lifestyle from the Finished Genome of Drechmeria coniospora

Nematophagous fungi employ three distinct predatory strategies: nematode trapping, parasitism of females and eggs, and endoparasitism. While endoparasites play key roles in controlling nematode populations in nature, their application for integrated pest management is hindered by the limited understanding of their biology. We present a comparative analysis of a high quality finished genome assembly of Drechmeria coniospora, a model endoparasitic nematophagous fungus, integrated with a transcriptomic study. Adaptation of D. coniospora to its almost completely obligate endoparasitic lifestyle led to the simplification of many orthologous gene families involved in the saprophytic trophic mode, while maintaining orthologs of most known fungal pathogen-host interaction proteins, stress response circuits and putative effectors of the small secreted protein type. The need to adhere to and penetrate the host cuticle led to a selective radiation of surface proteins and hydrolytic enzymes. Although the endoparasite has a simplified secondary metabolome, it produces a novel peptaibiotic family that shows antibacterial, antifungal and nematicidal activities. Our analyses emphasize the basic malleability of the D. coniospora genome: loss of genes advantageous for the saprophytic lifestyle; modulation of elements that its cohort species utilize for entomopathogenesis; and expansion of protein families necessary for the nematode endoparasitic lifestyle.


Figure S4. Nucleotide sequence alignment of 11 DDE transposases
Only partial sequences are shown with similarities exceeding 90%. Point mutations observed are either C to T or G to A (shown in red).

Figure S5. Gene inventions, duplications and losses mapped to the phylogeny of selected fungi
Evolutionary events were inferred by reconciliation of individual orthologous gene trees with the species tree and subsequently mapped to the different branches of the phylogenic tree. Each branch shows three boxes, one for each type of events: inventions or gene gain; duplications; and loss events. The colors of the boxes indicate whether more or less of this type of event occurred on this particular branch than in the rest of the tree: boxes are colored according to log2 (number of the type of event on this branch divided by the median of this type of events in the tree), with 4 and -4 as maximum and minimum values, respectively.

Figure S8. Analysis of the occurrence of D. coniospora SSP families in a set of reference fungal genomes
Of the 257 SSP families that are predicted in D. coniospora, 133 families also occur in other species. The widths of the bars are proportional to the numbers of SSP families that are shared by the fungi shown with D. coniospora. 45 of the shared SSP families that occur in D. coniospora and other fungi are considered 'sparse' families. To determine which SSP families are sparse, the ancestral species is first inferred in which a given SSP family first appeared. Next, the number of species in which that family is present is compared to the number of species under the node of invention (that is, the maximum number of species this family could occur in if it was completely retained during speciation). If an SSP family occurs in less than a third of the species it could have occurred in, the family is considered to be 'sparse'. The fraction of sparse SSP families present in a fungal species is indicated by the grey scale of the bar, ranging from black in D. coniospora, in which the sparse SSP families were defined and are thus all present, to white in S. cerevisiae where no sparse families are found. The numbers of shared SSP families and the numbers of shared sparse SSP families are also shown, together with the fraction of shared families that are 'sparse' vs. those that are simply shared.

Figure S9. Predicted pathogen-host interaction (PHI) proteins in D. coniospora with verified participation in pathogenicity or cell viability
D. coniospora proteins orthologous to selected proteins in the Pathogen-Host Interaction (PHI) database 5,6 are shown. Only those PHI proteins were considered that affect cell viability or pathogenicity when modified or deleted. The 312 D. coniospora SSPs were compared to proteins of similar size. The result shows that SSPs in D. coniospora contain more cysteines (P <2e-19, Kolmogorov-Smirnov test). In addition, effector proteins often have an even number of cysteines, allowing the formation of disulfide bonds. Indeed, the D. coniospora SSPs more frequently contain an even number of cysteines when compared to other small proteins (P <4e-07, hypergeometric test).

Figure S15. Transcription levels of the NRPS, PKS and NRPS-PKS genes of D. coniospora during the four life stages
The genes that are parts of the same putative gene cluster are marked with black brackets.      Differential expression was defined as larger than four-fold change in expression between at least two of the three in vitro growth stages selected (mycelia, early conidiogenesis, and conidia).

Repetitive elements and transposons
Duplication events associated with recombination, mutation and consequent diversification are major contributors to the evolution of fungal genomes, and facilitate their adaptation to environmental constraints 28 . The complete genome assembly of D. coniospora features a repeat sequence content of 12.5% (4.11 Mb). Although this number is higher than the proportion of repeats published for most other ascomycete genome assemblies (0.1-6.0%) except for that of Hirsutella minnesotensis (36%) ( Table S2) 29 , this discrepancy is probably caused by our successful coverage of the repeat-rich centromere and telomere regions.
Amongst the repeated elements, 731 transposons comprise 2.2% of the genome, with Type I retrotransposons dominating over Type II DNA transposons (618 vs. 113, respectively) ( Table   S3). The retrotransposons gypsy (253), copia (211) and LINE (98)  coniospora-specific repeats (0.5-2 kb) constitute another 3.4% of the genome, with the majority of these elements also accumulating in the centromere and in the terminal regions.
Small D. coniospora-specific repeats (0.2-0.5 kb) occupy 1.5% of the genome and are scattered along the three chromosomes. Thus, the distributions of repeats along the D.
coniospora chromosomes is non-homogenous, with centromeric and terminal regions harboring most of the species-specific repeats and retrotransposons, but contain relatively few coding sequences (Fig. 2).

Repeat-induced point mutations (RIPs)
The and RID is also involved in sexual development in Aspergillus nidulans 33 . DIM-2 was found to act on RIP-affected sequences, but its mutation has no effect on RIP in N. crassa 25 . The presence of the rid gene has been considered a hallmark of RIP competency in sequenced fungal genomes 2,34 .
In D. coniospora, RIP activity may be important to limit the activity of transposons. Thus, local maxima of the TpA/ApT index, indicating enrichment of RIP-related mutations, clearly overlap with retrotransposons along the three chromosomes (Fig. 2). The (CpA+TpG)/(ApC+GpT) index that estimates the depletion of RIP targets reveals a similar trend (results not shown). Clear signs of RIP were evident in the retrotransposons gypsy, copia and DDE, with CpA-to-TpA and CpT-to-TpT mutations dominating (Fig. S3). Multiple sequence comparisons among 11 transposase genes with nucleotide sequence identities larger than 90% showed that single nucleotide polymorphisms were nearly exclusively C vs. T or G vs. A, suggesting recent RIP events (Fig. S4). In contrast, we found no evidence of RIP in the retrotransposon LINE and the DNA transposon cacta.

Cryptic sexual cycle
Sexual reproduction is an important way for fungi to exchange genetic material and is thus one of the drivers of their evolution. Ascomycete fungi may be heterothallic (outcrossing) or homothallic (self-crossing) 13

Asexual sporulation
Copious production of asexual spores is crucial for the pathogenic cycle of D. coniospora.
Considering the observed cell differentiation events that lead to the production of conidia 36 , we expected that the D. coniospora genes necessary for the development of conidiferous pegs and those for the formation, maturation and release of conidia would be similar to those involved in the formation of conidiogeneous cells (phialide) and conidia by the fusaria 37

Adaptation to a reduced lifestyle repertoire leads to proteome contraction
Although the gene density of D. coniospora at 253 genes/Mb is at the low end for related facultative insect and nematode pathogens 2,29 ( , and is also upregulated as compared to the spore stage (more than 8-fold), indicating an active response to nitrogen availability in the host.

Transporters
In a stark contrast to the contraction of many protein families, the number of transporters encoded in the D. coniospora genome is remarkably high (Fig. 4A) (Table S12). The high frequency of putative regulatory network elements, catabolic enzymes, and transporters amongst the proteins with recognizable conserved motifs suggests that D. coniospora metabolism needs to be carefully fine-tuned to the acquisition of host nutrients.

Regulatory circuits and stress response
Lifestyle and host range switches may rely more on the variation of regulatory circuits while maintaining the overall framework of metabolism. Comparison of protein families among fungi from different lifestyles (Fig. 4A) also supports this conclusion. Thus, the numbers of transcription factors, protein kinases, and GPCR-like proteins involved in signal transduction are strongly contracted in D. coniospora, suggesting that this fungus might only respond to a reduced variety of extracellular signals.
Similarity searches against a curated collection of verified stress response elements 42 identified 916 putative stress response proteins (Table S13) are phylogenetically more closely related to D. coniospora than the aspergilli, we propose that LaeA may also act as a negative regulator of sexual sporulation in this endoparasite (Fig. 5).

Pathogenicity islands composed of small secreted proteins (SSPs) and pathogen-host interaction proteins
Manipulation of the host by pathogens often involves the broadcasting of effector proteins that may be expressed exclusively during infection 49 . These are typically small, secreted, cysteine-rich proteins that may be genus-, species-, or even isolate-specific [50][51][52] . SSPs are often involved in host-specific pathogenicity, and include avirulence determinants, toxins, and suppressors of host defense and signaling 51,52 . Compared to intracellular small proteins, the predicted D. coniospora SSPs tend to have more cysteines (p < 2e-19, Kolmogorov-Smirnov test), and the number of these cysteines tends to be even (p < 4e-07, hypergeometric test, Fig.   S12), ensuring a more compact structure to facilitate secretion. Most predicted SSPs (79.

Hydrolytic enzymes
Chitin is a structural component of fungal cell walls as well as arthropod cuticles and nematode egg shells 53 , but is much less prevalent in the collagenous exoskeleton of nematodes 54 . As expected, the insect pathogen M. robertsii and the nematode egg/cyst parasitic fungus P. chlamydosporia feature the largest variety of chitinases, while D.
coniospora and Ar. oligospora have less proteins of this enzyme family ( Fig. 4A and C).
Acid phosphatase activity has been detected at the site of penetration during nematode invasion by D. coniospora 55 . Two of the seven acid phosphatases encoded in the genome (DCS_04121 and DCS_06910) were upregulated in the spore stage and during nematode infection (Table S15B), identifying these enzymes as plausible candidates for further biological characterization. The role of metalloproteases in nematode penetration has been challenged 56 : correspondingly, none of the corresponding genes encoded in D. coniospora were overexpressed during infection, although one (DCS_08224) was preferentially expressed during mycelial growth and sporulation (Table S15B).

Some dehydrogenases show a high level of transcription during nematode infection and/or
mycelial growth (Table S15B). The CYP superfamily as a whole is contracted in D.

Iron acquisition
Every organism needs to maintain iron homeostasis by managing iron acquisition and storage.
Iron acquisition is especially problematic for animal pathogens since this metal is tightly bound by iron-sequestering proteins in the host 59,60 . Conversely, excess iron leads to cellular damage through the generation of reactive oxygen species (ROS) by the Fenton reaction. An exhaustive search for potential orthologs of the iron acquisition systems of A. fumigatus and C.
albicans 59,61,62 clearly indicated that D. coniospora relies on an A. fumigatus-like system to gain access to the iron reservoirs of its host (Table S18).
Reductive iron assimilation (RIA) is the primary mechanism of iron acquisition in the plant necrotroph Ustilago maydis 63

Secondary metabolism
Most synthases/synthetases encoded in the D. coniospora genome were found to be transcribed in at least some of the growth phases examined (Fig. S15, Table S16A). In addition to the highly conserved NRPSs for an aminoaldipate semialdehyde dehydrogenase (DcNRPS11) and for the coprogen and ferricrocin siderophores (DcNRPS9 and DcNRPS6, respectively), the D. coniospora genome also harbors an NRPS gene cluster (centered on DcNRPS7) that shows significant synteni with similar clusters in other Hypocrealean fungi (Fig. S16A). In spite of the sequence conservation, the transcription level of DcNRPS7 is moderate in the growth conditions tested. The conservation of the gene cluster that includes the DcNRPS11 aminoaldipate semialdehyde dehydrogenase (Fig. S16B), together with its stable transcription during all growth periods suggest that this cluster may serve a basic biological function. On the other hand, DcNRPS4 (Fig. S16C), DcNRPS8 and DcNRPS-PKS1 may have resulted from relatively recent acquisition events that may have taken place before the D. coniospora / T. inflatum divergence (Fig. S11).

Drechmerins: structure, biosynthesis and bioactivities
On synthetic media, D. coniospora produces a complex mixture of >20 closely related linear non-ribosomal peptide analogues, with microheterogeneity introduced by amino acid substitutions at several variable positions in the peptide chain. We have tentatively assigned the trivial name "drechmerins" to this peptide family. Mass spectrometry (MS) and tandem mass spectrometry (MS/MS) analyses of partially purified analogues indicate that the drechmerins are related to the leucinostatins 72 , helioferins 73 , roseoferins 74 (Table S19,   possibly an N 1 -methylpropane-1,2-diamine residue such as the one described to occupy the C-terminus of leucinostatin B 76 . Accurate MS/MS data also indicate that the AHMOD units in the drechmerins undergo facile α-β dehydration of the keto-hydroxy moiety (Fig. S14, Table   S19), as has been reported for the leucinostatins 76 . This presents an obstacle to purification as attempts to prevent this conversion, or to drive the dehydration to completion, have so far been unsuccessful. Work is ongoing to overcome this purification challenge or to find a stable derivative so the structures can be fully elucidated using high-resolution NMR and chemical degradation techniques.
We propose that the assembly of the nonproteinogenic amino acids AHMOD and AMD initiates with the synthesis of 4-methyldecanoic acid on DcPKS1, followed by C2 amination to yield AMD, or C2 amination and oxidation of C6 and C8 to yield AHMOD (Fig. 6B).
AIB units may be synthesized by DCS_05098 and DCS_05099 that are orthologous to the AIB synthetic genes TqaM and TqaL, respectively, from Neurospora crassa 77 . These two putative AIB-synthetic genes are not part of the main peptaibiotic biosynthetic cluster, but are located at a different genomic locus on the same chromosome. The free-standing AT encoded by DcPKS2 may load the fatty acyl N-terminal cap onto the first T domain of DcNRPS1, initiating the assembly of a 10-amino acid peptide, likely terminating in an alanine residue.
Release of the peptide may be catalyzed by the reductive release(R) domain of DcNRPS1 with a reduction/amination sequence (or by reductive amination) that may generate the C-terminal capping group and yield the 9-amino-acid peptaibiotics drechmerins. The observed microheterogeneity in the drechmerin congener mixture produced by the fungus would then reflect the relaxed substrate specificities of the corresponding A domains in DcNRPS1.
Peptaibiotics are ionophore antibiotics 78 . Drechmerin-containing crude organic extracts produced clear zones of inhibition against Staphylococcus aureus when tested at 100 μg, but showed no antibiotic activity against Enterococci or Gram-negative bacteria such as E. coli.
These extracts also yielded a clear 6-mm inhibition zone against the corn leaf spot fungus (Helminthosporium sativum) when evaluated at 40 g in a disk diffusion assay, with the most active purified fraction inhibiting growth at doses as low as 1.25 µg. Most importantly, we observed a potent and dose-dependent nematicidal activity of the drechmerin-containing crude dichloromethane culture broth extracts against the root lesion nematode (Pratylenchus penetrans) and the golden nematode (Globodera rostochiensis) (>70% mortality at 1mg/ml, and >50% mortality at 500 µg/ml, scored 48 hr post-application). Initial fractionation and bioassay studies indicated that the antifungal activity consistently co-purified with the nematicidal activities.

Definition of orthologous groups
We performed an all-by-all BLAST search of the 250,458 proteins predicted in our selection of 24 fungal species to identify homologs. We used blastp with the default settings, except that the maximum number of alignments was set to 10,000 rather than 250 79 . We defined the network of homologous proteins by connecting proteins for which the alignments returned by BLAST spanned more than 50% of both the query and the subject sequence (in order to avoid chimeric families due to fusion proteins) and had an E value < 1e-5. The weight of an edge in this network corresponds to the relative Smith-Waterman score (SW score) that is the total SW score of the query and the subject, divided by the total SW score of the query with itself. The total SW score is defined as the sum of SW scores of individual, non-overlapping alignments returned by BLAST for a query-subject pair. This results in two weights per pair (depending on which protein of the pair was the query). To obtain an undirected network suitable for mcl analyses we kept the lowest weight of the two.
Furthermore, all edges with a weight < 0.2 were removed from the network. The network was clustered into families using mcl with default settings and the inflation parameter I set to 1.2 80 .
This resulted in 19426 families.
For each of the 10,177 clusters cluster that consisted of more than three sequences we constructed and trimmed a multiple sequence alignment (MSA, Clustal Omega and trimAL -gappyout) 81,82 . Despite the overlap criterion applied when parsing our BLAST results we found four families in which fusion proteins 'bridged' non-homologous proteins.
We split them up manually, so that at least part of each protein in a cluster is homologous to at least part of every other protein in the cluster. This resulted in 10,180 MSAs. For each of these MSAs, we inferred a phylogenetic tree with RaxML (-m PROTGAMMAIWAG -f a), rooted this tree using midpoint rooting and predicted duplication events using the Species Overlap algorithm implemented in ete2 as this was shown to lead to more sensitive orthology predictions than strict tree reconciliation 83 . We split up families into distinct orthologous groups if we predicted duplication events to have occurred in the last common ancestor of our set of 24 species based on the species compositions in the tree, assuming that no horizontal transfer occurred. Using this approach we arrived at 20,655 orthologous groups, of which 11,291 had an associated gene tree. Miskei