Niche differentiation and evolution of the wood decay machinery in the invasive fungus Serpula lacrymans

Ecological niche breadth and the mechanisms facilitating its evolution are fundamental to understanding adaptation to changing environments, persistence of generalist and specialist lineages and the formation of new species. Woody substrates are structurally complex resources utilized by organisms with specialized decay machinery. Wood-decaying fungi represent ideal model systems to study evolution of niche breadth, as they vary greatly in their host range and preferred decay stage of the substrate. In order to dissect the genetic basis for niche specialization in the invasive brown rot fungus Serpula lacrymans, we used phenotyping and integrative analysis of phylogenomic and transcriptomic data to compare this species to wild relatives in the Serpulaceae with a range of specialist to generalist decay strategies. Our results indicate specialist species have rewired regulatory networks active during wood decay towards decreased reliance on enzymatic machinery, and therefore nitrogen-intensive decay components. This shift was likely accompanied with adaptation to a narrow tree line habitat and switch to a pioneer decomposer strategy, both requiring rapid colonization of a nitrogen-limited substrate. Among substrate specialists with narrow niches, we also found evidence for pathways facilitating reversal to generalism, highlighting how evolution may move along different axes of niche space.

Malt Extract Agar (MEA) and inoculated with seven 0.5 x 0.5 cm mycelial plugs. Thirty-five wood blocks were added to each box and incubated at 20 °C in the dark for two months until all wood blocks were well inoculated. Strains were paired by scraping off the mycelium on the surface of respective wood blocks and placing the 2 x 2 cm sides of two wood blocks together with cut tracheid ends facing each other, and fixing them together using a sterile rubber band.
A total of ten replicates for each combination of strains was set up. After approximately five months of growth the dominant strain in each wood block was determined by sampling from three central sampling points along each wood block that had had surface mycelia removed and split. Therefore, six cultures were prepared on MEA for each paired interaction. Strain identity was determined either by visual assessment or by producing a sanger sequence of the ITS region as a barcode. Please refer to Balasundaram et al. [1] for additional details. Results were scored as the proportion of samples from which each strain was recovered in a particular confrontation and statistical significance of deviation from the initial proportion of 0.5 at the start of the experiment was assessed using Person c 2 goodness of fit tests.
2) RNA extraction, library preparation and sequencing Samples were frozen under liquid nitrogen and ground to a fine powder using a pestle and mortar. CTAB buffer (2% CTAB, 2% PVP, 100 mM Tris-HCL, 2 M NaCL, 25 mM EDTA, 300 µL/15 mL beta-mercaptoethanol) was added and incubated at 65 °C for 15 minutes, before two extractions with one volume of chloroform for 10 mins. RNA precipitated overnight in 1/3 volume 8 M LiCl was centrifuged at 10,000 RPM for 40 mins at 4 °C and resuspended in molecular grade water. Additional RNeasy Mini Kit and RNase-Free DNase Set (Qiagen, Hilden, Germany) clean-up steps were included. 2) Genome sequencing of S. lacrymans var. lacrymans SL198 (lacJ) To facilitate full phylogenomic analysis, we sequenced and assembled the genome of the Japanese var. lacrymans strain SL198. DNA of the strain SL198 was extracted and sequenced according to Balasundaram et al. [1]. The preparation of an Illumina 108 bp paired-end (PE) library and sequencing of one lane of Illumina GAII was performed at SNP&SEQ Technology Platform in Uppsala, Sweden, yielding a total of 24,893,665 paired-end reads. Raw reads were trimmed using Trimmomatic tools v.0.35 [2], and the genome was assembled with Velvet de novo assembler [3], using the same settings as for SL200 and SHA17-1 [1]. This resulted in an assembly comprising 1,827 scaffolds > 1000 bp and an N50 of 54kb. The full assembly statistics are shown in SI Table S2. Genome completeness and redundancy were assessed using BUSCO v.2 [4] with the Basidiomycota reference database. BUSCO results for the SL198 assembly were comparable with the previously assembled genomes, suggesting a nearly complete gene space (97% of full length BUSCO genes recovered) with low levels of redundancy (0.4% of duplicate BUSCO genes).

3) Genome annotation improvement
We used the "Just Annotate My Genome" pipeline https://github.com/genomecuration/JAMg to produce updated annotations for the genomes sequenced in (1) and the newly sequenced SL198.
Finally, EvidenceModeler [7] was used to combine gene predictions into a set of proposed high quality gene models for each locus. In order to ensure maximum coverage of the gene space, we screened individual de novo predictions, as well as the initial set of gene predictions from [1] for full length gene models not overlapping the high quality set. Putative non-overlapping gene models were kept if their predicted coding sequence was longer than 300 bp and they were expressed with at least 2 fragments per kilobase per million reads (FPKM) in either of the three conditions. Genome annotations yielded between 12,800 and 16,174 gene models (SI Table S3). Of these, between 68.8% and 80.8% were expressed with an FPKM of at least 1 among the sequenced samples. A BUSCO run in 'protein mode' produced completeness values between 89.7% and 97.4%, mirroring estimates calculated based on the unannotated genome sequences (SI Table   S3). This suggests that annotations are representative of the assembled gene space.
Predicted proteins were annotated for functional domains using InterProScan v5.4-47.0 [11] with the options: -iprlookup -goterms -pa -dp. PFAM annotation was run using hmmscan from the HMMER package v3.1b2 [12] with e-value cutoff 0.1 and PFAM version 27.0 [13]. Finally, proteins were assigned to conserved orthologous groups using eggNOGmapper [14] with eggNOG database version 4.5.1 [15] in one2one ortholog mode and the following search settings:-d fuNOG. For GO term enrichment analysis non-redundant annotations from InterProScan and eggNOG were merged. CAZyme detection and assignment to families were done using the standard procedure used for the daily updates of the CAZy database (www.cazy.org, [16]).
Using the procedure outlined above, we were able to assign approximately 70% of genes to eggNOG ortholog groups across all strains, while 47-48% of genes were associated with one or more GO terms (SI Table S3).

4) RNA-seq experiment data processing and QC
Raw reads were processed with Trimmomatic v0.35 [2] to remove adapter sequences and poor quality bases, using the following settings: ILLUMINACLIP:  Fig. S1). PC1 separates SCD and sawdust samples and PC2 separates the different wood types for all four strains. Total percent variation explained by the first two PCs ranged between 86% (lacJ) and 99% (shim), suggesting that most variation is driven by our experimental design.

5) Phylogenomic reconstruction
In order to reconstruct evolutionary histories of gene families and identify orthologous genes in the four genomes studied, we implemented a phylogenomic reconstruction pipeline.
Clustering of amino acid sequences into gene families resulted in a total of 10,584 clusters (see Dryad Annotation File). Of these, 7,316 were shared among all strains (SI. Fig. S2A). The percentage of protein coding genes clustered into gene families was 81% (10,418), 78% (10,442), 79% (10,171) and 76% (12,268) for lacE, lacJ, shas and shim, respectively, indicating similar proportions of singletons under our strict clustering settings.
For each cluster, protein sequences were aligned using PRANK v150803 [20] with default parameters. Poor quality regions were removed from alignments using TCS scores as implemented in TCoffee [21], filtering residues with scores < 4. The best-fit model of evolution was determined for each trimmed alignment using ProtTest v3.4.2 [22] and the following settings: -JTT -WAG -G -S 1 -AIC. Maximum Likelihood (ML) trees were estimated with RAxML v8.2.10 [23] using the model of best fit and the options -f a -N 10. Gene trees were improved using the species-tree aware error correction tool TreeFix [24] with the "long" settings --nquickiter=100 --niter=1000 and the best model of evolution for each alignment.