Introduction

One of the main tasks of evolutionary biology is to reveal how organisms adapt to different ecological niches and lifestyles. Trophic lability may shape the specific genomic landscape. To tackle such questions, a genomic prediction that uses both population- and individual-level data could provide complementary insights into the mechanisms underlying trophic adaptations [1]. Fungi represent some of the primary drivers of ecosystem processes and are an ideal model for addressing these issues. Fungi have evolved diversified nutrition strategies (biotrophic, necrotrophic, and saprotrophic lifestyles), which likely exert evolutionary pressures contributing to genome evolution [2]. For several groups of plant symbiotic fungi, such as the ectomycorrhizal basidiomycetes (e.g. Laccaria bicolor), ericoid mycorrhizal (ERM) fungi, sebacinaceous fungi, and root dark septate endophytes (DSEs), analyses of genome architecture have uncovered the mechanisms of adaptation to two main trophic statuses: primary symbiotrophy and transient saprotrophy [3,4,5,6].

Foliar endophytes (FEs) are fungi that colonize the healthy photosynthetic tissues of plants without causing disease [7]. Unlike the aforementioned fungal symbionts, FEs are characterized by highly localized infection and hyper-diversity within leaf tissues [7]. There is a growing consensus that horizontally transmitted FEs serve as inducible defensive mutualists that protect plants through the production of bioactive compounds [8,9,10,11]. This contrasts with vertically transmitted grass endophytes, which are regarded as constitutive mutualists. The “Foraging Ascomycete” hypothesis [12] predicts that saprotrophs may shift life-history strategies to endophytism to better disperse and persist [13,14,15]. Despite the ubiquitous distribution of FEs and their central role in plant-environment interactions, the common adaptation strategy that enables an endophytic lifestyle at the evolutionary timescale remains elusive. Furthermore, there has been no concerted effort to explore intraspecific genome-level polymorphism in FEs [16], hampering our understanding of trophic mode-associated genetic variation within populations.

We seek to bridge this gap by focusing on a model endophyte system on the relict fir species Abies beshanzuensis (Pinaceae), which is a critically endangered tree species left from the Quaternary glacial period (only three individual plants are known in the wild, all within 100 m of one another) and endemic to Qingyuan, Zhejiang Province (China) (Table S1) [17]. Earlier, we recorded a high level of endophytic fungal species diversity on this relict fir [18]. Among these species, Pezicula neosporulosa (Helotiales, Ascomycota) – recently recognized as a new species – dominates the needle tissues [19]. In addition, P. neosporulosa is known to be a rare species that has otherwise only been sampled twice, in the Netherlands [19]. Because the helotialean ancestors of Pinaceae endophytes have been co-evolving with their hosts for more than 300 million years [20], P. neosporulosa can be considered to have already shifted to an endophytic habitat. Research on P. neosporulosa is beginning to gather pace, in part because of its broad spectrum of phenotypic and physiological polymorphism (quantitatively different litter decomposition activity and production of antibiotic compounds have been observed) over its very restricted geographic range [19], indicating intriguing population biology. In fungi, such cryptic morpho-physiological variants may be an underappreciated source of intraspecific, ecologically relevant phenotypic diversification [21]. The prevalence of this phenomenon calls for a holistic understanding of its evolutionary benefits. For these reasons, P. neosporulosa provides an excellent opportunity to understand how a fungus adapts to an endophytic lifestyle at both the individual and population levels.

In this work, we hypothesize that the evolution of P. neosporulosa could have shaped a set of common genomic features and evolutionary mechanisms facilitating its endophytic phase. We first focus on describing the evolutionary history (expansions or contractions) of the carbohydrate-active enzyme (CAZyme) repertoire and secondary metabolite gene cluster (SMGC) gene families of this endophyte by comparing genomes of individuals, two closely related saprotrophic Pezicula species, and other plant-beneficial helotialean fungi. We also used a pan-genomic strategy to identify changes in gene family size and pan-genome dynamics based on 11 representative individuals. We examine intraspecific polymorphism at the population level within an evolutionary context to identify signatures of natural selection. We further ask a related question: do recombination events occur within this population? Given that sexual recombination contributes greatly to genetic variation and adaptation in fungi, we search for footprints of sexual recombination and examine the reproductive system of P. neosporulosa.

Materials and methods

Pezicula neosporulosa isolates

In total, 75 endophytic P. neosporulosa isolates, and two P. neosporulosa isolates (CBS 101.96 and CBS 102.96) from the CBS-KNAW Culture Collection were used in this work. Detailed information on their origin and cultivation is provided in Supplementary Information Text S1.

Annotation of CAZymes and biosynthetic gene clusters

Genomes from 35 fungal species related to P. neosporulosa were downloaded from the JGI MycoCosm database (http://jgi.doe.gov/fungi) [22]. Two genomes from saprotrophic P. sporulosa (CBS 225.96) and P. cinnamomea (CBS 240.96) were sequenced in this study. Our comparative analyses focused on CAZomes (www.cazy.org) [23] and gene clusters involved in secondary metabolite biosynthesis. Three tools were used in combination for CAZomes annotation in dbCAN2 (http://cys.bios.niu.edu/dbCAN2, v2.0.1) [24]. The genes and gene clusters involved in secondary metabolism in 38 fungal species were predicted using antiSMASH v4.0.2 [25].

Phylogenomic reconstruction and molecular dating

Gene families or orthologous groups of these species were determined by OrthoFinder v2.3.8 with the default inflation parameter of 1.5 [26]. We identified 1,497 homologous single-copy gene families. After filtering short low-quality genes (encoding proteins with <200 amino acids), a total of 1,169 single-copy genes were used for constructing a phylogenomic tree based on a concatenated sequence alignment. The single-copy orthologous protein-coding sequences (CDS) were aligned using MUSCLE [27]. The unambiguously aligned conserved blocks were extracted using Gblocks 0.91b with default parameters [28]. The concatenated alignment was used for Bayesian inference of phylogeny using MrBayes v3.1.2. The phylogeny was calibrated using two calibration points as follows: the most recent common ancestors (MRCAs) of Leotiomycetes and of ERM fungi were estimated to have occurred c. 148 and 118 Mya, respectively [6]. The divergence time of each tree node was inferred using the Bayesian Markov-chain Monte Carlo (MCMC) tree (MCMCTree) package in PAML v4.9 with the GTR nucleotide substitution model.

Gene family evolution

To monitor the dynamic changes during the evolution of gene families, we estimated gene family size expansion and contraction on each branch of the phylogenomic tree. We clustered the protein sequences into families using the Markov Cluster Algorithm (MCL) [29]. Possible expansions or contractions of gene families were estimated using the Computational Analysis of gene Family Evolution (CAFE) software package v4.2.1 [30]. Expansions or contractions of gene families with a p value ≤ 0.05 were considered significant.

Metabolomic analysis using high-performance liquid chromatography (HPLC)

Because previous work has shown significant variation in antimicrobial activity among isolates, we analyzed the secondary metabolites produced by six representative isolates of endophytic P. neosporulosa by HPLC. The metabolite extraction procedure and HPLC conditions are described in Supplementary Information Text S1.

Pan-genome construction

To further infer intra-genomic diversity and trophic mode-specific markers of evolution and adaptation, we constructed the P. neosporulosa pan-genome. Illumina-based platforms often yield highly fragmented genome assemblies, which can bias genome annotation. In our case, however, we could still obtain assembled genomes of relatively high quality for several isolates by mapping reads to the reference genome M44 (Table S2). The pan-genome was pieced together using 10 short-read sequenced isolates (including all strains in the HPLC analysis) covering the different physiological traits and mating types (discussed below). This set of isolates gave us the opportunity to conduct a pilot study to estimate the pan-genome structure, which was constructed iteratively. Briefly, the list was started by including all the genes from M44; then, genes from the other genomes were compared to the current pan-genome with BLASTn and added if the E-value was >1 × 10⁻¹⁰, according to the methods described by Walkowiak et al. [31]. The calculation approach for the pan-genome is available in Supplementary Information Text S1.
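The iterative procedure can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the callback `best_hit_evalue` is a hypothetical stand-in for parsing BLASTn results against the current pan-genome, and `toy_best_hit_evalue` is an illustrative placeholder that only recognizes identical gene labels. Only the E-value threshold is taken from the text.

```python
from typing import Callable, Dict, List

E_VALUE_CUTOFF = 1e-10  # threshold quoted in the text


def build_pan_genome(
    reference_genes: List[str],
    other_genomes: Dict[str, List[str]],
    best_hit_evalue: Callable[[str, List[str]], float],
) -> List[str]:
    """Grow a pan-genome gene list one isolate at a time.

    best_hit_evalue(gene, pan) is a hypothetical callback that returns the
    lowest E-value of `gene` against the current pan-genome, or
    float("inf") when there is no hit at all.
    """
    pan = list(reference_genes)  # start from all reference genes
    for genes in other_genomes.values():
        for gene in genes:
            # No sufficiently similar sequence in the pan-genome yet: add it.
            if best_hit_evalue(gene, pan) > E_VALUE_CUTOFF:
                pan.append(gene)
    return pan


def toy_best_hit_evalue(gene: str, pan: List[str]) -> float:
    """Illustrative placeholder: a perfect hit only for identical labels."""
    return 0.0 if gene in pan else float("inf")


pan_genome = build_pan_genome(
    ["geneA", "geneB"],
    {"iso1": ["geneB", "geneC"], "iso2": ["geneC", "geneD"]},
    toy_best_hit_evalue,
)
print(pan_genome)
```

Because each gene is compared against the pan-genome as it stands at that moment, the result can depend on the order in which isolates are added.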

Variants (SNPs and indels) calling and quality filtering

Whole-genome re-sequencing of 77 individuals was performed (see Supplementary Information Text S1). High-quality reads were aligned to the reference genome using BWA v0.7.17 with the MEM module [32]. SAM alignment files were sorted and converted into BAM files with SAMtools v1.3 [33]. Then HaplotypeCaller, CombineGVCFs, GenotypeGVCFs, SelectVariants, and VariantFiltration, implemented in the GATK package [34], were applied to call, select, and filter SNPs using the criteria “QD < 2.0 || FS > 80.0 || MQ < 20.0 || SOR > 3.0 || MQRankSum < −12.5 || ReadPosRankSum < −8.0 || QUAL < 40.0”. All SNP sets were filtered for a maximum of 50% missing data per site and a minimum minor allele frequency (MAF) of 0.02.
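For readers who want to see the logic of these filters outside GATK, the following Python sketch applies the same exclusion expression and the site-level thresholds to already parsed values. It is an illustration under stated assumptions, not the commands used here: annotations are assumed to arrive as a plain dictionary, a missing MQRankSum or ReadPosRankSum is assumed not to trigger the filter, and genotypes are assumed to be haploid, biallelic calls coded 0/1 with `None` for missing data.

```python
from typing import Dict, List, Optional


def fails_hard_filter(ann: Dict[str, float]) -> bool:
    """True if a SNP matches the exclusion expression quoted in the text."""
    return (
        ann["QD"] < 2.0
        or ann["FS"] > 80.0
        or ann["MQ"] < 20.0
        or ann["SOR"] > 3.0
        # Assumption: rank-sum annotations absent at a site do not fail it.
        or ann.get("MQRankSum", 0.0) < -12.5
        or ann.get("ReadPosRankSum", 0.0) < -8.0
        or ann["QUAL"] < 40.0
    )


def passes_site_filter(
    genotypes: List[Optional[int]],
    max_missing: float = 0.5,
    min_maf: float = 0.02,
) -> bool:
    """Apply the per-site missing-data and minor allele frequency thresholds.

    Assumption: haploid, biallelic calls (0 = reference, 1 = alternate).
    """
    called = [g for g in genotypes if g is not None]
    if not genotypes or not called:
        return False
    missing_fraction = 1.0 - len(called) / len(genotypes)
    if missing_fraction > max_missing:
        return False
    alt_freq = sum(called) / len(called)
    return min(alt_freq, 1.0 - alt_freq) >= min_maf


good_site = {"QD": 25.0, "FS": 1.2, "MQ": 60.0, "SOR": 0.8, "QUAL": 900.0}
print(fails_hard_filter(good_site))            # False: the site is kept
print(passes_site_filter([0] * 75 + [1] * 2))  # True: MAF = 2/77
```

With 77 haploid isolates, a MAF threshold of 0.02 removes variants carried by a single isolate (1/77 ≈ 0.013) and keeps those carried by at least two.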

Genome-wide scan of natural selection signatures

We tested for deviations from neutrality using Tajima’s D [35] and Fu and Li’s D/D* and F/F* [36] statistics to search for signals of natural selection using VariScan v2.0.2 [37]. In addition, ratios of non-synonymous to synonymous polymorphism (pN/pS; a measure of the efficacy of selection in purging slightly deleterious mutations) [38], nucleotide diversities (θπ), and MAFs were obtained for all CAZymes and SMGCs to further confirm the signal of natural selection. Full details on the statistical hypotheses and genome-wide scans for selection are provided in Supplementary Information Text S1.
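As a reminder of what these statistics measure, the sketch below computes Watterson's θW, θπ, and Tajima's D for a single window using the standard formulas from Tajima (1989). It is an illustration only and does not reproduce VariScan: it assumes a complete matrix of haploid, biallelic calls (rows are individuals, columns are sites, coded 0/1), and the function name `tajimas_d` is ours. The returned θ values are totals for the window, not per-site estimates.

```python
import math
from typing import List, Tuple


def tajimas_d(haplotypes: List[List[int]]) -> Tuple[float, float, float]:
    """Return (theta_W, theta_pi, Tajima's D) for one window.

    Assumption: no missing data; each column is a biallelic site coded 0/1.
    """
    n = len(haplotypes)
    n_sites = len(haplotypes[0])
    counts = [sum(h[j] for h in haplotypes) for j in range(n_sites)]
    segregating = [k for k in counts if 0 < k < n]
    s = len(segregating)

    # Mean number of pairwise differences, summed over segregating sites.
    theta_pi = sum(2.0 * k * (n - k) / (n * (n - 1)) for k in segregating)

    a1 = sum(1.0 / i for i in range(1, n))
    theta_w = s / a1
    if s == 0:
        return theta_w, theta_pi, float("nan")

    a2 = sum(1.0 / (i * i) for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)
    d = (theta_pi - theta_w) / math.sqrt(e1 * s + e2 * s * (s - 1))
    return theta_w, theta_pi, d


# Every variant at intermediate frequency (2 of 4 copies): D is positive.
intermediate = [[1, 1, 1], [1, 1, 1], [0, 0, 0], [0, 0, 0]]
# Every variant a singleton (1 of 4 copies): D is negative.
singletons = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0]]
print(tajimas_d(intermediate)[2] > 0, tajimas_d(singletons)[2] < 0)
```

The two toy matrices show the direction of the signal: an excess of intermediate-frequency variants inflates θπ relative to θW and pushes D above zero, whereas an excess of rare variants does the opposite.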

Recombination detection and identification of MAT loci

We estimated the recombination events of the population using SplitsTree v4.14.4 [39]. We identified MAT loci in P. neosporulosa to understand which mating system (homothallic or heterothallic) is predominant, and to check whether mating-type allele frequencies deviated from a 1:1 ratio, which would indicate frequent asexual reproduction. The method for accurate identification of MAT loci is given in Supplementary Information Text S1. To validate the identified MAT-encoding genes, we designed several pairs of degenerate primers using Primer Premier 6.0 to amplify the target regions by PCR, followed by Sanger sequencing (Table S3). For this validation, 11 isolates were randomly selected to confirm their different mating types.
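The test of a 1:1 mating-type ratio can be sketched as a one-degree-of-freedom χ² goodness-of-fit test. The example below is a generic illustration, not the exact procedure applied in this study (which may, for example, have used a continuity correction or different software); the counts are hypothetical and the function name is ours. For one degree of freedom, the upper-tail probability can be written with the complementary error function, so no external library is needed.

```python
import math
from typing import Tuple


def chi_square_equal_ratio(count_a: int, count_b: int) -> Tuple[float, float]:
    """Chi-square goodness-of-fit test against a 1:1 expectation (df = 1).

    No continuity correction is applied (an assumption of this sketch).
    """
    expected = (count_a + count_b) / 2.0
    chi2 = ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected
    # For df = 1, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts of two mating types in a sample of 100 isolates.
chi2, p_value = chi_square_equal_ratio(40, 60)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
```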

Results

The genome of P. neosporulosa resembles that of unrelated DSEs or other plant-associated helotialean fungi

The M44 reference genome for P. neosporulosa was assembled into 58 scaffolds with a total length of 54.3 Mbp and an N50 of 1.53 Mbp. A 95% completeness was confirmed, with 720 complete (718 single-copy BUSCOs and two duplicated BUSCOs), five fragmented, and 33 missing BUSCO orthologs (a total of 758 BUSCO groups were searched, https://busco.ezlab.org/frames/fungi.htm) [40]. More details on the genomic features of P. neosporulosa are presented in Table S4.

To better understand how evolution has acted on key regulatory gene families over evolutionary time, we investigated the gene family expansion/contraction events that took place on the Pezicula lineage. The results indicated that the branch leading to the Pezicula lineage showed a history of substantial family contractions and expansions, greatly exceeding that observed on other branches (Fig. 1A). In total, we identified 169 gene families exhibiting significant expansions (97, a total of 661 genes) and contractions (72, a total of 85 genes) in P. neosporulosa compared to 35 reference genomes (Mann–Whitney test, p value < 0.05). We detected a significant enrichment of GO terms that were over-represented in the expanded gene families, including monooxygenase (GO:0004497; adjusted p value = 1.10 × 10⁻⁵), oxidoreductase (GO:0016705; adjusted p value = 0.003), hydrolase (GO:0004553; adjusted p value = 0.0009), and chitinase (GO:0004568; adjusted p value = 0.0008) (Fig. 2), which are implicated in saprotrophic ability. Only one significantly enriched GO term in contracted families was identified for molecular functions: protein binding (GO:0005515; adjusted p value = 4.50 × 10⁻⁷).

Fig. 1: Reconstructed gene family and expansion histories along a time-calibrated phylogeny.

A Phylogenomic species tree constructed with 1169 core orthologous single-copy genes using the GTR model with gamma-distributed rate variation. Two important groups of mutualistic helotialean fungi, dark septate endophytes and ericoid mycorrhizal fungi, are indicated in dark blue and purple, respectively. The arrows indicate two calibration points. The numbers of expanded (orange) and contracted (blue) gene families at each internal node correspond to the size of the bubbles. On the Pezicula clade and P. neosporulosa branch, the numbers of expansions and contractions that were statistically significant are shown in brackets. For better visualization, bubbles at terminal nodes or nodes with no clear gene family size changes are not shown. Gene number and genome size of all species are shown on the right side. Two Eurotiomycetes (Aspergillus spp.) and Xylona heveae (Xylonomycetes) are the outgroup taxa. Bayesian posterior probabilities of 1.00 are indicated by a black square. Text with gray shading indicates the three Pezicula species. B Heat map and dendrogram of all detected SMGCs. Text with gray shading indicates that P. neosporulosa is close to two DSE species in relation to their SMGC patterns. C Comparison of the CAZyme repertoires identified in genomes of three Pezicula species. CAZome categories include Glycoside Hydrolases (GHs), GlycosylTransferases (GTs), Carbohydrate Esterases (CEs), Polysaccharide Lyases (PLs), Carbohydrate-Binding Modules (CBMs), and Auxiliary Activities enzymes (AAs).

Fig. 2: Identification of significantly over-represented GO terms among these significantly expanded gene families.

Description of enriched GO terms associated with significantly expanded gene families through a two-dimensional scatterplot space derived by applying multidimensional scaling to a matrix of the GO terms’ semantic similarities through REVIGO (http://revigo.irb.hr/). The bubble color indicates significance (−log10 p value), and size indicates the frequency of the GO terms in the underlying gene ontology database.

To determine the evolutionary relationships between P. neosporulosa and other members of the Leotiomycetes class, we constructed a phylogenomic tree based on 1169 single-copy core orthologous genes in 38 Pezizomycotina genomes (Fig. 1A). The three Pezicula species were monophyletic and occupied the basal position in the order Helotiales. The MRCA of Pezicula and other helotialean fungi was estimated to have occurred c. 127 Mya (95% CI, 120.1 to 136.0). P. sporulosa and P. neosporulosa were subsequently predicted to have diverged c. 3.4 Mya (95% CI, 2.2 to 4.9), suggesting that these species diverged very recently compared to many other fungal genera. The phylogenomic analysis reveals that P. neosporulosa is distantly related to other helotialean plant mutualists, such as ERM fungi and DSEs (Fig. 1A). However, the plant pathogenic species included in this study are also separated from it by considerable evolutionary distance.

Considering the important roles of secondary metabolites and plant cell wall degradation ability in determining fungal lifestyles, we focused on identifying the gene repertoire related to the CAZome and secondary metabolite biosynthesis. In total, we detected 84 secondary metabolite biosynthetic gene clusters (SMGCs) in P. neosporulosa (supplementary material dataset), of which 29 belonged to the type I polyketide synthases (T1PKS), 16 to the non-ribosomal peptide synthases (NRPS), seven were NRPS/PKS hybrids, ten belonged to the terpene synthase group, and one to the type III polyketide synthases (T3PKS). The remaining SMGCs were assigned to the unknown group (Fig. 1B). Unexpectedly, the total number of SMGCs in P. neosporulosa was higher than that of most other helotialean fungi (Table S5) and exceeded the average of 48 clusters for the order. In particular, the expansion of NRPS, T1PKS, and currently unknown gene clusters was detected. The cluster analysis of the SMGCs revealed that the endophytic P. neosporulosa is sharply different from its two sister Pezicula species, which are putatively saprotrophic, but is very similar to two DSE species, Acephala macrosclerotiorum and Phialocephala scopiformis. These two species, together with P. neosporulosa, share nearly the same number of genes involved in the production of phosphonates and have a reduced number of NRPS and beta-lactone biosynthetic genes compared to P. sporulosa and P. cinnamomea. Considering the very short evolutionary distance between Pezicula species (Fig. 1A), this observation points to the potential role of SMGCs in the trophic adaptation of P. neosporulosa to foliar endophytism. In line with this observation, Looney et al. [41] also speculated that a reduction of NRPS may be required for ectomycorrhizal fungi to adopt a symbiotic lifestyle.

The CAZome assembly responsible for plant cell wall degradation is also highly variable in this order (Fig. 1C). The P. neosporulosa genome is rich in predicted CAZymes (595 genes), a number similar to or only slightly lower than that of a few other DSEs and ERM fungi in Leotiomycetes, such as Acephala macrosclerotiorum, Oidiodendron maius, Meliniomyces bicolor, and Phialocephala scopiformis. We further compared the composition of auxiliary activities (AAs) gene families. Among these, laccases (AA1) and cellobiose dehydrogenases (AA3)-encoding genes are directly involved in lignin degradation, and we found that the number of AA1 and AA3 genes exceeds what is reported for most other fungal species but is comparable to that of mycorrhizal species, such as M. bicolor, and the DSEs P. scopiformis and Phialocephala subalpina. In addition, cellobiose dehydrogenases (AA3), glucooligosaccharide oxidases (AA7), cellulose-degrading auxiliary activity family 9 (AA9), and β-1,4 endoglucanase (glycoside hydrolase family 5, GH5) also showed a similar pattern. Interestingly, the comparison of the three Pezicula species revealed that P. neosporulosa had a considerably richer CAZome than the two saprotrophic species (Fig. 1C), which also points to the unique ecological adaptation of this species. The principal component analysis revealed that the content of the P. neosporulosa CAZome is unique, with a sign of proximity to A. macrosclerotiorum and Meliniomyces bicolor, both of which are beneficial associates of plants (Fig. S1).

Thus, the phylogenomic analysis revealed the genetic separation and the recent speciation in the genus Pezicula and the unique genomic feature of the endophytic P. neosporulosa that resembles unrelated helotialean plant-beneficial fungi by the composition of CAZomes and the richness of SMGCs.

Gene family expansions in accessory genome highlight the adaptation of P. neosporulosa to foliar endophytism

As assessing the physiology of a species based on a single isolate is not sufficient, we characterized and curated the pan-genome using 11 isolates with a sufficiently good quality of assembly to account for intraspecific variability (Fig. 3). We found that the pan-genome of P. neosporulosa remains open: the curve relating pan-genome size to the number of isolates did not reach saturation, as also evidenced by the estimated exponent γ (= 0.037 ± 0.0004) > 0 (Fig. 3A). In this dataset, 13,271 and 16,881 genes constituted the core- and pan-genomes, respectively (Figs. 3B and 3D). More specifically, only very few orphan genes (a total of 648 genes private to individual isolates) were present, ranging from 8 to 218 genes per individual. The plot for estimating the number of new genes added by each genomic sequence fitted a decaying exponential, and mathematical extrapolation predicts that an average of 74 new genes will be identified for every new genome sequenced (Fig. 3C).

Fig. 3: Pan-genome components of eleven P. neosporulosa isolates.

A Estimation of P. neosporulosa pan-genome size. The curve is a least-squares fit of the power law to medians. B Estimation of P. neosporulosa core genome. The parameter κc is the amplitude of the exponential decay, τc is the decay constant, and Ω measures the best-fit value of the core genome. C Estimation of new genes; the number of individual-specific genes is plotted as a function of the number of genomes sequentially added. The parameters κs and τs are equivalent to κc and τc, and tg(θ) measures the best-fit number of specific genes. D The proportion of genes in the core, dispensable, and private genomes. E Distribution of COG categories between the core genome, accessory genome, and variable genomes.
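The meaning of the exponent γ can be illustrated with a small Python sketch that fits a power law of the form P(N) = κ·N^γ to pan-genome sizes. This functional form, the log-log least-squares shortcut, and the synthetic data are all assumptions made for illustration; the exact model and fitting procedure used in this study are described in Supplementary Information Text S1 and may differ. An exponent above zero means the pan-genome keeps growing as genomes are added.

```python
import math
from typing import List, Tuple


def fit_power_law(n_genomes: List[int], pan_sizes: List[float]) -> Tuple[float, float]:
    """Fit P(N) = kappa * N**gamma by ordinary least squares in log-log space.

    Assumption: this linearized fit is a simple stand-in for a nonlinear
    least-squares fit and will give slightly different estimates on real data.
    """
    xs = [math.log(n) for n in n_genomes]
    ys = [math.log(p) for p in pan_sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gamma = sxy / sxx
    kappa = math.exp(mean_y - gamma * mean_x)
    return kappa, gamma


# Synthetic, noise-free example (values are illustrative, not from this study).
genome_counts = list(range(1, 12))
synthetic_sizes = [15000.0 * n ** 0.04 for n in genome_counts]
kappa, gamma = fit_power_law(genome_counts, synthetic_sizes)
print(f"kappa = {kappa:.1f}, gamma = {gamma:.3f}, open = {gamma > 0}")
```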

In the context of nutritional adaptation and a possible switch from saprotrophy to symbiotrophy, we aimed to identify whether mutualistic and saprotrophic potentials are fueled by dispensable genes, which might have adaptive value. The assignment of KOG functional categories to the core, pan, and accessory gene pools of the 11 P. neosporulosa isolates is depicted in Fig. 3. Accessory genomes are enriched in genes encoding proteins assigned to category Q (secondary metabolism) and category G (carbohydrate transport and metabolism) (Fig. 3E).

We also provide direct chemical and molecular evidence from six isolates to demonstrate variable secondary metabolite production and divergent SMGC organization (Fig. 4). Though several peaks were identical, most differed in peak area, and some were unique among the HPLC profiles. Similar differences were also observed in their SM gene clusters. These gene content polymorphisms were most likely generated through deletion or insertion. One prime example involves the NRPS-T1PKS hybrid gene cluster (Fig. 4D). We discovered that this cluster differed among isolates in the number and type of associated transport and additional modification genes, thus adding further potential to produce structurally diverse compounds. Intriguingly, the M86 isolate lacks transport-related and other modification genes compared to the remaining isolates; this could in part contribute to its poor metabolite production, evidenced by fewer peaks and smaller peak areas in its metabolome profile (Fig. 4C), although the chemical structures have not yet been identified.

Fig. 4: Incomplete saprotrophy-biotrophy transition and intraspecific physiological variations in P. neosporulosa.

A A schematic outline showing the endophytic lifestyle of P. neosporulosa determined by a diverse repertoire of genes related to CAZymes and SMGCs. AA9 and GH5 are the potential genetic determinants of endophytism. The Foraging Ascomycete (FA) hypothesis is presented to elucidate the benefits of endophytism. B Semiquantitative plate assays were performed to determine intraspecific variations in antibiotic activity and in cellulase, laccase, and peroxidase production among six representative P. neosporulosa isolates. Fusarium oxysporum was used as a pathogen for the confrontation test. Congo red, syringaldazine, and Azure-B were used to detect cellulase, laccase, and peroxidase activities, respectively. This result supports the notion that the individuals did indeed display a remarkable degree of physiological diversity. C Comparison of HPLC profiling of secondary metabolites produced by six isolates. The compounds were detected using a UV detector at a wavelength of 210 nm. D Comparison of micro-synteny and sequence similarity between SMGCs in the six isolates investigated. Left: a phylogenetic tree inferred from 10,956 core orthologous single-copy genes. The approximately-maximum-likelihood phylogenetic tree was constructed using FastTree v2.1.7 under a JTT substitution model. Differences in gene content in the NRPS-T1PKS hybrid gene cluster in six isolates are shown. All predicted genetic components of this cluster are located on one scaffold.

A high level of standing genetic variation within the P. neosporulosa population

A total of 19.3 billion paired-end reads (comprising 198 Gb of high-quality raw nucleotide data) were generated from 77 sequenced P. neosporulosa strains. The average sequencing depth per individual was 67.8× (ranging from 41.3× to 122.6×) (Table S6). We surveyed genome-wide patterns of genetic variation in non-overlapping 1 kbp windows and generated a Circos plot for visualization (Fig. 5A). In total, we recovered 1,912,822 quality-filtered SNPs after singleton exclusion, with an average SNP density of 35 SNPs/kbp, corresponding to diversity θW and θπ of 0.0073 and 0.0093, respectively. This suggests a relatively high level of total genetic diversity. A total of 967,088 non-transposable element (TE) SNPs (50.56%) were located in intergenic regions. SNPs in genic regions, excluding TEs, included 375,450 synonymous and 200,988 nonsynonymous SNPs (Fig. 5B, C). The site-frequency spectrum for the various functional SNP classes indicated that both nonsynonymous and synonymous polymorphisms were skewed toward lower frequencies, suggesting that they were enriched for slightly deleterious mutations (Fig. 5D). Small insertions and deletions (indels, 1–40 bp) were distributed throughout the genome at an average density of 2.94 indels/kbp. A total of 70,231 insertions and 89,481 deletions were identified, with average lengths of 4.06 and 4.72 bp, respectively (Fig. 5E). We annotated 15,333 (9.60%) indels in coding regions of the genome.

Fig. 5: Summary of single nucleotide polymorphisms and indels in the P. neosporulosa population.

A Genome-wide averages of summary statistics were visualized using Circos. The data tracks are organized concentrically from the outer circle to the inner circle: distribution of SMGCs, gene density, TE density, SNP density, indel distribution, Tajima’s D, Watterson’s theta (θW, nucleotide polymorphism based on segregating sites), and the 58 scaffolds. The graph was plotted using Circos with consecutive, non-overlapping 1 kbp windows. B, C Distribution of SNPs among functional effect classes compared to the proportion of sites in the reference genome. D Minor allele site-frequency spectrum. E Statistics of short indel (insertions and deletions) distribution. F Recombination pattern based on SplitsTree neighbor-net network analysis. Branch lengths represent pairwise Hamming distances (uncorrected P distances). The un-rooted radial phylogram showed a star-like topology without clear reticulations, suggestive of a low degree of recombination. Branches shaded in light blue indicate the isolates carrying the mat1-2-1 (b) idiomorph (see more details in Fig. 6).

Restricted recombination and loss of sex in P. neosporulosa

To ascertain whether frequent sexual recombination contributes to the increased intraspecific genetic diversity, we next searched for population-level signs of recombination in the P. neosporulosa population. We assessed the extent of network structure among branches. A striking feature of the non-recombining network is the presence of 27 extremely long branches, each bearing several short-branched terminal networks. We speculate that this represents the typical pattern of diversified clonal expansions. Overall, the star-shaped topology with no geographic structure showed no evidence of recombination, indicating that adaptive radiation had occurred in P. neosporulosa.

The extremely low level of recombination is a sign of a clonal, strongly inbreeding, or selfing population structure. We therefore asked whether P. neosporulosa undergoes sexual reproduction in either a heterothallic (self-sterile) or homothallic (self-fertile) fashion. We analyzed the structure and allele distribution of the mating-type loci, which were identified by searching the P. neosporulosa genome for homologous MAT-locus sequences from known helotialean fungi. Intriguingly, we identified a putatively unique reproductive system in this fungus, which contained two divergent mat1-2 idiomorphs in a ratio (30:47) that did not deviate from 1:1 (see Fig. 5F; χ²-test, p = 0.174) (Fig. 6A). The former individuals carry genes from mat1-2 and mat1-1 located at a single locus, with a gene arrangement (mat1-1-1, mat1-1-5, and mat1-2-1) consistent with primary homothallism, while the latter lacked the opposite mat1-1 idiomorph (Fig. 6B). Furthermore, we revealed the presence of two types of mat1-2-1 idiomorphs with dissimilar sequences of different sizes (342 and 298 amino acids for mating types 1 and 2, respectively), but each allele contained the conserved PRkXseXrrR sequence in the C-terminal region of the high-mobility group (HMG-box) domain. The mat1-2-1 cladogram revealed the genetic divergence of the mat1-2-1 idiomorphs, and one of them was clustered with two other Pezicula species (Fig. 6A). This suggests that the duplication of mat1-2-1 is quite old and that the two alleles have different evolutionary histories. The structure of mating-type 2 loci was similar to that of the homothallic Hymenoscyphus albidus [42], of which mat1-2-1 clusters with the heterothallic form mating type 1, suggesting an independent origin of this structure. We did not find isolates carrying only the mat1-1 idiomorph, which suggests the absence of a heterothallic sexual system. The organization of the mating-type loci and flanking regions (apn2 and cox13) is shown in Fig. 6B. PCR and Sanger sequencing validated the occurrence of these mating-type genes in different groups (Fig. 6C).

Fig. 6: Mating type structure, polymorphism in the P. neosporulosa genome and phylogeny of mat1-2-1.

A mat1-2-1-based phylogeny among helotialean-related fungi via the maximum likelihood (ML) method using PhyML v3.1 with the default model of HKY85. Sequences were derived from our work and ML bootstrap values/Bayesian posterior probabilities above 85% and 0.99, respectively, are indicated as black filled circles on nodes; ML bootstrap value below 80% while Bayesian posterior probabilities above 0.99 are indicated as gray filled circles on nodes. B Organization of the two types of the mating locus in each genome. C PCR validation of mat1-1-1, mat1-1-5, and two mat1-2-1 idiomorphs from representative isolates; in each sample, we found a single band corresponding to one idiomorph.

A set of CAZymes and SMGCs show signatures of selection

The high density of SNPs around SMGCs (Fig. 5A) implies that there may be selective maintenance of diversity in specific regions due to balanced polymorphisms, which would be an alternative explanation of the above observations. To explore this possibility, we estimated genome-wide distributions of neutrality test statistics to see whether patterns in DNA sequence variation fit the expectations of the neutral hypothesis. The data showed that both the empirical and permuted null distributions of all neutrality test statistics calculated from SNP data in 50-kbp intervals are skewed toward positive values (Fig. S2). The average windowed Tajima’s D and Fu and Li’s D*, F*, D, and F in the population were 0.58, 1.57, 1.39, 1.73, and 1.54, respectively, indicative of a genome-wide excess of intermediate-frequency variants.

To look for potential outliers, we compared the distributions of diversity and test statistics in individual gene families with the whole-genome average. The GH, PL, NRPS, and T1PKS families had higher average Tajima’s D values than the observed genome-wide distribution (right-tailed Mann–Whitney U-test, p values = 6.23 × 10^−5, 0.0041, 0.00013, and 0.0078, respectively) (Fig. 7A). A one-sided test for an excess of intermediate-frequency variants is justified here, as we are interested in sequences showing extremely positive Tajima’s D.
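The right-tailed comparison can be sketched as follows. This is a minimal standard-library illustration of a one-sided Mann–Whitney U-test using the normal approximation with tie correction and no continuity correction; it is not the implementation used in this study, and the function name and example values are hypothetical.

```python
import math

def mann_whitney_greater(x, y):
    """One-sided Mann-Whitney U-test.

    Alternative hypothesis: values in x tend to be larger than in y.
    Returns (U statistic for x, approximate p value).
    """
    n1, n2 = len(x), len(y)
    total = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < total:
        j = i
        while j < total and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values occupy ranks i+1 .. j and share their average.
        avg_rank = (i + 1 + j) / 2
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((total + 1) - tie_term / (total * (total - 1)))
    if var_u == 0:
        return u_x, 1.0
    z = (u_x - mean_u) / math.sqrt(var_u)
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail probability
    return u_x, p_value

# Hypothetical windowed Tajima's D values for a gene family and a genome-wide sample
family_d = [3.1, 2.8, 3.5, 2.9, 3.3]
genome_d = [0.2, 0.5, -0.1, 0.9, 0.4, 0.7]
u_stat, p = mann_whitney_greater(family_d, genome_d)
```

In practice, an established routine such as SciPy’s `scipy.stats.mannwhitneyu(x, y, alternative="greater")` would normally be preferred over a hand-written version.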

Fig. 7: Selection signatures acting on the CAZomes and SMGCs.

A Comparison of Tajima’s D statistics among the four gene families (GH, PL, T1PKS, and NRPS) and the whole-genome average at the population level. A 1-kbp sliding window was used to calculate the test statistics. Significance was assessed by comparing the observed sliding-window values of each metric against a null genome-wide distribution derived through random permutation (right-tailed Mann–Whitney U-test). ** and *** indicate significance at the p < 0.01 and p < 0.001 levels, respectively. B Scatterplots of Tajima’s D against θπ, Fu and Li’s D*, and pN/pS. First, we chose 24 and 81 candidate genes encoding CAZymes and secondary metabolite biosynthetic enzymes, respectively, which showed positive outlier values for at least one neutrality test statistic. Second, we randomly selected 100 genes as the control (genome average) using the sample() function in R v3.6.2. Third, we plotted Tajima’s D against θπ, Fu and Li’s D* (as an example), and pN/pS to detect strong selection signatures. The thresholds were set as follows: Tajima’s D above the 95th percentile of the null genome-wide distribution, θπ and Fu and Li’s D* above the genome average, and pN/pS > 1. Gray lines in each plot indicate the thresholds. The shaded dots in each scatterplot represent the genes under strong balancing selection. A right-tailed Mann–Whitney U-test was performed to test whether the average θπ, Fu and Li’s D*, and pN/pS values of the two groups of candidate genes are higher than those of the reference gene set. The one-sided test for an excess of intermediate-frequency variants is justified here because we are interested in sequences showing extremely positive Tajima’s D, Fu and Li’s D*, and pN/pS values. C Local regions of diversity and patterns of selection. Tajima’s D and θπ values are plotted for a 1-kbp sliding window across four representative genes belonging to GH15, T1PKS, NRPS, and terpene synthase, showing a strong signal of balancing selection with outstanding peaks of elevated Tajima’s D and θπ. The x-axis represents the position on the scaffold. D The minor allele frequency spectra of the corresponding genes compared with the null distribution of minor allele frequencies and with the whole-genome average.

We next sought to determine how many genes in each family show selection signals. We assessed the significance of Tajima’s D and Fu and Li’s D/D* and F/F* values against the genome-wide distribution. To avoid false positives, we report significance only for values that exceeded the upper 5% thresholds of the genome-wide non-parametric permutation distributions. Taking Tajima’s D as an example, this yielded a cut-off value of ≥2.41 to separate outliers.
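The thresholding step can be illustrated with a short sketch. It assumes that the permutation null distribution is already available as a list of values and uses the nearest-rank definition of a percentile; how the permutations were generated is not specified here, and the function names and toy values are hypothetical.

```python
def upper_tail_cutoff(null_values, percentile=95):
    """Nearest-rank percentile of a null distribution (e.g. the 95th)."""
    ordered = sorted(null_values)
    # Integer ceiling of percentile * n / 100 avoids floating-point drift.
    rank = -(-(percentile * len(ordered)) // 100)
    return ordered[max(rank, 1) - 1]

def positive_outliers(gene_values, null_values, percentile=95):
    """Genes whose statistic reaches or exceeds the upper-tail cutoff."""
    cutoff = upper_tail_cutoff(null_values, percentile)
    return {gene: v for gene, v in gene_values.items() if v >= cutoff}

# Toy null distribution and per-gene values (illustrative numbers only)
null_distribution = list(range(1, 101))
per_gene = {"gene_a": 96, "gene_b": 50, "gene_c": 95}
flagged = positive_outliers(per_gene, null_distribution)
```

Applying the same function to each neutrality statistic in turn and taking the union of the flagged genes would correspond to the criterion of exceeding the threshold for at least one statistic.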

Our data revealed that many genes showed positive outlier values. In brief, 24 CAZymes and 81 SMGCs exceeded the 95% tails of the genome-wide distributions for at least one test statistic, suggesting that selection has acted on these genes. Among these, the GH and T1PKS families recorded the highest numbers of outliers (10 and 31 genes, respectively; Table S7). We then applied more stringent criteria to confirm strong selection signatures by relating Tajima’s D values to θπ, Fu and Li’s D* (as an example), and pN/pS (three scatterplots are shown in Fig. 7B). As a control, we used 100 genes randomly selected from the entire genome dataset. Genes with pN/pS values > 1 are among the fastest evolving regions in the genome and were considered to be under balancing selection [38]. Subsequently, we compared the test statistics between the two candidate gene sets and the reference genes. Except for pN/pS, the differences in all statistics were significant (right-tailed Mann–Whitney U-test, all p values < 0.01), exceeded random fluctuations, and were likely influenced by selection (Fig. 7B). One CAZyme and eight SMGCs showed outlier values for Tajima’s D together with pN/pS values > 1. Notably, the gene with the highest pN/pS (estimated to be 2.88) was t1pks, with 23 nonsynonymous and eight synonymous mutations, indicating an enrichment of nonsynonymous mutations. We further examined four candidate genes that exhibited particularly high levels of nucleotide diversity (Fig. 7C). Within each genic region, at least one or two significant peaks exceeded the upper 5.0% tail of the genome-wide diversity distribution, with low diversity in the surrounding regions, suggesting that these are among the fastest evolving regions in the genome and are evolving under very low evolutionary constraints.
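The combined criteria described for Fig. 7B (Tajima’s D above the 95th-percentile cut-off, θπ and Fu and Li’s D* above the genome average, and pN/pS > 1) can be expressed as a simple filter. The sketch below is illustrative: the record type, field names, gene labels, and the θπ values are hypothetical, while 2.41, 1.57, and 2.88 are taken from the text as example inputs.

```python
from dataclasses import dataclass

@dataclass
class GeneStats:
    name: str
    tajima_d: float
    theta_pi: float
    fu_li_d_star: float
    pn_ps: float

def strong_candidates(genes, d_cutoff, mean_theta_pi, mean_d_star):
    """Names of genes that satisfy all four thresholds at once."""
    return [
        g.name
        for g in genes
        if g.tajima_d > d_cutoff
        and g.theta_pi > mean_theta_pi
        and g.fu_li_d_star > mean_d_star
        and g.pn_ps > 1.0
    ]

# Hypothetical genes; only the first passes every threshold.
genes = [
    GeneStats("candidate_1", 2.9, 0.04, 2.1, 2.88),
    GeneStats("candidate_2", 2.9, 0.04, 2.1, 0.70),  # fails pN/pS > 1
    GeneStats("candidate_3", 1.0, 0.04, 2.1, 1.50),  # fails Tajima's D cut-off
]
selected = strong_candidates(genes, d_cutoff=2.41, mean_theta_pi=0.02, mean_d_star=1.57)
```

Requiring all conditions simultaneously is deliberately conservative, which is consistent with only nine genes (one CAZyme and eight SMGCs) combining an outlier Tajima’s D with pN/pS > 1.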

To explore the extent to which allele frequency distributions at the candidate loci differed from those of the whole genome, we compared the minor allele frequency (MAF) spectra. The MAF spectrum of genome-wide SNPs exhibited the expected frequency of rare alleles, consistent with the idea that most polymorphisms are neutral or nearly neutral (Fig. S3); however, the overall MAF spectra of SNPs in many candidate genes were not consistent with neutrality (Fig. 7D). The effect of balancing selection was to shift the selected loci toward a distinguishable enrichment of MAFs of 0.3–0.5 (shifted to the right).
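A MAF spectrum of this kind can be tabulated as in the following sketch, which assumes haploid isolates, biallelic SNPs, and alternate-allele counts per SNP as input. The function names, the number of bins, and the example counts are hypothetical; integer arithmetic is used for binning to avoid floating-point edge cases.

```python
def maf_spectrum(alt_counts, n_samples, n_bins=5):
    """Proportion of SNPs in equal-width MAF bins spanning 0 to 0.5."""
    counts = [0] * n_bins
    for alt in alt_counts:
        minor = min(alt, n_samples - alt)
        # MAF / 0.5 * n_bins, computed with integers; MAF = 0.5 goes in the last bin.
        index = min(minor * 2 * n_bins // n_samples, n_bins - 1)
        counts[index] += 1
    total = len(alt_counts)
    return [c / total for c in counts]

def intermediate_fraction(alt_counts, n_samples):
    """Share of SNPs with a minor allele frequency of at least 0.3."""
    hits = sum(
        1 for alt in alt_counts
        if min(alt, n_samples - alt) * 10 >= 3 * n_samples
    )
    return hits / len(alt_counts)

# Hypothetical alternate-allele counts for five SNPs in ten isolates
example_counts = [1, 9, 5, 4, 2]
spectrum = maf_spectrum(example_counts, n_samples=10)
share = intermediate_fraction(example_counts, n_samples=10)
```

Comparing the share of SNPs with MAF of 0.3 or more between a candidate gene and the genome-wide set gives a simple numerical counterpart to the visual comparison in Fig. 7D.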

Discussion

This study reveals the genomic hallmarks of foliar endophytism in a relict fir-associated endophytic fungus and uncovers its evolutionary narrative at both the individual and population levels. Helotialean mutualistic fungi improve the fitness of many plants, including those forming mycorrhiza [6, 43, 44]. Intriguingly, unlike the genomes of specialized or obligate plant-associated ectomycorrhizal fungi, arbuscular mycorrhizal fungi, and some endophytes, in which the CAZome sizes are remarkably reduced [45, 46], helotialean mycorrhizal and endophytic fungi have a rich repertoire of genes encoding lignocellulolytic enzymes of carbohydrate metabolism [5, 44, 47], reflecting their facultative biotrophy or an incomplete transition from saprotrophy to biotrophy. This pattern suggests that their genetic repertoires are similar to those of plant-pathogenic or saprotrophic species. Nevertheless, large-scale comparative genomics and functional verification could provide new clues for identifying a core set of CAZyme genes contributing to the endophytic lifestyle [48,49,50]. In our case, we found two potential genetic determinants of endophytism, GH5 and AA9, which are more abundant in plant-beneficial helotialean fungi (gene numbers ranging from 9–33 and 1–24, with averages of 24 and 15, respectively) than in closely related pathogenic or saprotrophic relatives (gene numbers ranging from 4–22 and 0–27, with averages of 15 and 11, respectively). GH5 and AA9 are thought to be vital for establishing ectomycorrhizal symbiosis and for adaptation to endophytism [49, 50].

As mentioned above, rather than improving plant nutrient uptake, as helotialean mycorrhizal and root endophytic fungi do, helotialean FEs most likely act as defensive mutualists of trees by functioning as antagonists of pathogens or pests. This process is mainly driven by the production of a series of biologically active secondary metabolites [9, 20]. Schulz et al. [51], Tanney et al. [52], and Yue et al. [53] revealed that many endophytic Pezicula species constitute an immense reservoir of antimicrobial compounds, which is in line with our findings. The number of SMGCs identified in the P. neosporulosa genome is substantially higher than in other helotialean fungi, suggesting a vast and still uncharacterized metabolic potential. More interestingly, the evolutionary distance between the three Pezicula species is very small, but they are genomically and ecologically distinct. The P. neosporulosa genome differs from those of the two other species and resembles those of unrelated plant-beneficial helotialean fungi, owing to subtle yet highly specific changes in the gene content of a few SMGCs. This suggests that within symbiotic Helotiales, mutualistic endophytism can be achieved among unrelated taxa using highly similar genomic toolkits, indicating the possibility of convergent evolution.

It could be speculated that gene family expansions or contractions play a major role in shaping organismal adaptation and the emergence of evolutionary innovations [54]. The specialized expansions of several lignocellulose-degrading gene families could be envisaged to adapt to environmental challenges [55]. We also attempted to construct the P. neosporulosa pan-genome to see if there are some expanded unique genome contents contributing to endophytic trophic advantage. We posit that enrichment in accessory genes related to CAZomes and secondary metabolite biosynthesis may be involved in maintaining endophytism, which is consistent with previous investigations [56, 57].

A key result of the present study is the pronounced standing diversity in the P. neosporulosa population, suggesting sufficient variation for this fungus to evolve in response to its environment. Intraspecific genetic variation in fungi is pervasive, and its role in trait evolution is increasingly appreciated [21, 58,59,60]. Indeed, genetic polymorphisms have been characterized in populations of the endophytic fungi Rhabdocline spp. (Helotiales) and Lophodermium spp. (Rhytismatales) from Pinaceae and of the mycorrhizal O. maius, which forms ericoid mycorrhizae with plants [16, 61,62,63,64]. However, less attention has been directed to their ecological and evolutionary implications. In this study, we assessed the magnitude of the polymorphism based on whole-genome sequence comparisons. It is reasonable that both SNPs and indels contribute to the extensive intra-genomic variation in the P. neosporulosa genome. The average SNP density (35 SNPs/kbp) is much higher than in other reported fungal populations [65, 66]. This, however, raises the as yet unanswered question of why a relict host tree with extremely low genetic diversity still hosts a highly diverse population of a fungal endophyte. It suggests that a predominantly clonal population structure has not eroded genotypic diversity. We therefore propose that the potential sources of such astonishing diversity over a short timescale may be related to a large effective population size [16] or a high rate of de novo mutation [67]. Unexpectedly, we found that two Dutch isolates fall within the diversity of the Chinese isolates, suggesting no geographical clustering and long-distance dispersal of P. neosporulosa, although the avenues of its dispersal remain unknown. Thus, we cautiously advocate that selection could have led to the evolution of phenotypic variability, which might have prevailed over genetic differentiation to allow rapid adaptation to hosts.

Subsequently, we searched for signatures of selection using a genome-scanning approach. It has been suggested that signals of balancing selection could be stronger in highly self-fertilizing and clonal populations, where they appear as peaks of variability in genomic regions [68, 69]. Our findings support this claim. A dozen outlier loci showing pervasive signatures of selection in genes of the T1PKS, NRPS, PL, and GH families indicate that these genes experience rapid evolutionary change. Carbone et al. [70] and Drott et al. [71] showed that the costs of aflatoxins in the absence of insects and soil microbes are outweighed by benefits in their presence, resulting in balancing selection on the aflatoxin gene cluster of Aspergillus parasiticus. Our findings may fit this scenario, as balancing selection would retain the chemotype polymorphisms of secondary metabolites in P. neosporulosa that protect trees against herbivorous insects or needle pathogens. It is thus easy to envision that the diverse alleles may confer variable activities of CAZymes and SMGC products, thereby increasing the evolutionary potential and resilience of a symbiont population over long periods.

Another contribution of our study is the discovery of a unique mating system. To the best of our knowledge, this work represents the first documentation of two divergent mat1-2 alleles co-occurring in a population that has both homothallic and heterothallic MAT locus arrangements. The two mat1-2-1 alleles differ in length and in sequence at the nucleotide and amino acid levels, which fits with the observation of rapid evolution in mat1-2 sequences [72]. Furthermore, the apparent absence of isolates carrying only the opposite mat1-1 locus is surprising. We observed 30 isolates in total with a single mating type (mat1-2-1), and according to the typical bipolar mating system of ascomycete fungi, both opposite mating types (mat1-1 and mat1-2) are needed for sexual recombination. Thus, we speculate that this is likely a sign of a severe bottleneck that generates genetic drift. It is noteworthy that the occurrence of the two types in a ratio that does not deviate from a balanced one, together with the age of the two mat1-2-1 idiomorphs, might also indicate the existence of heterothallism in which mat1-2-1 (a) acts as a mat1-1 mating type, a mixed system with outcrossing and haploid selfing, or a transition from heterothallism to homothallism. Distinguishing between these hypotheses would require observation of the mating system in crossing assays.

Conclusion

Our results reveal a new example of convergent genomic adaptation to endophytism in P. neosporulosa, mostly determined by enzymes involved in carbohydrate metabolism and secondary metabolite biosynthesis. It appears that different plant-beneficial helotialean fungi, including P. neosporulosa, use similar genomic toolkits to function as symbiotrophs. Notably, a hallmark of P. neosporulosa lies in its high intra-genomic polymorphism despite the signature of clonality. We acknowledge that this is not the first attempt to estimate the incidence of intraspecific variation in FEs. However, our focus on P. neosporulosa from the relict Abies is justified because the studied population experiences increased selection pressure to diversify CAZomes and SMGCs. This evolutionary force could confer greater endophytic niche fitness on Pezicula fungi and allow them to serve as a considerable “ecological regulator” in forest ecosystems.