Fungal secretome profile categorization of CAZymes by function and family corresponds to fungal phylogeny and taxonomy: Example Aspergillus and Penicillium

Fungi secrete an array of carbohydrate-active enzymes (CAZymes), reflecting their specialized habitat-related substrate utilization. Despite its importance for fitness, enzyme secretome composition is not used in fungal classification, since an overarching relationship between CAZyme profiles and fungal phylogeny/taxonomy has not been established. For 465 Ascomycota and Basidiomycota genomes, we predicted CAZyme secretomes using a new peptide-based annotation method, Conserved-Unique-Peptide-Patterns, which enables functional prediction directly from sequence. We categorized each enzyme according to CAZy family and predicted molecular function, thereby obtaining a list of “EC-Function;CAZy-Family” observations. These “Function;Family”-based secretome profiles were compared using a Yule-dissimilarity scoring algorithm, which gives equal consideration to the presence and absence of individual observations. Assessment of “Function;Family” enzyme profile relatedness (EPR) across the 465 genomes partitioned Ascomycota from Basidiomycota, placing Aspergillus and Penicillium among the Ascomycota. Analogously, we calculated CAZyme “Function;Family” profile similarities among 95 Aspergillus and Penicillium species to form an alignment-free, EPR-based dendrogram. This revealed a stunning congruence between EPR categorization and phylogenetic/taxonomic grouping of the Aspergilli and Penicillia. Our analysis suggests that EPR grouping of fungi is defined by both “shared presence” and “shared absence” of CAZyme “Function;Family” observations. This finding indicates that CAZyme-secretome evolution is an integral part of fungal speciation, supporting integration of cladogenesis and anagenesis.
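The profile comparison described above can be illustrated with a minimal sketch. It assumes the common binary form of the Yule dissimilarity, 2bc / (ad + bc), where a counts observations present in both profiles, d those absent from both, and b and c those present in only one. The exact variant and implementation used in the study are not stated here, so this is an assumption. The species names and "Function;Family" labels are illustrative toy values, not data from the study.

```python
from itertools import combinations


def yule_dissimilarity(profile_x, profile_y, universe):
    """Yule dissimilarity between two presence/absence profiles.

    Each profile is a set of "Function;Family" labels. `universe` is the set
    of all labels under consideration, so that shared absences can be counted.
    """
    both = len(profile_x & profile_y)                # present in both (a)
    only_x = len(profile_x - profile_y)              # present in x only (b)
    only_y = len(profile_y - profile_x)              # present in y only (c)
    neither = len(universe - profile_x - profile_y)  # absent from both (d)
    discordant = only_x * only_y
    if discordant == 0:
        # Assumption: return 0 when b*c is zero, which also avoids a
        # division by zero when a*d is zero as well.
        return 0.0
    return 2.0 * discordant / (both * neither + discordant)


# Toy profiles (hypothetical, for illustration only).
profiles = {
    "species_A": {"3.2.1.4;GH5", "3.2.1.8;GH10", "3.2.1.21;GH3"},
    "species_B": {"3.2.1.8;GH10", "3.2.1.21;GH3", "3.2.1.91;GH7"},
    "species_C": {"3.2.1.4;GH5", "3.2.1.15;GH28"},
}
# The universe includes one label that no toy species carries, so that
# "shared absence" contributes to every comparison.
universe = set().union(*profiles.values()) | {"3.2.1.1;GH13"}

pairwise = {
    (a, b): yule_dissimilarity(profiles[a], profiles[b], universe)
    for a, b in combinations(sorted(profiles), 2)
}

for pair, value in pairwise.items():
    print(pair, round(value, 3))
```

Because the product of shared presences and shared absences sits in the denominator, both kinds of agreement lower the dissimilarity. A matrix of such pairwise values could then be passed to a hierarchical clustering routine to draw a dendrogram.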

(a-e) Strains belonging to the genera Aspergillus or Penicillium along with their reported taxonomic section, the species name used in the current study, the species name supplied by NCBI, the strain or isolate number, the NCBI accession number and a "Note" column. An "X" in the Note column indicates a representative strain of the given species, whereas an empty field indicates a strain that was not selected as the representative strain for the species in the current work. Genome assemblies resulting from resequencing of the same strain are treated as one, and the excluded genome assembly is denoted "DUPLET". Sections with only a single genome assembly available are not included in the current work and are denoted "ALONE". A star on the actual species name together with the note "ODD" indicates that the taxonomy (identity) of the strain could not be supported through assessment of the alignment-based phylogenetic tree of 14 orthologous proteins. A bold species name indicates a correction of the species name used in the current work.

Table S1e
Columns: Genus, Section, Actual species, Species of NCBI, Strain/Isolate, Accession NCBI, Note

Table S2
Listing of enzyme observations found in all sections of both Ascomycota and Basidiomycota. None of these observations were included in the dendrogram in Fig. 2. Columns A and B refer to the occurrence of such enzyme observations in the phyla Ascomycota and Basidiomycota, respectively. The number listed is the percentage of species within the phylum having the specific observation.

Figure S1
EPR dendrogram including all available strains of each species within Aspergillus and Penicillium that passed filtering. The asterisk indicates that the taxonomy of the individual specimen has been revised based on prior phylogenetic assessment. The inner blue ring divides the species into eight groups, including one group containing all members of Circumdati and Flavi, now found alone in their own branch.

Figure S2
Molecular phylogenetic analysis by the Maximum Likelihood method. The evolutionary history was inferred by using the Maximum Likelihood method based on the Tamura-Nei model. The tree with the highest log likelihood (-1882.14) is shown. Initial tree(s) for the heuristic search were obtained automatically by applying Neighbor-Join and BioNJ algorithms to a matrix of pairwise distances estimated using the Maximum Composite Likelihood (MCL) approach, and then selecting the topology with the superior log likelihood value. The tree is drawn to scale, with branch lengths measured in the number of substitutions per site. The analysis involved 107 nucleotide sequences. All positions with less than 99% site coverage were eliminated; that is, fewer than 1% alignment gaps, missing data, and ambiguous bases were allowed at any position. There were a total of 453 positions in the final dataset. Evolutionary analyses were conducted in MEGA7, and the numbers shown are bootstrap values from 100 iterations. An asterisk (*) indicates an ITS sequence obtained from the strain by PCR amplification and sequencing of only the ribosomal internal transcribed spacer region. The suffix BLAST marks the top hits found when Penicillium strain LiaoWQ-2011, contig JPLR01000001.1, positions 3079-3830, was used as the query in a BLAST search. The blue box indicates the ITS barcodes published in the original research paper for the strain of P. capsulatum, whereas the grey box covers the species belonging to section P. Canescentia. The orange box frames the ITS barcodes retrieved from the genomes in the current study, placing the two P. capsulatum strains within the P. Canescentia section.
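The site-coverage criterion in the caption above can be sketched as a column filter on an alignment. This is an illustrative sketch, not MEGA7's implementation. The function name, the set of characters counted as unambiguous, and the inclusive threshold are assumptions made for the example.

```python
def filter_sites_by_coverage(alignment, min_coverage=0.99, valid="ACGT"):
    """Keep alignment columns whose fraction of unambiguous bases is at
    least `min_coverage`.

    `alignment` is a list of equal-length aligned nucleotide strings.
    Gaps, missing data and ambiguity codes all count as not covered.
    """
    if not alignment:
        return []
    n_seqs = len(alignment)
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    kept = [
        col
        for col in range(length)
        if sum(seq[col].upper() in valid for seq in alignment) / n_seqs
        >= min_coverage
    ]
    return ["".join(seq[col] for col in kept) for seq in alignment]


# Toy alignment (hypothetical): column 3 has a gap and an ambiguous base,
# column 5 has a single gap.
toy_alignment = ["ACGT-A", "ACNTTA", "AC-TTA", "ACGTTA"]

strict = filter_sites_by_coverage(toy_alignment)         # 99% cutoff
relaxed = filter_sites_by_coverage(toy_alignment, 0.75)  # 75% cutoff
print(strict)
print(relaxed)
```

With only four toy sequences, the 99% cutoff removes every column containing a gap or ambiguous base, while the relaxed cutoff keeps the column with a single gap.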

Figure S3
Phylogenetic analysis based on 14 MUSCLE-aligned tubulin/HSP proteins for all available genome assemblies of Aspergillus or Penicillium. Entries marked with the letter D are considered duplicates, since two sequences are available for the same strain and only the newest is considered for further analysis. The letter O indicates that the taxonomic placement presented here does not agree with the NCBI taxonomic identification and cannot be assigned to any established taxon among the current genome assemblies. The letter S indicates a genome assembly that is the sole representative of its section and is therefore disregarded in further analysis. Species names prefixed by an asterisk indicate a probable misidentification of related species.

Figure S4
Two HSP genes combined with two tubulin genes were aligned for 1181 fungal genomes and used to construct a phylogenetic tree based on Maximum Likelihood. The inner ring is phylum, the second ring is class, the third ring is genus, and the outer ring indicates whether the entity was included or not prior to this filtering. The green branches of the tree indicate removed genome assemblies. See a more detailed figure here: https://itol.embl.de/tree/19238139270451537296388. For this plot, which has representatives from across the fungal kingdom, genomes lacking barcode genes were initially removed. The four barcodes were aligned, trimmed and concatenated to construct a phylogenetic tree used for outlier inspection. Genomes placed outside their respective phylum, class or genus were removed.
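The first two filtering steps described above (dropping genomes that lack a barcode gene, then concatenating the aligned barcodes) might look like the following minimal sketch. The function name, gene labels and genome identifiers are hypothetical. The tree construction and the removal of genomes placed outside their phylum, class or genus are not shown.

```python
def concatenate_barcodes(alignments, gene_order):
    """Concatenate per-gene aligned sequences for genomes that have every gene.

    `alignments` maps gene name -> {genome id -> aligned, trimmed sequence}.
    Returns (concatenated, removed): a dict of genome id -> concatenated
    sequence, and a sorted list of genome ids dropped for lacking a gene.
    """
    all_genomes = set()
    for gene in gene_order:
        all_genomes.update(alignments[gene])
    complete = {
        genome
        for genome in all_genomes
        if all(genome in alignments[gene] for gene in gene_order)
    }
    concatenated = {
        genome: "".join(alignments[gene][genome] for gene in gene_order)
        for genome in sorted(complete)
    }
    return concatenated, sorted(all_genomes - complete)


# Toy input (hypothetical gene labels and sequences): genome "g3" lacks
# the fourth barcode and is therefore removed.
gene_order = ["hsp_1", "hsp_2", "tub_1", "tub_2"]
toy_alignments = {
    "hsp_1": {"g1": "MKA", "g2": "MKS", "g3": "MKA"},
    "hsp_2": {"g1": "LV-", "g2": "LVT", "g3": "LVT"},
    "tub_1": {"g1": "GQC", "g2": "GQC", "g3": "GQC"},
    "tub_2": {"g1": "REI", "g2": "REV"},
}

concatenated, removed = concatenate_barcodes(toy_alignments, gene_order)
print(concatenated)
print(removed)
```

Fixing the gene order keeps the concatenated columns comparable across genomes, which is what allows a single tree to be built from the combined alignment.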