Molecular prospecting for cryptic species of the Hypholoma fasciculare complex: toward the effective and practical delimitation of cryptic macrofungal species

Many macrofungal cryptic species remain unidentified. A possible solution is to increase the number of loci analyzed and use rigorous statistics for macrofungal species delimitation. To validate this assumption, cryptic species of the Hypholoma fasciculare complex, a group of common wood-decomposing fungi, were attempted to be delineated. Massively parallel sequencing of mitochondrial ribosomal RNA (mt_rRNA), nuclear ribosomal internal transcribed spacer (ITS) region, and 24 single-copy genes were performed for 96 specimens collected in Japan. Then, the species boundaries were inferred using comparative gene genealogies (mt_rRNA vs. ITS), Bayesian Poisson tree process (bPTP) model for the phylogeny of concatenated nuclear sequences, and analysis of molecular variance (AMOVA) for single nucleotide polymorphisms. In both the mt_rRNA and ITS phylogenies, the H. fasciculare complex was not divided into well-supported clades. Nevertheless, based on the bPTP, two mitochondrial haplotypes were inferred to represent distinct species (H. fasciculare and H. subviride). The results of AMOVA also indicated that the differentiation of nuclear loci can be explained mostly by differences between haplotype. These results suggest that it is necessary to increase the number of target loci to 20 or more and use both phylogeny-based and population genetics-based statistics for the accurate delimitation of macrofungal species.

Scientific RepoRtS | (2020) 10:13224 | https://doi.org/10.1038/s41598-020-70166-z www.nature.com/scientificreports/ single molecular marker is not necessarily sufficient for accurate species delimitation 12,13 . Therefore, approaches that incorporate many loci are necessary for effective detection of species boundaries. However, methods for accurate species delimitation of fungi based on multiple loci have not been well developed. Over the last three decades, comparative gene genealogies (i.e., visual inspections of individual gene genealogies 14 ) have been used to delimitate fungal species, including pathogenic fungi 6,[15][16][17][18] , lichenized fungi [19][20][21] , and macrofungi 7,22,23 . This method clearly differs from single-locus phylogeny because distinct linkage disequilibrium among different loci indicates the restriction of gene flow, which is an important indication of reproductive isolation among sympatrically distributed fungal taxa 7,17 . Nevertheless, this method has several limitations. First, there is a lack of rigorous statistical frameworks for detecting species boundaries; bootstrap supports are an exception, although they represent the reliability of nodes rather than that of species boundaries. This becomes especially problematic when trying to delimit recently diverged species, for which reciprocal monophyly is not necessarily demonstrated across multiple loci 24,25 . Second, only a small number of loci can realistically be used in this method because of the substantial amount of labor required to use many loci. Third, heterozygotes are not taken into account in this procedure despite the fact that they could provide fundamental information for inferring the amount of gene flow in populations 26 . Thus, the methodologies for detecting macrofungal species using multiple loci could be considerably improved.
Notably, several studies have developed statistical frameworks for exploring and testing species boundaries based on molecular phylogenetic trees. For instance, the General Mixed Yule Coalescent (GMYC) model 27 and its Bayesian implementation 28 (bGMYC) separate the branching patterns observed in a time-calibrated ultrametric tree into two events: speciation events between species-level taxa (modeled by a Yule process) and coalescent events between lineages sampled from within species (modeled by the coalescent). The GMYC model assumes that the coalescence process is far more common than the speciation processes within the tree and attempts to determine a threshold that reflects the transitions between both processes. The Poisson tree processes (PTP) model and its Bayesian implementation (bPTP) are similar to the GYMC model 29 , and are often integrated with the evolutionary placement algorithm (EPA), in which short reads are placed into a given reference tree obtained from full-length sequences to determine the evolutionary origin of reads 30 . The PTP model assumes that the number of substitutions between species is significantly higher than the number of substitutions within species, although they differ from the GMYC model in that they use the number of substitutions to directly model the speciation rate. The application of these methods to concatenated multilocus sequences could provide rigorous statistical frameworks for distinguishing species, although heterozygotes would still not be distinguished from homozygotes.
Meanwhile, analysis of population genetics using single nucleotide polymorphisms (SNPs) is a powerful method of detecting reproductively isolated species. This method has some advantages over phylogeny-based species delimitation by testing distinct linkage disequilibrium among different loci. Specifically, the use of SNP markers allows heterozygotes to be distinguished from homozygotes. Additionally, statistical frameworks, such as Wright's F statistics and its analogs, can be implemented to utilize allele frequency data to quantify population subdivision and estimate the amount of gene flow 26 . An approach based on population genetics can therefore provide reliable evidence for reproductive isolation among fungal lineages, although caution should be exercised when delimiting allopatric species 24,31 . Although less common than phylogeny-based species delimitation, population genetics analysis has been used to delimit the species boundaries of various fungi [32][33][34][35] .
A fundamental question addressed in the present study is how increasing the number of loci analyzed (i.e., using 20 or more loci) and using two different statistical frameworks (i.e., both phylogeny-based and population genetics-based species delimitation) could improve the accuracy of the delimitation of closely related cryptic macrofungal species. To explore this question, species boundaries were examined in the species complex of Hypholoma fasciculare (Strophariaceae, Agaricales). Hypholoma fasciculare is a common and widespread wooddecomposing fungal species, which is distinguished by a sulfur-yellow pileus with an orange-brown center, crowded gills that are initially yellow but darken as the blackish-purple spores drop, and the very bitter taste of their basidiomes [36][37][38][39] . To date, the taxonomy of this species remains to be resolved. Because variations of basidiome sizes and colors were reported in this species, the fungal specimens identified as H. fasciculare could cluster into several distinct species 37 . Moreover, H. fasciculare could be confused with H. subviride, which has been reported only around the Central America but is morphologically similar to H. fasciculare 40,41 . To identify cryptic species in the H. fasciculare complex in Japan, the large (mtLSU) and small (mtSSU) subunits of mitochondrial ribosomal RNA, the nuclear ITS1 and ITS2 region, and 24 nuclear single-copy genes were sequenced for the H. fasciculare complex using the massively parallel sequencing technique. Thereafter, species within the complex were tried to be delimited using (1) comparative gene genealogies of the concatenated mtLSU and small mtSSU regions (mt_rRNA dataset) and the concatenated ITS1 and ITS2 regions (ITS dataset), (2) the bPTP model that was applied to the molecular phylogeny inferred from the concatenated nuclear sequences (nuc_concat dataset), and (3) hierarchical analysis of molecular variance (AMOVA) based on nuclear SNPs. An outline of these analyses is shown in Fig. 1. The empirical study of the H. fasciculare complex should show the importance of increasing number of loci analyzed and using rigorous statistical frameworks for the accurate detection of cryptic macrofungal species.    Table S1. Bold names indicate that sequences are determined in this study.
Scientific RepoRtS | (2020) 10:13224 | https://doi.org/10.1038/s41598-020-70166-z www.nature.com/scientificreports/ In the present study, only a part of the qualitative data on the habitat of H. fasciculare complex was available. However, it appears that fruit bodies of haplotype A occurred directly on stumps or fallen trees, whereas those of haplotype B occurred on soil or around the bases of dead trees (Fig. 5C).

Discussion
Our results provide important information for taxonomy of the Hypholoma fasciculare complex. Moreover, the present findings suggest that the use of rigorous statistical methods, such as phylogeny-based and population genetics-based analyses, for multilocus datasets allows for effective and practical detection of cryptic fungal species that cannot be easily distinguished by single gene genealogy and comparative gene genealogies.
The species boundaries in the H. fasciculare complex remained unclear when analyzed solely using the comparative gene genealogies of mt_rRNA and/or ITS sequences, presumably due to the insufficient taxonomical and phylogenetic resolution of single-locus phylogeny (Fig. 2). In general, distinct species do not necessarily form distinct clades with high bootstrap supports in mitochondrial phylogenies because of the insufficient variations in mitochondrial genes 10,42,43 . This problem appears to lie in the mt_rRNA phylogeny of the H. fasciculare complex. Additionally, intra-and interspecific variation could not be easily distinguished based on the phylogeny of the ITS region. These results highlight the limitations of single-locus phylogeny for distinguishing species, as suggested for other organisms 24,25 .
In contrast, use of bPTP model and AMOVA with 20 or more nuclear loci resulted in reasonably successful species delimitation of the H. fasciculare complex of Japan. The results of the bPTP model suggest that differences in mitochondrial sequence correspond to the species boundaries of the H. fasciculare complex (Fig. 4). Additionally, AMOVA for nuclear SNPs suggests that fungi with different mitochondrial haplotypes are reproductively isolated (Table 3). Specifically, distinct cytonuclear disequilibrium (i.e., linkage between cytoplasmic and nuclear markers 44,45 ) appeared to occur even in regions where both haplotypes coexist (e.g., Tok1, Yam1, and Yam2). Furthermore, populations consisting of fungi with the same haplotypes did not appear to deviate from the Hardy-Weinberg equilibrium (Table 3). Overall, both bPTP and AMOVA support the theory that the H. fasciculare complex in Japan should be divided into two species.
It was also shown in this study that two species in the H. fasciculare complex could be distinguished morphologically and ecologically (Figs. 5). The species represented by haplotype B is characterized by its relatively large pileus, long stipe, and wide gill cystidia compared to the species represented by haplotype A. It should be highlighted that these diagnostic features would not have been easily distinguishable from intraspecific morphological variation without the aid of DNA-based approaches. In other words, DNA-based species delimitation is not a substitute for the detection of diagnostic morphologies, but rather an efficient tool for detecting diagnostic morphologies. Interestingly, the habitats where the fruit bodies of the fungi studied occur appear to differ between the two species, implying that the species differ in the substrates that they decompose. This finding goes against the assumption that closely related fungal species have similar physiological features. Although further surveys, for example the analysis of stable isotopes 46,47 , are required to verify the divergence of trophic status, our results suggest morphological and ecological differentiation between two species in the H. fasciculare complex.
Our results indicate the necessity for revising the nomenclature of the H. fasciculare complex. Although the type specimens of H. fasciculare were lost, several evidences indicate that haplotype B corresponds to H. fasciculare. First, the diameter of the pileus described originally is somewhat similar to that of haplotype B 48 . Second, the type locality of H. fasciculare (supposedly the United Kingdom, home of the nomenclator, or its surrounding countries) seems to be included in the geographical distribution of haplotype B rather than in that of haplotype A, because all of the fungal materials of H. fasciculare complex collected in European countries were inferred to be conspecific to the samples of haplotype B based on integrated analysis using EPA and bPTP (EPA-bPTP) (Fig. 4). Meanwhile, haplotype A seems to correspond to H. subviride, which is morphologically similar to H. fasciculare 41 . Our results indicate that haplotype A is conspecific with the fungal materials identified as H. subviride in the INSD, which were collected in North and Central America, near to the type locality of H. subviride (i.e., Cuba) (Fig. 4). Further, macroscopic morphologies of haplotype A do not contradict the original description of H. subviride 49 . In summary, haplotypes A and B should be treated as H. subviride and H. fasciculare, respectively.
Notably, our results indicate that increasing number of loci analyzed is generally useful for accurate species delimitation of macrofungi. The molecular phylogeny based on 22 concatenated nuclear loci (i.e., the nuc_concat dataset) appeared to increase the variable sites and thereby improve taxonomic and phylogenetic resolution. Genome-wide analyses, such as restriction-site associated DNA sequencing 50 (RAD-seq) and multiplexed www.nature.com/scientificreports/ inter-simple sequence repeat genotyping by sequencing 51 (MIG-seq) can provide many more loci and thus are more useful tools for population genetic/genomic studies. However, these two methods are limited in the utility at deeper phylogenetic scales: few orthologous loci are typically recovered across disparate taxa in the former method 52 and the level of homoplasy is expected to increase with increasing the time of divergence between populations in the latter method 53 . Thus, they are not necessarily suitable for detecting cryptic macrofungal species as phylogenetically distinct fungal species in addition to recently diverged species are often confused owing to their morphological similarities 7,23 . Therefore, sequencing of 20 or more loci, as carried out in the present study, seems to be efficient and effective for the detection of cryptic fungal species. The use of adequate statistical methods for the phylogeny-based approach is fundamental for improving the accuracy of macrofungal species delimitation. Use of bPTP allows the speciation process to be distinguished H sublateritium   Table S1.    Table S1. Italicized samples correspond to the queries of EPA, for which sequences of nuclear ribosomal internal transcribed spacer (ITS) are obtained from the international nucleotide sequence database (INSD). Orange and blue lines indicate that differences are inferred as intraspecific and interspecific differences by bPTP model, respectively. The numbers near the branches are posterior probabilities (> 0.5) that the nodes represent species delimitation.
Scientific RepoRtS | (2020) 10:13224 | https://doi.org/10.1038/s41598-020-70166-z www.nature.com/scientificreports/ from the coalescent process based on the branch length of molecular phylogeny, thereby yielding more reliable detection of species boundaries than a visual inspection of molecular phylogeny 29 . Use of the EPA is also useful for understanding the phylogenetic placement of fungal materials for which only short reads are available in the INSD 30 . The bPTP model is almost equivalent to the GYMC model 27 and its Bayesian implementation 28 (bGMYC), which require a time-calibrated ultrametric tree. However, it is important to note that substantial computation time is required to reconstruct an ultrametric tree and the use of GMYC together with EPA is somewhat difficult. Therefore, the use of EPA-bPTP is more recommended for distinguishing species of macrofungi. The population genetics analysis using AMOVA was also shown to be useful for recognizing species boundaries of macrofungi. Since heterozygotes can be distinguished based on SNP data, AMOVA of SNP data allows for quantifying levels of gene flow among populations, which can provide important clues for recognizing reproductive isolation 54 . Wright's F-statistics is commonly used to test deviation from the Hardy-Weinberg equilibrium 26 . Among analyses based on analogs of F-statistics, AMOVA is characterized by its flexibility in the use of different hierarchical levels in the analyzed population structure 55 . Such flexibility is beneficial for population genetics studies of macrofungi, for which it is often necessary to collect samples from multiple areas to secure a sufficient number of samples for statistical tests. However, one of the limitations of this method is the necessity of presuming hypothetical species to be tested using AMOVA (e.g., haplotype A and B in the H. fasciculare complex), which need to be determined by the dataset independent of the SNP data, such as mitochondrial genes. The second limitation is that the presence/absence of reproductive isolation becomes unconvincing if geographical distributions of hypothetical species do not overlap 24 . Nevertheless, AMOVA for SNP data is an effective and practical tool for detecting reproductive isolation in natural populations.
In summary, our findings indicate that single gene genealogy and comparative gene genealogies may lead to invalid species delimitation among closely related macrofungi. Instead, species boundaries should be distinguished using adequate statistical methods with many loci. Because both EPA-bPTP and AMOVA have some limitations, use of both methods would compensate for each method's shortcomings and thereby provide reliable results for delimitating cryptic macrofungal species.

conclusion
As expected, the species boundaries of the H. fasciculare complex remain unclear when analyzed solely by comparative gene genealogies of mt_rRNA and ITS sequences. In contrast, both EPA-bPTP based on the phylogeny of concatenated nuclear sequences and AMOVA based on nuclear SNPs indicate that two mitochondrial haplotypes of the H. fasciculare complex represent distinct species, H. fasciculare and H. subviride. Our findings indicate that caution should be exercised when using single gene genealogy and comparative gene genealogies with few loci for delimiting closely related species of macrofungi. They also suggest that increasing the number of loci used to 20 or more and using both phylogeny-based and population genetics-based statistical frameworks allow for effective and practical macrofungal species delimitation.

Methods
Field survey. From June 2014 to November 2016, 95 specimens of the H. fasciculare complex were collected from 29 forest sites in Japan (Table S1). One sample of H. sublateritium was also collected as an outgroup. Small sections of fruit bodies were removed and stored in 99.5% ethanol for subsequent molecular analysis, and the remaining sections were dried and preserved as voucher specimens. Dried specimens were deposited in the Makino Herbarium of Tokyo Metropolitan University (MAK). DNA extraction, PCR amplification, and sequencing. Total DNA was extracted from the tissue of the voucher specimens using a cetyltrimethylammonium bromide (CTAB) method as described previously 35 . Two-step PCR was performed for these samples as described previously 56 . The target regions were the mtLSU and mtSSU, the nuclear ITS region, and 24 single-copy genes (Table 3). For mtSSU and mtLSU, primers were designed as part of the present study. For the other loci, preexisting primers were used 8,56 . After pooling of an equal volume of the respective PCR products, amplicons of 450-600 bp in length were excised and extracted with the E-Gel SizeSelect 2% agarose gel system (Thermo Fisher Scientific). The amplicon libraries were sequenced by 2 × 250 bp paired-end sequencing on a MiSeq platform (Illumina, San Diego, CA, USA) using a MiSeq v2 Reagent NANO Kit according to the manufacturer's instructions. Table 2. Results for analysis of molecular variance (AMOVA) based on single nucleotide polymorphism of nuclear loci, considering haplotypes, regions (sampling locality) and individuals (samples). AMOVA is a hierarchical analysis of molecular variance, estimating the % of molecular variance accounted for by each level of the nested sampling hierarchy as well as Φ (≒Fst).    57 as described previously 56 . The demultiplexed 14,533,614 reads were deposited in the DDBJ Sequence Read Archive (DRA accession: DRA009900). The processed reads were assembled into contigs (unique sequences) in Claident using a similarity cut-off of 100%. The final Claident output files (e.g., "nonchimeras. fasta" and "summary.csv") were further processed using R version 3.3.1 58 . All R scripts used in the molecular analyses are shown in Text S1. For nuclear loci, unique sequences with a read abundance of ≥ 20% of total 'locus × sample' reads were presumed to represent genotypes and were used for population genetics analysis because it is unlikely that respective genotypes of fruit body samples were represented by only a small fraction of total 'locus × sample' reads. For mitochondrial loci, the most abundant unique sequences were presumed to represent haplotypes.

Molecular phylogenetic inference.
The unique sequences of all loci obtained from the same sample were incorporated into the consensus sequence (IUPAC standard) using the "consensus" function of the R package "seqinr ver. 3.4-5" 60 . These consensus sequences (GenBank accession numbers: LC538389-LC540371; Table S2) were used for the subsequent molecular phylogenetic inference. For the ITS1 and ITS2 datasets, regions of ambiguous alignment were removed using Gblocks v0.91b 61 with options "Allow smaller final blocks" and "Allow gap positions within the final blocks". If the sequences obtained from different samples were identical, those sequences were merged as unique sequences. Molecular phylogenetic inference was performed using the mt_rRNA dataset, ITS dataset, and the nuc_concat dataset (the concatenated sequences of ITS1, ITS2 and 20 single-copy genes), separately (TreeBase ID: S26016). Prior to molecular phylogenetic inference, the nuc_concat dataset was subjected to the congruence among distance matrices 62 (CADM) test to determine whether the Table 3. Forward (F) and reverse (R) PCR primers to amplify a short fragment of each region. These primers contain an overlapping region of the Illumina sequencing primers and 6-mer Ns. a Primers designed in this study; b predesigned primers 15 ; c predesigned primer 8 .

Gene
Forward primer (5′-3′) Reverse primer (5′-3′) www.nature.com/scientificreports/ datasets were congruent, using "CADM.global" implemented in the R package "ape ver. 5.1" 63 . Then, the null hypotheses were confirmed to be rejected for all pairwise comparisons (Table S3). Phylogenetic inference based on the maximum likelihood (ML) method was performed using RAxML ver. 8.1.5 64 , in which the tree searches were repeated 25 times using random sequence addition for generating starting trees. Bootstrap support values were calculated from 1,000 standard bootstrap replications, as implemented in RAxML. Parameters of the GTR Gamma model were estimated separately for each partition according to model selection based on the Akaike information criterion (AIC) using Kakusan 4 65 .
Phylogeny-based approach to species delimitation using concatenated nuclear sequences. Species delimitation was inferred using the bPTP model, which was integrated with the EPA.
Specifically, the ITS1 sequences obtained from the INSD were placed into the ML tree inferred from the nuc_ concat dataset using the EPA algorithm as implemented in RAxML. Since the sequencing of the ITS2 was less successful than that of the ITS1 in the present study, the short reads did not include the ITS2. Then, the bPTP model was applied to the phylogenetic tree obtained from the EPA. For bPTP, "H. sublateritium" was precluded as an outgroup. The analysis consisted of 1,000,000 Markov Chain Monte Carlo generations, with a thinning every 1,000 generations and a burn-in of 10%.
Population genetics analysis. Using the "as.matrix.alignment" function of the R package "seqinr", the alignment of each nuclear locus was converted into a matrix of genotypes, where rows and columns represented samples and the nucleotide positions of the DNA sequence, respectively. To reduce biases, samples and loci with many missing datapoints were removed (i.e., 50% cut-off levels). After converting the data frame into a genind object, AMOVA was performed to determine the proportion of nuclear genetic variation that could be attributed to differences between mitochondrial haplotypes, between geographical regions (sampling localities), and between/within samples using the "poppr.amova" function of the R package "poppr ver. 2.8.3" 66 . This test also calculated Φ statistics, analogous to Wright's F-statistics. The statistical significance of variance components was computed using the Monte Carlo test using the "randtest" function implemented in the R package "ade4" 67 with 9,999 permutations.
Observation of morphological characteristics. To compare morphological characteristics, specimens of the H. fasciculare complex were examined. The macro-and micromorphological characteristics of basidiomes were described from fresh and dried specimens, respectively. The pilei and stipes of 94 specimens were measured. Microscopic observations were performed under a CX41 optical microscope (OLYMPUS, Tokyo) with material (sections of basidiome tissue) mounted in 5% potassium hydroxide (KOH) solution. Basidiospore measurements were taken at 1,000 × magnification under an optical microscope. The lengths and widths of 5-12 basidiospores were measured for each collection. Between-group differences in pileus and stipe size, spore length and width, and gill cystidia length and width were analyzed based on the Mann-Whitney U test using the "wilcox.test" function of R.