Macrosynteny analysis between Lentinula edodes and Lentinula novae-zelandiae reveals signals of domestication in Lentinula edodes

The basidiomycete fungus Lentinula novae-zelandiae is endemic to New Zealand and is a sister taxon to Lentinula edodes, the second most cultivated mushroom in the world. To explore the biology of this organism, a high-quality chromosome level reference genome of L. novae-zelandiae was produced. Macrosyntenic comparisons between the genome assembly of L. novae-zelandiae, L. edodes and a set of three genome assemblies of diverse species from the Agaricomycota reveal a high degree of macrosyntenic restructuring within L. edodes consistent with signal of domestication. These results show L. edodes has undergone significant genomic change during the course of its evolutionary history, likely a result of its cultivation and domestication over the last 1000 years.

The genus Lentinula from the Basidiomycete order Agaricales, has a global distribution and consists of 7 described species, one of which is the well-known gourmet edible mushroom Lentinula edodes, colloquially known as Shiitake. New Zealand is home to a single endemic species of Lentinula that represents the monophyletic lineage Lentinula novae-zelandiae. Research investigating the biogeography and phylogenetic history of the Lentinula genus strongly supports a Laurasian descent for L. novae-zelandiae, with a single long-distance dispersal event from Australia sometime within the last 8 million years 1 . This data confirms earlier research that shows L. novae-zelandiae belongs to a monophyletic clade found only in New Zealand. Aside from these articles, little research has been undertaken on L. novae-zelandiae.
In contrast to this, L. edodes has a large natural range spanning mainland China and an even larger expanded range as it is now the second most cultivated mushroom in the world 2 . Known as the Shiitake mushroom after its common host, the Shii Tree (Castanopsis cuspidate), this fungus has been cultivated in China for at least 900-1000 years, with wild foraging being undertake for at least 1800 years 2 . This process has resulted in a number of highly productive domesticated strains that are used to produce what has become the second largest mushroom food crop in the world [3][4][5][6] .
The genomic impact of domestication has been comparatively well studied within the animal and plant kingdoms compared with the fungal kingdom 7 , although some research has been undertaken 4,7-12 . The outcome of this work has identified several features associated with domestication, such as interspecific hybridization events, horizonal gene transfer events, copy number variations, genome decay, and chromosomal rearrangements. Furthermore, a large body of evidence exists showing that mobile genetic elements can play a crucial role in shaping the genomic architecture of an organism regardless of whether it has undergone a domestication process 11,13 . No work to date has investigated whether there is any genomic signal of domestication in the Lentinula genus.
Species within the genus Lentinula, and in fact most Basidiomycete fungi, exist in the dikaryotic state during vegetative growth 14 . This is a life stage where each cell is home to two nuclei, one from each of the parental strains. When the fungus is preparing to produce fruit bodies, the two separate nuclei fuse in a process known as karyogamy, followed by meiosis. This process results in four daughter spores produced through sexual recombination. In brief, during meiosis homologous chromosomes pair, replicate, then separate. The pairing of homologous chromosomes allows for crossing-over events in which genetic material is exchanged between them 14 . This crossing-over is one of the sources of genetic variation derived through sexual recombination. It can even result in entire arms of a chromosome pair being swapped between homologous chromosomes in extreme cases.

Results
Genome assembly and annotation of Lentinula novae-zelandiae ICMP 18003. The genome of L. novae-zelandiae ICMP 18003 was assembled using 8 Gb of base called long-read MinION data and 18 Gb of paired-end short-read Illumina data, to an average depth of 127X and 137X coverage respectively following quality control. Analysis of the illumina dataset with GenomeScope reported a haploid genome length between 38.58 to 38.63 Mb, consisting of 5.88 to 5.89 Mb of repetitive content and with a heterozygosity of 0.871% to 0.879%. The model fit for these metrics was 95.25% to 96.98%.
The assembly pipeline produced a chromosome scale genome assembly consisting of 17 scaffolds with a total genome size of 48.9 Mb. The assembled genome had a GC content of 46.49%, an N50 of 4,832,147 bp and a L50 of 4, with the largest scaffold being 8,122,969 nucleotides long. Of these 17 scaffolds, the second-longest scaffold is a fully assembled chromosome capped with telomeric sequences on each end and with only a single gap in the scaffold. A further 13 sequences had at least one end capped by telomeric sequence. These telomeric sequences consisted of the repeating motif of TTA GGG G, with between 26 and 31 repetitions, with nucleotide lengths ranging from 183 to 220 bp.
Analysis with BUSCO reported the L. novae-zelandiae ICMP 18003 genome assembly has a completeness score of 96.5%. This consists of 3,636 complete single-copy BUSCOs out of a total of 3870, with 98 duplicated BUSCOs and 109 missing BUSCOs.
Genome assembly annotation. The L. edodes B17 and L. novae-zelandiae ICMP 18003 genome assemblies had their repetitive content analysed as part of the assembly pipeline. In total the L. edodes B17 genome had 24.9% identified as repetitive elements and the L. novae-zelandiae ICMP 18003 had 31.82% of its genome identified as repetitive elements. The larger repetitive content of the L. novae-zelandiae ICMP 18003 genome compared to the L. edodes B17 genome was primarily due to an increase in the number of LTR elements, with significant increases in the number of Gypsy DIRS1 elements accounting for the large bulk of the LTR elements, with the remainder mostly made up of Type 1 Copia elements. There are more than 1000 unclassified repetitive elements identified within the L. novae-zelandiae ICMP 18003 assembly than compared with the L. edodes B17 genome assembly.
Comparative genomics and macrosynteny. Agaricomycete reference genomes. One L. edodes genome was identified as suitable for macrosyntenic analysis; the representative genome assembly of L. edodes, assembly GCA_001562095.1 identified as strain B17 18 . This genome was selected as the long-read based genome assembly as it was assembled with FALCON 37 using 61X coverage of long-read data produced via a PacBio RS II system. This assembly was further scaffolded using 120X coverage of long-mated pair reads (5-kb and 10-kb libraries) as well as 86.1X coverage of short-insert reads (500-bp library) with the software SSPACE and GapCloser 18 .
NCBI's GenBank database yielded 270 genome assemblies for within the Agaricomycetes. Of those, six genome assemblies were identified as being chromosome scale genome assemblies; Agaricus bisporus ASM30057v2, Flammulina velutipes Fv1.0, Hericium erinaceus HeCS-4_2.0, Pleurotus ostreatus 03989_v2, Pyrrhoderma noxium ASM228747v2 and Trametes hirsuta TraHir072. In addition to this, the C. cinerea CC3 assembly was selected as a candidate as it is considered a chromosome-scale genome assembly and syntenic analysis has been undertaken on it previously 36 . Furthermore, work done in a recent article 38 on the genus Armillaria produced 11 putative chromosomes for A. ostoyae, bringing the total number of chromosome scale assemblies within the Agaricomycetes to eight. The genome assemblies of C. cinerea CC3 and P. noxium had predicted gene sets and as such were deemed suitable for macrosyntenic analysis.
The L. edodes B17 genome had five telomeric regions identified. Of these, three are located embedded within the scaffold in which they were identified, with only two capping the end of a scaffold. None of the embedded telomeric sequences had flanking assembly gaps. The telomeric sequences in the B17 genome ranged from 10 to 25 repeats of the telomeric motif and had nucleotide lengths from 73 to 179 bp. No telomeric regions were identified within the C. cinerea CC3 genome. The P. noxium genome assembly had 13 telomeric regions identified, all at terminal ends of assembled scaffolds. A summary of genome assembly metrics identified telomeric regions can be found in Supplementary Table 1.
A summary of the quantitative results from analysis with SynChro can be found in Supplementary Table 2. In general, SynChro found 67.23% average similarity between syntenic homologs across assessed genome assemblies, except for the Lentinula species, which shared 86.21% average similarity. The average number of genes per syntenic block ranged from 4.73 through to 28.91; however, the Lentinula species skew that result with an average of 28.91 genes per block, whereas on average all the other pairwise comparisons had an average of 5.56 genes per block. The number of syntenic blocks identified between pairwise comparisons ranged from 361 to 715, with an average of 546.83 syntenic blocks per pairwise comparison. Interestingly, the fewest syntenic blocks were found between L. edodes B17 and L. novae-zelandiae ICMP 18003 at 361 blocks, however this appears to be due to the number of genes per syntenic block for those comparisons.
Supplementary Table 2 shows how many times two consecutive blocks from one genome were found on the same chromosome of the compared genome. These data provide insights into the macrosyntenic relationships between the pairwise species comparisons. Comparisons between the L. edodes B17 genome assembly and any of the other analysed genomes showed a stark difference compared with the pairwise comparisons between the rest of the compared species.
For example, L. edodes B17 when compared against C. cinerea showed 416 sets of two consecutive blocks of the L. edodes genome were found within the C. cinerea genome; yet the P. noxium genome when compared against C. cinerea in the same manner showed 668 blocks, despite it being from a basal order to that which L. edodes is found. When comparing L. novae-zelandiae to the L. edodes B17 genome, only 166 syntenic blocks are identified. In contrast, L. novae-zelandiae shows 573 consecutive blocks of its genome are found on the same chromosomes within the C. cinerea genome and 402 consecutive blocks shared between it and P. noxium.
The ordering of the syntenic blocks identified across pairwise comparisons also matches this trend. Visualization of the macrosyntenic relationships between the pairwise comparisons of genome assemblies with Circos highlight the above-stated trend. These plots readily show the high level of conservation of macrosyntenic structure in all comparisons except for those with L. edodes. In these plots the colored ribbon connections between scaffolds represent syntenic blocks, with the width of the connection points scaled to the size of the syntenic block. For each plot the ribbons have been labelled according to one of the genomes as stated in the legend for each. This allows for identifying rearrangements between the query and target genome.
The SynChro results in Fig. 1 show a high degree of conserved macrosyntenic structure. For example, scaffold 1 in the C. cinerea genome primarily links with scaffold 4 and 9 of the L. novae-zelandiae ICMP 18003 genome assembly. Of significance is that there are only four small ribbons linking scaffold 4 of the L. novae-zelandiae ICMP 18003 genome assembly to scaffold 7 of the C. cinerea genome assembly and no other links to scaffold 9. In contrast to this, Fig. 2 shows the macrosyntenic relationship between C. cinerea and L. edodes B17, with scaffold 1 of the C. cinerea genome assembly linking with eight scaffolds in the L. edodes B17 genome assembly, each of which has numerous links to other scaffolds within the C. cinerea genome assembly. Furthermore, the ordering of ribbon links within the L. edodes B17 genome assembly shows a high level of disorder, evidenced by 16 of the 25 scaffolds having ribbon links back to at least two or more C. cinerea assembly scaffolds. The pattern observed in Fig. 1 is representative of a high level of conserved macrosynteny, whereas the pattern observed in Fig. 2 shows a low level of conserved macrosynteny. www.nature.com/scientificreports/ Syntenic relationships as identified with SynChro between the L. novae-zelandiae ICMP 18003 genome assembly and the L. edodes B17 genome assembly are shown in Fig. 3. Despite SynChro finding the highest number of syntenic blocks between these species out of all pairwise comparisons, the macrosyntenic structure is massively disordered when compared with that seen in Fig. 1. Of the 25 scaffolds within the L. edodes B17 genome assembly, 17 have significant links to two or more scaffolds within the L. novae-zelandiae ICMP 18003 genome assembly. Some of these links may simply be a signal that they are part of the same chromosome but were unable to be assembled together due to the bioinformatics processes used and/or the nature of the dataset. However, scaffolds 1 to 6, all of which are 3.5 Mb or more, appear to have a high degree of macrosyntenic restructuring. Scaffold 1 for example, has links to 9 scaffolds within the L. novae-zelandiae ICMP 18003 genome assembly. Given that scaffold 2 of the L. novae-zelandiae ICMP 18003 assembly is a fully assembled telomere to telomere chromosome it represents a powerful data resource in this context. Interestingly, scaffold 6 of the L. edodes B17 genome has almost equal syntenic links to scaffold 1 and scaffold 2 of the L. novae-zelandiae ICMP 18003 genome assembly.
Analysis with Satsuma2 identified the same trend of macrosyntenic relationships as SynChro did, as evidenced by the Circos plots produced from the results. Overall Satsuma2 found significantly fewer macrosyntenic blocks than SynChro did; however, the pattern of organisation of those blocks is consistent with those found by SynChro. A composite image of all pair-wise syntenic analyses with both SynChro and Satsuma2 can be found in Supplementary Image 1.

Discussion
In this study a chromosome-scale genome assembly for L. novae-zealandiae ICMP 18003 was created using ONT long-read data, illumina short-read data, and state-of-the-art bioinformatics tools. This assembly was produced using a bespoke hybrid assembly pipeline that has resulted in the first chromosome-scale genome assembly for the Lentinula genus. The L. novae-zelandiae ICMP 18003 genome assembly was used as a focal data set to conduct macrosyntenic analyses between it, L. edodes, and two other chromosome-scale genome assemblies from the taxonomic class Agaricomycota to identify whether there were any signals of domestication within the L. edodes lineage. When considering the known history and biogeography of L. edodes and L. novae-zelandiae, the obvious feature that marks a difference between the two closely related species is the long history of cultivation across a large geographic region of L. edodes. This cultivation has spanned an estimated 1000 years, whereas L. novae-zelandiae has no history of cultivation and is a geographically isolated monophyletic lineage 1,39,40 . This history and how closely related these species are makes L. novae-zelandiae an ideal species against which to compare L. edodes.
Research on fungal genome structure in response to domestication suggests large macrosyntenic rearrangements can occur frequently and rapidly within populations and can even lead to diversification of lineages [7][8][9]11,13,31,41 . For example, research in brewer's yeast has shown that chromosomes can fragment and subsequently be misrepaired by being fused together at telomere regions, resulting in new chromosomes with embedded telomeric sequences within them 11,13,41 . The results presented here identified telomeric regions embedded www.nature.com/scientificreports/ within scaffolds of the L. edodes genomes with no assembly gaps on either side. Given the high-quality nature of the genome and the lack of assembly gaps, the location of these telomeric regions is highly supported. Further potential signal of domestication is readily apparent in the high degree of macrosyntenic rearrangements that has occurred within the L. edodes B17 genome. Macrosyntenic analysis results from both SynChro and Satsuma2 show the same high level of conserved macrosyntenic structure shared between all genome assemblies assessed except for the L. edodes B17 genome. Previous research has found a similar level of conserved macrosynteny across species within the Agaricales 30,36 . The trend of conserved macrosynteny is supported by these results for all species except for L. edodes, with even the distantly related P. noxium from the basal order Hymenochaetales conforming to the trend. The B17 L. edodes genome assembly used within this study was derived from a single spore monokaryotic strain and as such some of its genomic structure may be due to inherited CLP's from the parental strains [15][16][17] . The research in this area suggests that these CLP's are derived from recombination of homologous chromosomes of differing lengths. If the parental strains of the B17 monokaryon had CLP's then this may explain some of the structural differences observed, but it cannot explain the embedded telomeric regions nor the high degree of observed restructuring between scaffolds 1 through 6 identified via SynChro and Satsuma2.
The finding of such a large degree of macrosyntenic differences between these two species within the same genus is therefore remarkable. Furthermore, these results could explain the varying results produced in the linkage group analysis work previously done, where between 11 and 14 linkage groups have been found. It may be that these results are not an artifact of the protocols used but are simply due to different cultures having different karyotypes. www.nature.com/scientificreports/ The biological ramifications of these genomic restructuring events are unknown, yet it is possible that these macrosyntenic changes have been underpinning adaptation of L. edodes to the commercial production environment. Future work in this area should focus on assessing the macrosyntenic differences within a large number of L. edodes cultivars through whole-genome sequencing. Research has shown that contiguity of a genome assembly is critical for meaningful macrosyntenic analysis, as such future assemblies should ideally be produced de novo using third-generation long-read sequencing technology with a robust quality-controlled assembly pipeline that involves a final curation step 42 . This final curation step is highly recommended 43 but rarely undertaken in contemporary genome assembly projects. Studies have identified many genome assemblies across all organismal groups that have significant amounts of contaminant sequence [44][45][46][47][48][49] . It would be prudent to investigate the gene regions flanking these structural re-arrangements as these may have given rise to fusion genes, pseudogenes or differentially regulated genes. Further validation of these structural rearrangements with wet-lab based techniques such as karyotyping and PCR amplification of fusion genes would provide strong evidence to support these findings.
With only a single highly-contiguous L. edodes genome available, it is beyond the scope of this research to report on how much macrosyntenic diversity exists within the population of L. edodes. It is reasonable to hypothesize that there may be significant macrosyntenic variation within the L. edodes population, and it is likely these differences would be readily found between wild populations and cultivars that are commercially used. The genome assembly of L. novae-zelandiae reported here will provide a valuable resource for researchers undertaking comparative genomic studies within the genus Lentinula as well as for those with an interest in exploring the effects of domestication on fungi.

Conclusion
The highly contiguous genome assembly of L. novae-zelandiae produced here has provided the means to make a meaningful macrosyntenic comparison between it and L. edodes. This comparison has revealed a large degree of macrosyntenic re-structuring within the B17 genome assembly of L. edodes that is potentially due to domestication. The genome assembly of L. novae-zelandiae is the first chromosome-scale assembly for the Lentinula genus and as such it represents a rich resource for future research; not only this but the methodology presented here provides a means for the production of high-quality fungal reference genomes using state-of-the-art technology. Lentinula novae-zelandiae ICMP 18003 DNA extraction and sequencing. Fungal genomic DNA was extracted from the dikaryotic culture ICMP 18003 of Lentinula novae-zelandiae that had been grown on PDA media at 23 °C for two weeks. Fungal tissue was ground into a fine powder using liquid nitrogen in a mortar and pestle before being extracted using the phenol/chloroform based protocol: High-quality DNA from fungi for long-read sequencing protocol 51 DNA purity was assessed on a Nanodrop spectrophotometer. DNA fragmentation was assessed by running 1 ul of DNA extract on a 1% agarose gel at 80 V for 120 min. Concentration of DNA was assessed using the dsDNA HS assay on a Qubit 4 fluorometer (Thermo Fisher).

Illumina library preparation and sequencing.
A library was prepared from the extracted DNA by Macrogen using the TruSeq Nano DNA kit with a 350 bp insert size. This was then sequenced on an illumina MiSeq platform with 100 bp paired end reads.
DNA Size selection and library preparation. DNA fragments less than 10 kb were depleted via a size selection protocol using AMPure XP beads (Beckman Coulter). 0.45X volume of resuspended AMPure XP bead solution was added to the extracted DNA and incubated at room temperature with gentle mixing for 20 min. The sample was then placed onto a magnetic rack until the solution was clear, following which the supernatant was removed and the sample was then washed twice with 200 ul of fresh 70% EtOH. DNA was then eluted in 50 ul of EB (10 mM Tris pH 8.0) at room temperature for 2 min before being returned to the magnetic rack until the solution was clear. The DNA containing supernatant was then transferred to a fresh 1.5-ml Eppendorf DNA LoBind tube.
DNA repair (NEBNext FFPE DNA Repair Mix, NEB M6630) was performed on extracted fungal genomic DNA following Oxford Nanopore Technologies recommended protocol. The repaired DNA was then purified by adding 60 ul of resuspended AMPure XP beads to the sample in a fresh 1.5 ml Eppendorf DNA LoBind tube. The sample was incubated at room temperature for 5 min with gentle mixing, washed twice with 200 ul fresh 70% ethanol, pellet allowed to dry for 30 s, and DNA eluted in 61 ul of EB (10 mM Tris pH 8.0). A 1 ul aliquot was quantified by fluorometry (Qubit 4) to ensure ≥ 1 ug DNA was retained.
Ligation was performed by adding 25 ul of Ligation Buffer, 10 ul of NEBNext Quick T4 DNA Ligase and 5 ul Adapter Mix (SQK-LSK109 Ligation Sequencing Kit, Oxford Nanopore Technologies (ONT)) to the 60 ul of DNA sample from the previous step. This was mixed gently and incubated at room temperature for 10 min.
The adaptor-ligated DNA was cleaned using 40 ul of AMPure XP beads and Short Fragment Buffer (SQK-LSK109). The purified-ligated DNA was resuspended in 15 ul EB (10 mM Tris pH 8.0), incubated at room temperature for 10 min, pelleting the beads again, and transferring the supernatant to a new tube. A 1-ul aliquot was quantified by fluorometry (Qubit 4) to ensure ≥ 500 ng DNA was retained.  67 pipeline was used to predict gene sets for the repeat masked Lentinula genome assemblies, with a minimum protein length of 20 amino acids. Transcript evidence to be used for downstream gene model prediction was created using Trinity v. 2.8.5 68 . The following datasets were downloaded from NCBI Genbank with SRAtoolkit 2.9.6: SRR527823, SRR5891391, SRR5891392, SRR5891393. Each dataset was aligned to the L. edodes B17 genome using STAR v. 2.7.0 69 . Each dataset was assembled using Trinity in the genome guided assembly mode with the jaccard clip option turned on and a max intron size set to 75 68 73 was used to generate a consensus set of gene predictions from the ab initio prediction programs. In the training step, the four assembled transcriptome datasets previously described were used as well as the UniProt/SwissProt protein database 74 . tRNAscan-SE v. 2.0.5 75 was used to predict tRNA genes.
Comparative analysis. Agaricomycete reference genomes. The NCBI GenBank genome database was searched to identify L. edodes genomes appropriate for this study as well as to identify chromosome-scale genome assemblies within the taxonomic class Agaricomycetes. This was done by searching for "Agaricomycetes", with subsequent manual parsing of the summary file produced. Assemblies identified as chromosome scale were flagged for further analysis with QUAST and had their telomeric regions identified. Assemblies that had predicted gene sets were identified as suitable for downstream macrosynteny analysis.
Macrosynteny analysis. To analyse the macrosyntenic relationships between the different genome assemblies thoroughly and reliably, macrosynteny analysis was conducted using two different pieces of software, each of which takes a different approach to identifying macrosyntenic relationships.
The January 2015 version of the SynChro package from the CHROnicle software suite 76,77 was used with a delta value of 2. This software aims to identify homologous gene regions and takes as input the assembled genomes and a set of gene predictions for each assembly. In parallel to this, genome synteny analysis was also conducted with Satsuma2 v.20161123 78 . This software aims to identify homologous nucleotide regions and takes as input the assembled genome. To visualize the output from these software, Circos v. 0.69-8 79 was used.