Evidence of non-tandemly repeated rDNAs and their intragenomic heterogeneity in Rhizophagus irregularis

Arbuscular mycorrhizal fungus (AMF) species are some of the most widespread symbionts of land plants. Our much improved reference genome assembly of a model AMF, Rhizophagus irregularis DAOM-181602 (total contigs = 210), facilitated a discovery of repetitive elements with unusual characteristics. R. irregularis has only ten or 11 copies of complete 45S rDNAs, whereas the general eukaryotic genome has tens to thousands of rDNA copies. R. irregularis rDNAs are highly heterogeneous and lack a tandem repeat structure. These findings provide evidence for the hypothesis that rDNA heterogeneity depends on the lack of tandem repeat structures. RNA-Seq analysis confirmed that all rDNA variants are actively transcribed. Observed rDNA/rRNA polymorphisms may modulate translation by using different ribosomes depending on biotic and abiotic interactions. The non-tandem repeat structure and intragenomic heterogeneity of AMF rDNA/rRNA may facilitate successful adaptation to various environmental conditions, increasing host compatibility of these symbiotic fungi.

T he arbuscular mycorrhizal fungus (AMF) is an ancient fungus with origins at least as old as the early Devonian period 1,2 . AMF colonizes plant roots and develops highly branched structures called arbuscules in which soil nutrients (phosphate and nitrogen) are efficiently delivered to the host plant 3 . AMF forms symbiotic networks with most land plant species 4,5 , and the mycelial network formed by various AMF species contributes to plant biodiversity and productivity within the terrestrial ecosystem 6 . The distinctive features of AMF have made it an important model in ecology and evolution 7,8 ; these features include coenocytic mycelia 5 , nutrition exchange with plant, classification as an obligate biotroph 9 , signal crosstalk during mycorrhiza development 9,10 , and extremely high symbiotic ability 9,11 . Recently, multiple genome projects have advanced the understanding of AMF species. Genomic data have been provided for Rhizophagus irregularis DAOM-181602 (=DOAM-197198) [12][13][14] , Gigaspora rosea 12 , Rhizophagus clarus 15 , and other isolates of R. irregularis 14,16 . These studies revealed potential host-dependent biological pathways 12,17 and candidate genes for plant infection and sexual reproduction 15,16,17 . However, fragmented genome sequences limit the ability to analyze repetitive structures and to distinguish between orthologous and paralogous genes 14 . The first published genome sequence of R. irregularis DAOM-181602 (JGI_v1.0) 17 contained 28,371 scaffolds and an N50 index of 4.2 kb (Supplementary Table 1). The second sequence by Lin et al. 13 (Lin14) contained 30,233 scaffolds with an N50 of 16.4 kb (Supplementary Table 1). Recently published assemblies by Chen et al. 14 Table 1). The quality of genomic sequence data for other AMF species did not surpass that of DAOM-181602 12,15 . In contrast, many fungi that are not AMF species contain less than several hundred scaffolds and N50 lengths over 1 Mb 18 . For example, a genomic sequence of an asymbiotic fungus closely related to AMF, Rhizopus delemar (GCA000149305.1), was constructed from 83 assemblies with an N50 of 3.1 Mb 19 . Thus, we here present an improved wholegenome sequence of R. irregularis DAOM-181602 to facilitate examination of the genomics underlying specific features of AMF species. Taking an advantage of the highly contiguous assembly with little ambiguous regions, we focus on the investigation of the repetitive structures including transposable elements (TE), highly duplicated genes, and rDNA gene copies.
A general eukaryotic genome has tens to thousands of rDNA copies 20 (Supplementary Figure 1a), and the sequences of the copies are identical or nearly identical. However, since Sanders et al. 21 , many studies have indicated intracellular polymorphisms of rDNA (ITS) in various AMF species [22][23][24] , and the sequencing of isolated nuclei from Claroideoglomus etunicatum and R. irregularis DAOM-181602 suggested sequence variation among the paralogous rDNAs, i.e., intragenomic heterogeneity 13,25 . This heterogeneity has potentially high impact of studying AMF species, because the rDNA is a fundamental marker of the AMF phylogeny and ecology 8,[26][27][28] , and studies have assumed that these rDNAs have no intragenomic sequence variation 29 . Hence, determining the variation degree could cause a reevaluation of the previous understanding of geographic distribution 8 , species identification 28 , and evolutionary processes of AMF. However, the degree of the variation among the 48S rDNA paralogs has been ambiguous because previous studies by Sanger or Illumina sequencing were unable to distinguish each rDNA paralog in a genome. Moreover, the number of rDNA genes in an AMF genome has never been investigated.
The tandem repeat structure (TRS) of the rDNAs is also an attractive topic for evolutionary studies. General organisms require many rDNA copies to make a sufficient amount of rRNA for protein translation 30,31 . However, in the evolutionary timescale, multicopy genes reduce in number due to homologous recombination (Supplementary Figure 1b) 32,33 and single-strand annealing (Supplementary Figure 1c) 34 . To maintain the number of rDNAs, eukaryotes increase the number of copies by unequal sister chromatid recombination (USCR) using the rDNA TRS ( Supplementary Figure 1d, e) 32 . Because this rDNA replacement causes a bottleneck effect in the genome, almost all eukaryotes have homogenous rDNAs in their genomes 20 . This process, termed concerted evolution is an essential system for maintaining eukaryotic protein translation by ribosomes 30 . The heterogeneous rDNAs observed in AMF species implies the collapse of their concerted evolution, and suggest the unique maintenance system of rDNA copy number.
In this study, we built an improved reference genome assembly of R. irregularis DAOM-181602, which allowed us to discover repetitive elements with unique characteristics of the AMF genome. We identified an unusually small number of rDNA genes in the R. irregularis genome. We also found that the rDNA copies are highly heterogeneous and lack a TRS.

Results
A contiguous DAOM-181602 genome generated by PacBio data. We primarily used single-molecule, real-time (SMRT) sequencing technology for sequencing and assembling the R. irregularis genome. We generated a 76-fold whole-genome shotgun sequence (11.7 Gb in total) (Supplementary Table 2) from genome DNA isolated from a spore suspension of a commercial strain of R. irregularis DAOM-181602 using the PacBio SMRT sequencing platform. A total of 766,159 reads were generated with an average length of 13.1 kb and an N50 length of 18.8 kb (Supplementary Table 2). We assembled these PacBio reads using the HGAP3 program 35 (149.9 Mb composed of 219 contigs). To detect erroneous base calls, we generated 423 Mb of 101 bases-paired-end Illumina whole-genome sequence data (Supplementary Table 2) and aligned them to the HGAP3 assembly. Through variant calling, we corrected 3032 single base call errors and 10,841 small indels in the HGAP3 assembly. Nine contigs were almost identical to carrot DNA sequences deposited in the public database (Supplementary Table 3), and these were removed as contaminants derived from a host plant used by the manufacturer. We evaluated the completeness of the final assembly using CEGMA 36 ; of the 248 core eukaryotic genes, 244 genes (98.4%) were completely assembled (Table 1 and Supplementary Table 1). Consequently, we obtained a high-quality reference genome assembly of R. irregularis DAOM-181602, which is referred to as RIR17. Compared with previous assemblies 13,14,17 , RIR17 represents a decrease in assembly fragmentation (1123 to 210) and an improvement in contiguity using the N50 contig length as a metric (Table 1 and Supplementary Table 1). The total size of the assembly was 9-59 Mb greater than that of previous versions, reaching 97.24% coverage of the whole genome (154 Mb) 17 (Table 1 and Supplementary Table 1 OG0000066  OG0000052  OG0000047  OG0000025  OG0000076  OG0000089  OG0000090  OG0000094  OG0000095  OG0000098  OG0000105  OG0000107  OG0000103  OG0000114  OG0000118  OG0000080  OG0000117  OG0000136  OG0000154  OG0000155  OG0000171  OG0000164  OG0000170  OG0000190  OG0000208  OG0000214  OG0000217  OG0000229  OG0000248  OG0000202  OG0000279  OG0000304  OG0000311  OG0000314  OG0000338  OG0000332  OG0000364  OG0000383  OG0000416  OG0000432  OG0000477  OG0000482  OG0000481  OG0000523  OG0000535  OG0000536  OG0000579  OG0000580  OG0000582  OG0000620  OG0000621  OG0000675  OG0000771  OG0000919  OG0000925  OG0001001  OG0001061  OG0001131  OG0000069  OG0000062  OG0000074  OG0000138  OG0000174  OG0000261  OG0000298  OG0000085  OG0000365  OG0000384  We confirmed a unique repeat profile in the AMF genome. The majority of the interspersed repeats (62.83%) could not be categorized with known repeat classes (Supplementary Data 1), indicating that the AMF genome accumulated novel classes of interspersed repeats. Moreover, DAOM-181602 lacks short interspersed nuclear elements (SINEs), which are abundant in closely related fungi (Supplementary Data 1). Several types of SINEs proliferate using transposases on long interspersed nuclear elements (LINEs) 38 . Although the AMF has 23 LINEs containing the transposase gene (Supplementary Data 2), SINEs have never been found in previous genomes 13,14,17 or RIR17. DAOM-181602 may have a system to suppress the invasion and proliferation of SINEs (e.g., a high number of very active Argonaute proteins, as predicted by Tisserant et al. 17 ).

New gene annotation details gene family expansion in AMF.
Using the RIR17 assembly together with strand-specific RNA-Seq data (Rir_RNA_SS in Supplementary  Table 5). The gene models having any support were submitted to the DDBJ as standard genes and were used in downstream analyses. The models having no support were assigned as PROVISIONAL gene models (Supplementary Data 1 and Supplementary Table 5). Using Orthofinder with previous genomic gene sets indicated that our gene models cover the majority of previously provided genes (Supplementary Figure 2). Although new models showed more coverage of Benchmarking Universal Single-Copy Orthologs (BUSCOs) 39 (Supplementary Table 5) than JGI_v1.0 and Lin14, their gene completeness was slightly lower than that of JGI v2.0 (nine BUSCO families overlooked, Supplementary Table 5), indicating the advantage of using the JGI annotation pipeline to discuss the gene variety in DAOM-181602 14 . However, we considered our model set suitable for the analysis of the repetitive region and highly paralogous genes because our model is based on highly continuous assemblies, and the number of genes on repetitive regions was increased to 2349-12,559 genes from the number in JGI_2.0 (Supplementary Table 6).
R. irregularis has one of the largest numbers of genes in fungi (Supplementary Figure 3). Our ortholog analyses indicate that the gene number inflation was caused by lineage-specific expansions of gene families and not by whole-genome duplications. An Orthofinder analysis of nine fungal genomes and two animal data sets (Supplementary Table 7) showed that many of the singlecopy genes in other fungi were also single copies in RIR17 (216/ 239 families, Supplementary Data 3), negating the possibility of whole-genome duplication in R. irregularis. The large number of species-specific single-copy genes in DAOM-181602 (10,354 genes, Fig. 1a, Supplementary Data 3) suggests that the AMF genes inflated by new gene constructions through gene fusion and mutation accumulation. Moreover, several common gene families in Opisthokonta also contributed to gene inflation; the R. irregularis lineage had 92 rapidly expanded families, containing 8952 genes (Fig. 1a, d, e, Supplementary Data 3 and 4), suggesting that R. irregularis has also acquired many genes by the duplication of particular gene families.
The motif annotation indicates that inflated genes may contribute to signaling pathways of AMF species. Our Pfam search annotated 1620 species-specific single-copy genes and 6755 rapidly expanded genes ( Other signal-related motifs (e.g., Sel1 repeat and BED zinc finger) were also found in the inflated genes ( Fig. 1b, f, Supplementary Data 6). AMF has developed a unique signal pathway for symbiosis (e.g., establishments of symbiosis with pathways via SIS1 41 and lyso-phosphatidylcholine 42 ). This inflation of signaling-related genes may have led the development of a complex signaling pathway in AMF.
We then investigated the contribution of the TEs to gene inflation based on the overlapping of highly paralogous genes and the TEs. Previous studies hypothesized that the gene inflation in R. irregularis relates with the expansion of TEs 14 . Our analysis showed that in several rapidly expanded families (e.g., OG0000090 and OG0000020), over 90% of the genes were located with TEs ( Fig. 1d, e, Supplementary Data 7), suggesting that TEs accelerated the gene expansion in these families. However, some of the families had no correspondence with TEs (e.g., OG0000025 and OG0000058 in Fig. 1c, e). In species-specific single-copy genes, TEs were slightly more frequently found with motifs than in all gene sets but were less frequently found in species-specific single-copy genes without motifs (Fig. 1c). This detailed analysis supports the contribution of TEs to gene inflation in several gene families but also clarified that several families show TE-independent expansion. Although more genome data for AMF species and sister groups are required to reveal the gene expansion process and its contribution to AM symbiosis, our data provide a fundamental dataset to reveal the evolution of gene redundancy in AMF species. Losing conserved fungal genes. Previous AMF studies suggested the loss of several categories of genes by symbiosis with plant 12,13,17 . Our RIR17 genome assembly confirmed the loss of genes involved in the degradation of plant cell walls such as cellobiohydrolases (GH6 and GH7 in the CAZy database), polysaccharide lyases (PL1 and PL4), proteins with cellulose-binding motif 1 (CBM1), and lytic polysaccharide monooxygenases (Supplementary Data 2 and Supplementary Table 8) and nutritional biosynthetic genes, including fatty acid synthase (FAS) and the thiamine biosynthetic pathway (Supplementary Data 8). Given that fatty acids and thiamine are essential nutrients for fungi 43,44 , R. irregularis should take up those essential nutrients from a host plant without digestion of the plant cell wall. Several recent papers have described the transport of lipids from plants to AMF [45][46][47] .
R. irregularis has an exceptionally low rDNA copy number. The general eukaryotic genome has tens to thousands of rDNA copies 20 (Supplementary Figure 1a). However, the RIR17 genome assembly contained only ten copies of the complete 45S rDNA cluster, which was composed of 18S rRNA, ITS1, 5.8S rRNA, ITS2, and 28S rDNAs (Fig. 2a, Supplementary Data 9). To confirm that no rDNA clusters were overlooked, we also estimated the rDNA copy number based on the read depth of coverage. Mapping the Illumina reads of the genomic sequences (Rir_DNA_PE180) onto the selected reference sequences indicated that the coverage depth of the consensus rDNA was 8-11 times deeper than the average coverage depth of the single-copy genes (Fig. 2b, Supplementary Table 9), the number of rDNA copies is approximately 10, and the RIR17 assembly covers almost all of the rDNA copies. This AMF rDNA copy number is the lowest among eukaryotes 48 other than pneumoniacausing Pneumocystis (one rDNA) 49 and malaria-causing Plasmodium (seven rDNAs) 50 .
This low copy number suggests a unique ribosome synthesis system in AMF. The rDNA copy number has relevance for the efficiency of translation because multiple rDNAs are required to synthesize sufficient rRNA. For instance, an experimental decrease in rDNA copy number in yeast (approximately 150 rDNAs in wild type) resulted in no isolated strain having <20 copies, which is considered the minimum number to allow yeast growth 30 . The doubling time of yeast with 20 rDNA copies (TAK300) was 20% longer than that of the wild type 30 . In DAOM-181602, successive cultivation in an infected state with a plant has been widely observed, suggesting that this exceptionally small rDNA copy number is enough to support growth. The multinucleate feature of AMF would increase the rDNA copy number per cell and thereby perhaps supply enough rRNA to support growth. A similar trend in rDNA reduction is observed in the organellar DNA (e.g., mitochondria and plastids) 51 . Revealing the details of translation in AMF requires a future tracking study of the rRNA production and degradation process in AMF. Elucidation of the mechanism to produce mass rRNAs from a few rDNAs may contribute to the understanding of not only AMF evolution but also other polynuclear cells (e.g., striated muscle and Ulvophyceae green algae) and symbiont-derived organelles.
rDNAs are heterogeneous and completely lack a TRS. Interestingly, none of the RIR17 rDNAs form a TRS, in contrast to most eukaryotic rDNAs, which comprise tens to hundreds of tandemly repeated units 20 . Most of the rDNA clusters in RIR17 were placed on different contigs; a single copy of rDNA was found in unitig_311, _312, _35, _356, _4, and _52, and two copies were found in unitig_39 and _62 (Fig. 2b, Supplementary Data 9). In the cases where two rDNA clusters were found, the two copies resided apart from each other and did not form a tandem repeat; the distances between the clusters were over 70 kb (76,187 bases in unitig_62 and 89,986 bases in unitig_39, Fig. 2b, Supplementary Data 9), the internal regions contained 31 and 42 proteincoding genes, respectively, and the two clusters were located on reverse strands from each other (Fig. 2a, Supplementary Data 9). Since all rDNA copies are located over 28 kb away from the edge of each contig (Fig. 2a, Supplementary Data 9), the lack of TRSs is unlikely to be an artifact derived from an assembly problem often caused by highly repetitive sequences.
The lack of tandem rDNA structure was also supported by mapping our PacBio reads to RIR17 and searching for rDNA on JGI_v2.0 assemblies. BWA-MEM 52 mapping showed multiple PacBio reads across the 5′ non-coding region, 48S rDNA, and 3′ non-coding regions of each rDNA contig (Fig. 3a, Supplementary  Figure 4). Because our PacBio analysis directly sequenced the DNA molecules in AMF, this syntenic structure is not due to chimeric fragments from DNA amplification but reflects the natural sequence. The 5′ and 3′ non-coding regions of each rDNA have sequences that are not similar other than the highly similar 5′ regions on c62-1 and c62-2 ( Fig. 3b and Supplementary Figure 4), negating the possibility of mapping confusion due to sequence similarity. We reproducibly obtained the PacBio reads passing the rDNA regions from our three PacBio datasets, which had been constructed from different spore suspensions. Furthermore, our rDNA searching by RNAmmer detected a non-tandem 48S rDNA region from three JGI_v2.0 scaffolds (Fig. 3c, Supplementary Figure 5 and Supplementary Table 10). Although the seven rDNAs cannot be reconstructed from JGI_v2.0, two partial rDNA sequences on JGI_v2.0 had corresponding down-or upstream sequences that matched our RIR17 rDNAs (Supplementary Figure 5 and Supplementary Table 10), indicating that our assembly around the rDNA genes is consistent with previous assemblies.
We then examined polymorphism among the 45S rDNA clusters on RIR17. rDNA heterogeneity has been reported in various AMF species, including DAOM-181602 13,17,25,29 . However, the distribution and degree of the variation among the rDNA paralogs were unclear. Pairwise comparisons of the ten rDNA copies detected 27.3 indels and 106.1 sequence variants with 98.18% identity on average (Supplementary Data 10 and  Supplementary Table 11), whereas the sequences of rDNA clusters at c311-1 and c52-1 were identical. Polymorphisms were distributed unevenly throughout the rDNA; percent identities were 99.91% in 18S rDNA, 97.93% in 28S rDNA, 96.65% in 5.8S rDNA, 93.45% in ITS1, and 90.28% in ITS2 (Fig. 4, Supplementary Data 10 and Supplementary Table 11). The number of polymorphic sites in R. irregularis rDNAs reached 4.07 positions per 100 bases, much higher than in other fungi, which have polymorphic sites at 0.04-1.97 positions per 100 bases ( Table 2). The rDNA polymorphisms observed in RIR17 covered most of the polymorphisms previously reported in this species (Fig. 5), providing incentive to review the molecular ecology of AMF. The degree of intragenomic variation was not high enough to disrupt species-level identification (i.e., all rDNA genotypes from a genome are jointed to the R. irregularis clade), but was sufficient to cause erroneous identification of R. irregularis strains (Fig. 5). These findings pose a caution that previous studies on geographic distribution 8 , species identification 28 , and evolutionary processes of AMF assuming rDNA homogeneity require reevaluation, considering the high-level intra-genomic heterogeneity of rDNA sequences in AMFs. Recently developed population genomic approach with ddRAD-Seq 53 may also help us understand the biodiversity of AMF.
A model for the relaxation of rDNA homogeneity. The revealed non-tandem structure of AMF rDNA led to a model for the mechanism responsible for its intragenomic heterogeneity. Kuhn et al. 54 and Pawlowska and Taylor 25 predicted that rDNA heterogeneity is caused by relaxation of concerted rDNA evolution in Glomerales including Rhizophagus. However, details of the relaxation have been unclear. Here, we propose a hypothetical mechanism: the loss of TRSs precludes the presence of DNA conformations associated with rDNA amplification and the maintenance of its homogeneity. The standard model of concerted rDNA evolution needs two or more tandemly repeated rDNA segments because the rDNA duplicates using tandemly repeated rDNAs as binding sites and templates for replication (Supplementary Figure 1c) 55 . Although non-tandem rDNAs are rare in eukaryotes, this trend of heterogeneity in non-tandem rDNAs has been detected; Arabidopsis thaliana has one pseudogenic rDNA (lacking 270 bases of an important helix as rRNA) besides the main tandem repeat rDNA arrays 56,57 , and the lack of rDNA tandem repeats in malariacausing Plasmodium parasites 50,58 indicates intragenomic rDNA polymorphisms. These observations support our hypothesis that rDNA heterogeneity in AMF is related to their lack of TRSs. AMF species may not amplify their rDNA by the general eukaryotic rDNA amplification system (USCR), which may increase their rDNA heterogeneity.
On the other hand, our phylogenetic analysis suggests that AMF has a system to maintain weak similarity among the paralogs without TRSs. Previously observed rDNA heterogeneity in Glomerales suggests that concerted evolution was relaxed before the diversification of Rhizophagus species 25,29 . When the observed ten rDNAs duplicated before speciation and evolved independently, each of the duplicated genes formed a clade with orthologs in other species. However, we found no orthologous rDNA genes from other Rhizophagus species (Fig. 5). Our tree suggests that the observed rDNAs in R. irregularis either expanded or were assimilated after speciation. One hypothetical mechanism that would cause this similarity is homologous recombination via synthesis-dependent strand annealing (Supplementary Figure 6) 59 . This conserved system to repair doublestrand breaks (DSBs) results in non-crossover recombination and gene conversion wherein nonreciprocal genetic transfer occurs between two homologous sequences (Supplementary Figure 6). Decreases in divergence by gene conversion are widely observed in duplicated genes. RIR17 showed that two rDNA pairs on the same contigs (c39-1 and c39-2, c62-1 and c62-2) had higher similarity than other paralogs (Fig. 4c). This similarity may be caused by the high gene conversion rate between these loci.
Our model raises a new question about the mechanism that maintains the number of rDNAs without gene duplication by USCR. Even if rDNA lacks TRSs, crossover recombination and single-strand annealing delete paralogous genes. Observed inverted repeat structures between rDNAs in proximity may contribute to inhibiting single-strand annealing between them and prevent copy number reduction. Plastidial rDNAs of land plants also make inverted repeat structures and conserve two rDNA copies on their plastidial DNA. Another probable system is the suppression of crossover recombination. When Holliday junctions dissociate without crossover, DSBs are repaired without gene number reduction. AMF species may keep their rDNA copy number by their highly controlled Holliday junction dissociation. Although the detail of AMF reproduction system and its contribution to the recombination still have many unresolved issues 60 , it should be noted that the majority of these crossovers arise during meiosis in eukaryotes 59 and R. irregularis can asexually make the spore without meiotic stage.  Supplementary  Table 2) produced 18,889,290 reads (read length = 100-301 bases) from DAOM-181602. We mapped the reads to all gene models from RIR17 (43,675 protein-encoding isoforms and ten 48S rDNA paralogs) and estimated the expression levels of each gene by eXpress software 61 . All rDNA paralogs had over 5000 Fragments Per Kilobase of exon per Million mapped fragments (FPKMs) ( Table 3), and multiple reads were matched to the specific region of each paralog, indicating that the ten rDNA copies are transcriptionally active. In general, eukaryotes silence a part of the rDNA copies 62 , and some eukaryotes change the transcribed rRNA sequences by RNA editing 63 . These editing and silencing processes were not detected in the AMF, and the rRNA was as polymorphic as the rDNA. These results show that DAOM-181602 has multiple types of ribosomes, each containing different rRNAs. Additionally, we detected highly duplicated ribosomal protein genes (e.g., ribosomal protein S17/S11) (Supplementary Tables 6 and Supplementary Data 5) and tRNA genes, indicating unknown amino acid isotypes, which may also account for the heterogeneity of ribosomes (Supplementary Table 12). The evolutionary significance of the non-tandemly repeated heterogeneous rDNAs is unclear. One of the probable factors is a reduction in the need to maintain numerous rDNAs in a genome. As described in the above sections, the AMF rDNA copy number suggests a system that efficiently produces rRNA from a few rDNAs, and the inverted repeats structure of rDNAs will also reduce the deletion rate of rDNAs. AMF may thus no longer need to rapidly amplify rDNA copies using TRSs, and the slowed replacement rate of rDNA may then cause the heterogeneity as a side effect. Another possibly adaptive effect is the enhancement of phenotypic plasticity by ribosomal heterogeneity (Fig. 6). Recent studies have started to reveal that various eukaryotes (e.g., yeast, mice, and Arabidopsis) produce heterogeneous ribosomes and subsequently alter phenotypes via proteins translated by particular ribosomes 64 . Accelerated accumulation of AMF rDNA mutations by the lack of TRSs may lead to functional variety in produced ribosomes and increases in the rate of adaptation by different translation activities within the same species. Previous studies have reported that large variations in the fungal phenotype were observed among single spore lines derived from one parent AMF [65][66][67] . This might be not only due to genetic variation but also due to variations in each rDNA expression. Although the functional effects of observed rDNA mutations remain to be determined, the middle area of our 28S rDNA (4450-4500 bases on c62-1) had a higher mutation rate than ITS regions (Fig. 4a). Because the ITS regions (encoding nonfunctional RNA) vary under neutral mutation rates, the accumulated variants in the middle-28S region may have Indel Transversion Transition 0 c g g g g a a t t g c g g g g a a t t g c g g g g a a t t g c g g g g a a t t g c g g g g a a t t g c g g g g a a t t g c g g g g a a t t g c g g a g a a t t g c g g a g a a t t g c g g a g a a t t g g g g c c t t t t a g g g c c t t t t a g g g c c t t t t a g g g c c t t t t a g g g c c t t t t a g g g c c t t t t a g g g c c t t t t a g g g c c t t t t a g g g c c t t t t a g g g c c t t t t a g g g c c t t t t a g g g g g -g g g a c t t g g g g ga c t t g g g g ga c t t g g g g ga c t t g g g g ga c t t g g g g ga c t t g g g a aa c t t g g g g aa c t t g g g g aa c t t g g g g g t a c t t t t t a a a a a a a t t t a a a a a a a t t t a a a a a a a t t t a a a a a a a t t t a a a a a a a t t t a a a a a a a t t t a a a a a a a t t t a a a a a a a t t t a a a a a a a -t t a a a a a a a g g a t t g g a t t g g a t t g g a t t g g a t t g g a c t g g a c t g g a c t c c g t t g g a g t c c g t t g g a g t c c g t t g g a g t c c g t t g g a g t c c g t t g g a g t c c g t t g g a g t c c g t t g g a g t c c g t t g g a g t c c g t t g a a g t a g a t t t c t t a a g a t t t c t t a a g a t t t c t t a a g a t t t c t t a a g a t t t c t t a a g a t t t c t t a a g a t t t c t t a a g a t t t c t t a g g a t c t c t t a g g a t c t c t t a functional effects favored by natural selection (via diversifying selection). This region is thus a useful target for the future functional analyses of AMF rRNA.
AMF species are similar to the malaria parasite in that they both have heterogeneous non-tandem rDNAs and infect distantly related host species 50 . In the malaria parasite, changes in the ribosome properties depend on the host (human or mosquito), which is likely able to alter the rate of translation, either globally or of specific messenger RNAs, thereby changing the rate of cell growth or altering patterns of cell development 50 . The relationship between the diversity of host organisms and rDNA polymorphisms will be an important area for further research. The phenotypic plasticity caused by heterogeneous translation machinery may allow adaptation for various host species having slightly different symbiotic systems. Previous studies have proposed that the heterokaryosity in AMF species drives variable genetic combinations of mycelia 68 . Recent genomic studies, furthermore, discovered signatures of sexual reproduction within the dikaryon-like stage 16,69 . Our hypothesis does not exclude current theories for the genetic and phenotypic plasticity of AMF species (heterokaryosis and sexual reproduction) but proposes a multilayered diversification mechanism leading to their widespread distribution.

Conclusion
We here reported an improved genome assembly of R. irregularis DAOM-181602. Improved genome revealed that common concepts of eukaryotic rDNA are not applicable to AMF. Its rDNA copies are highly heterogeneous, reduced in number, and lack TRS. This frequently used ecological and phylogenetic marker gene should be adopted cautiously for AMF. The sequence diversity and reduced copy number of rDNA may result from the collapse of the concerted evolution system due to the lack of TRS. Although the adaptive significance of the TRS lacking in AMF remains to be determined, future investigations on the functional differences among the heterogenous rRNAs may reveal mechanisms that facilitate adaptation to various environmental conditions in AMF.

Methods
PacBio-based assembling. DNA preparation: The DNA sample for the PacBio and Illumina sequencing was extracted from a commercial strain of R. irregularis DAOM-181602 (MYCORISE® ASP, Premier Tech Biotechnologies, Canada). The DNA extraction followed the method of Fulton et al. 70 with some modifications described below. Purchased spore suspensions (including approximately 1,000,000 spores) were centrifuged (4500 rpm, 20 min), and washed three times with distilled water. Precipitated spores were frozen with liquid nitrogen, ground with pestle, and dispersed in extraction buffer (100 mM Tris-HCl pH 8.0, 20 mM EDTA, 0.75% sarkosyl, 0.1% PVP, 0.75% cetyl trimethylammonium bromide (CTAB), 0.13 M sorbitol, 0.75 M NaCl, and 0.1 mg ml −1 proteinase K). After incubation at 37°C for 10 min, the aqueous phase was centrifuged (15,000 rpm, 4 min), and the pellet was discarded. An equal volume of phenol/chloroform (1:1, vol:vol) was added, and the sample was gently mixed and centrifuged (15,000 rpm, 2 min). The aqueous phase was collected, and an equal volume of chloroform was added to the sample, which was then mixed and centrifuged (15,000 rpm, 2 min). The aqueous phase was collected again, and 1:10 vol of sodium acetate and 0.7 vol of isopropanol were added. The sample was then mixed and centrifuged (12,000 rpm, 20 min). The resulting pellet was washed twice with 70% EtOH and eluted with TE buffer. Extracted DNA was purified with Genomic-tip (Qiagen, Germany) following the manufacturer's instructions.
PacBio sequencing: Long-read sequences were generated with a PacBio RS II sequencer (Pacific Biosciences, Menlo Park, CA, USA) using a DNA/Polymerase Binding Kit P6 v2 (Pacific Biosciences) and a DNA Sequencing Reagent Kit 4.0 (Pacific Biosciences). The library was prepared according to the 20-kb Template Preparation Using BluePippin™ Size-Selection System (Sage Science, MA, USA). A total sequence of 11.7 Gb in 955,841 reads (76× coverage of the genome, assuming a genome size of 154 Mb) was obtained from 29 SMRT cells (Supplementary Table 2). The N50 length of the raw reads was 13,107 bases.
Error correction and identification of host plant contamination. After polishing using Illumina data, we eliminated the sequences derived from contaminated DNAs during the sample preparation. BLASTn search of the polished assemblies against the refseq_genomic database detected nine assemblies showing similarity with sequences from carrot (BLAST ver. 2.2.31+, query coverage per subject >95%, percentages of identical matches >90%, bit score >1000) (Supplementary Table 2), which might be used as a host plant by the manufacturer for the cultivation of R. irregularis samples. After elimination of the nine contaminated contigs, we submitted the assemblies to the DDBJ as whole-genome shotgun sequence data (RIR17) of R. irregularis DAOM-181602 (BDIQ01).
Genomic alignment with previous genome assemblies. The quality of our genome assembly was evaluated by alignment with previously available R. irregularis DAOM-181602 genome assemblies. A one-by-one genome alignment was constructed by MUMmer ver. 4.0.0beta2 72 between RIR17, JGI_v2.0, Lin14, and JGI_v1.0 assemblies. Each genome set was aligned by the nucmer function in MUMmer, and the statistics of the alignments were extracted by the dnadiff wrapper with the default setting.
Gene prediction and annotation. De novo repeat motifs were identified using RepeatModeler ver. 1.0.8, which combines RECON and RepeatScout programs 37 . Based on the identified motif, the repetitive region in the assemblies was masked with RepeatMasker ver. 4.0.5 37 . We used the default parameters for the identification and the masking. For the gene models constructed from RIR17 assemblies, standard RNA-Seq data were obtained from R. irregularis spores and hyphae. The RNA was extracted with an RNeasy Plant Mini kit (Qiagen) after incubation of the purchased spores (MYCORISE® ASP) in a minimum nutrient medium for 1 day. An Illumina RNA-Seq library was constructed with a TruSeq Stranded mRNA Library prep kit (Illumina). The library was sequenced (101 bases from each end) on a HiSeq 1500 platform (Illumina). A total of 16,122,964 raw reads (3.2 Gb) were obtained from the library (Supplementary Table 2). After filtering low-quality and adapter sequences, RNA-Seq data were mapped to RIR17 assemblies with TopHat ver. 2.1.1 73 with the default setting.
Then, the RIR17 assemblies were processed through the RNA-Seq-based gene model construction pipeline using AUGUSTUS ver. 3.2.1 software 74 . We constructed R. irregularis-specific probabilistic models of the gene structure based on 495 manually constructed gene models from the longest unitig_392 sequence in RIR17. Manual gene models were made with ab initio AUGUSTUS analysis based on probabilistic models for Rhizopus oryzae and by manual refinement using the homology data with already-known genes and mapped RNA-Seq data. Then, with the trained probabilistic models and the intron-hints data from the mapped RNA-Seq read, 37,639 optimal gene models were constructed using the AUGUSTUS pipeline. We then confirmed whether the AUGUSTUS pipeline overlooked the called genes in previous genome studies. We mapped all transcript sequences obtained from previous gene modeling on Lin14 and JGI_v1.0 against our RIR17 genomic sequences with Exonerate 75 (ver. 2.2.0, option --model est2genome --bestn 1), resulting in the recruitment of 3933 overlooked genes. The completeness of the constructed gene model was evaluated with BUSCO ver. 2.0 39 . The BUSCO analysis used the Fungi odb9 gene set (http://buscodev.ezlab.org/datasets/ fungiodb9.tar.gz) as a benchmark and employed the -m proteins option to analyze the preconstructed protein data without the ab initio gene modeling step.
The confidences of the obtained 41,572 gene models were estimated based on (1) RNA-Seq expression support, (2) homology evidence, and (3) protein motif evidence. For the calculation of gene expression levels, we mapped our Rir_RNA_SS data and 32 RNA-Seq data submitted to the sequence read archive (SRA) database (24 data sets from DRP002784 and eight data sets from DRP003319) and calculated the gene expression levels (FPKM) using FeatureCounts 76 with the default setting (Supplementary Data 2). Homology with previously known genes was determined by BLAST searches against the orthoDB Constructed gene models were annotated by several in-silico searches. Gene functions were predicted based on BLASTp (Database = nr, RefSeq, and UniProt), and Pfam in InterProScan (Supplementary Data 2). We manually selected the descriptive nomenclatures from those four searches and submitted to the DDBJ. Orthologous relationships were classified with Orthofinder (ver. 1.1.2) 78 , and rapidity expanded/contracted families were analyzed with CAFE (ver. 4.1) 79 from Orthofinder results. Phylogenetic trees for the CAFE analysis were constructed with IQ-tree (ver. 1.6.1) 80 for maximum likelihood (ML) analysis and r8s (v1.81) for a conversion for an ultrametric tree. An ML tree was made from 159 single-copy genes from the Orthofinder results (Supplementary Data 2) and was converted to an ultrametric tree based on the divergence times of AMF-Mortierellales (460 Myr) 27 and Deuterostomia-Protostomia (550 Myr) 81 . Overlapping genes with TEs were extracted from AUGUSTUS and RepeatMasker results using bedtools (ver. 2.26.0, bedtools intersect with -wa option) 82 .
The missing ascomycete core gene (MACG) orthologs were sought using BLAST with the -evalue 0.0001 option, and the reference sequences for the MACG search were selected from protein data from an S288C reference in the  (Fig. 2a).
The number of rDNA paralogs in the genome was estimated by mean depth of coverage. We masked repetitive regions (based on RepeatModeler analysis) and all rDNA regions on RIR17 except one rDNA copy (c62-1). Then, trimmed R1 Illumina reads from Rir_DNA_PE180 library were mapped to the repeat-masked RIR17 using bowtie2 ver. 2.2.9 86 . The coverage depth of the rDNA region and 243 single-copy BUSCOs were obtained using bedtools (bedtools coverage command with -d option), and the statistics of each region were calculated and visualized by R software ver. 3.4.2 with the ggplot2 library (Fig. 2b). To prevent copy number estimation from depth fluctuation due to the intragenomic heterogeneity, we confirmed the coverage depth using the consensus sequences of all ten rDNA paralogs; the joined Illumina reads (from Rir_DNA_PE180 library) were mapped back to a consensus rDNA sequences and ten single-copy BUSCO genes from RIR17, and the depth of coverages was then counted by bedtools (genomeCoverageBed) (Supplementary Table 9).
The difference among the rDNA paralogs was calculated from the aligned sequences by MAFFT ver. 7.309 (options: --localpair, --op 5, --ep 3, --maxiterate 1000) using the pairwise comparison with CLC Main Workbench 7.8.1 (Qiagen). The mutation type was called by eye from the alignment, and we chose the c62-1 paralog as a reference sequence for mutation calling (Fig. 4a). Phylogenetic trees (Figs. 4c and 5) were constructed from the MAFFT alignment by the neighborjoining method with MEGA 88 ver. 7.0.21 under the maximum composite likelihood model and were tested for robustness by bootstrapping (500 pseudoreplicates).
Heterogeneity of translation machineries. The expression levels of the rDNA paralogs were examined with modified Illumina sequencing of R. irregularis spores and hyphae. Total RNA was extracted with an RNeasy Plant Mini kit (Qiagen) after the incubation of the MYCORISE® spores in a minimum nutrient medium for 7 days. An Illumina RNA-Seq library was constructed with a TruSeq Stranded mRNA Library prep kit (Illumina). To skip the poly-A tailing selection step in the  6 Hypothetical model for the evolution of unique rDNAs/rRNAs in AMF. Evolutionary model for the lack of TRSs in AMF and its sequence heterogeneity. The various environmental conditions (e.g., various host species) may lead to the evolution of phenotypic plasticity via multiple types of ribosomes in AMF. If the rDNA is exposed to disruptive selection, rDNA duplication by TRSs and USCR may be non-adaptive because the duplication of particular rDNA types reduces the variety of rDNA types library construction, we started from the fragmentation step of the standard manufacturer's instructions. The library was sequenced (301 bases from each end) on a MiSeq platform (Illumina). A total of 16,122,964 raw reads (3.2 Gb) were obtained from the library (Supplementary Table 2). After filtering low-quality and adapter sequences, RNA-seq data were mapped to the RIR17 assembly with TopHat with the default settings. FPKMs for each gene were calculated with eXpress ver. 1.5.1 with the --no-bias-correct option. Transfer RNAs were identified with tRNAscan-SE 89 ver. 1.3.1.
Data availability. Raw reads, genome assemblies, and annotations were deposited at INSDC under the accessions as follows: Sequence read archive: DRA004849, DRA004878, DRA004889, DRA004835, DRA005204, and DRA006039; Whole genome assembly: BDIQ01000001-BDIQ01000210; Annotations: GBC10881-GBC54553. All the other data generated or analyzed during this study are included in this published article and its supplementary information.