Phenotypic diversification by enhanced genome restructuring after induction of multiple DNA double-strand breaks

DNA double-strand break (DSB)-mediated genome rearrangements are assumed to provide diverse raw genetic materials enabling accelerated adaptive evolution; however, it remains unclear about the consequences of massive simultaneous DSB formation in cells and their resulting phenotypic impact. Here, we establish an artificial genome-restructuring technology by conditionally introducing multiple genomic DSBs in vivo using a temperature-dependent endonuclease TaqI. Application in yeast and Arabidopsis thaliana generates strains with phenotypes, including improved ethanol production from xylose at higher temperature and increased plant biomass, that are stably inherited to offspring after multiple passages. High-throughput genome resequencing revealed that these strains harbor diverse rearrangements, including copy number variations, translocations in retrotransposons, and direct end-joinings at TaqI-cleavage sites. Furthermore, large-scale rearrangements occur frequently in diploid yeasts (28.1%) and tetraploid plants (46.3%), whereas haploid yeasts and diploid plants undergo minimal rearrangement. This genome-restructuring system (TAQing system) will enable rapid genome breeding and aid genome-evolution studies.

S ynteny analyses of current genomes demonstrate that genome evolution accompanies vestiges of massive genome rearrangements, such as translocations (TLs), deletions, insertions, and copy number variations (CNVs) [1][2][3][4] . Human cancer cells also undergo multiple genome rearrangements 5 , analogous to those occurring during genome evolution and that might promote their acquisition of malignancy. It is postulated that the source of those rearrangements is DNA double-strand breaks (DSBs) in living cells induced by ionizing radiation, oxidative DNA damages, chemical modification, or abortive DNA replication; however, the causal relationship between formation of multiple DSBs and genome rearrangements requires in-depth investigation.
Genome rearrangements are thought to increase phenotypic diversity that provides critical raw materials for evolution. Point mutations represent another source of raw genetic materials for natural selection, but provide relatively small changes in DNA sequences and enable rather limited exploration of the sequence space via "fitness-random walk" for adaptation to given environmental changes. On the other hand, gross genome rearrangements, such as TLs, CNVs, and aneuploidy, might allow more extensive changes to the sequence space, thereby enabling rapid and comprehensive searches for more favorable DNA sequences. However, much remains to be elucidated about the real impact of genome rearrangements on phenotypic diversification.
Another important event during genome evolution associated with natural selection is "whole-genome duplication (WGD)" [6][7][8] , which is reportedly induced immediately following the large-scale genome rearrangements. For example, the yeast Saccharomyces cerevisiae arose from an ancient WGD along with subsequent genome rearrangements, and Arabidopsis thaliana and various cereal crops evolved through multiple rounds of WGD accompanied by genome rearrangements [6][7][8][9][10][11][12] . In addition, it has been demonstrated that polyploids have more evolutionary space for genome rearrangements and these have a major impact on phenotypes in Brassica napus 11,12 and in S. cerevisiare 13 . Genome rearrangement and WGD might contribute synergistically to genome evolution, but it is difficult to assess the relative impact of DSB induction and genome rearrangements in diploid vs haploid yeast, or tetraploid compared to diploid plants.
To address these questions, we establish a new tool to induce artificial large-scale genome rearrangements by introducing multiple genomic DSBs, which are formed simultaneously in living cells by the heat-activated endonuclease TaqI from Thermus thermophilus HB8 14,15 . By applying this system to S. cerevisiae and A. thaliana cells exhibiting different ploidy, we successfully monitor the process of genome rearrangements after formation of multiple DSBs, as well as the impact of each rearrangement on phenotypic diversification.
Notably, analyses of long-read/short-read high-throughput resequencing and chromosomal size reveal that multiple simultaneous DSB formation caused gross genome rearrangements at a higher frequency in diploid yeast cells than in haploid cells. Additionally, induced DSBs promote large-scale genome rearrangements in Arabidopsis more readily in tetraploids than in diploids. Moreover, genome-rearranged yeast cells and plants show enhanced phenotypic diversity based on their higher degree of ploidy. Our findings suggest that formation of multiple DSBs allow complexed genome rearrangements and phenotypic diversification, particularly in combination with WGD.

Results
Induction of genome rearrangements by multiple DSB formation. To monitor the process of genome rearrangement, we newly developed a method involving TaqI-aided genome rearrangement (hereafter referred to as the "TAQing system"), as illustrated in Fig. 1a. We first verified the TAQing system in vegetative haploid S. cerevisiae cells (YPH499). TaqI is a four-base endonuclease that specifically recognizes the TCGA palindrome and can theoretically cleave genomic DNA once every 256 base pairs. This TCGA sequence can help us consider the causal relationship between rearrangement events and the adjacent potential DSB sites, which are difficult to estimate in radiation-based or mutagen-based conventional mutagenesis. TaqI activity is temperature-dependent and only functions at temperatures >37°C ( Supplementary Fig. 1a). The full-length TaqI gene was conditionally expressed in the presence of Cu 2+ from vectors harboring a Cu 2+ -inducible promoter (Fig. 1b) and activated one-time by transient heat treatment. Heat treatment tentatively induced random formation of DSBs at multiple chromosomal sites, as confirmed by Rad51 foci formation ( Supplementary Fig. 1b), a DSBdependent band shift in the DNA damage checkpoint proteins Rad53 and Rad9 (Fig. 1c), and degradation of chromosomal bands in pulse-field gel electrophoresis (PFGE) (Fig. 1d). Based on the degree of broken DNA fragments according to PFGE analysis, we estimated that the DSB frequency was comparable to that in diploid meiotic yeast cells 16 , which reportedly produce~200 DSBs per cell.
This condition of TAQing with heat-activation resulted in a reduced number of substantially viable cells (~20% viability in haploids and 40% in diploids; Fig. 1e,f). Introduction of an inactive TaqI variant (dTaqI; with a D142A mutation) into yeast cells resulted in no such effects on viability (Fig. 1e), indicating that the reduced viability was caused by TaqI DNA cleavage.
To determine whether activated TaqI could induce homologous recombination (HR) in diploid cells, we monitored interallelic gene conversion of the arg4-nsp/arg4-bgl heteroalleles, finding a marked (18-fold) increase in HR frequency (Fig. 1g). In addition, loss-of-heterozygosity assays (Fig. 1h) showed that TaqI heat activation markedly induced gene conversion or unequal crossing over.
PFGE experiments revealed that 28.1% (9/32) of activated TaqItreated ("TAQed") diploids exhibited changes in chromosome size, whereas this was observed in only 3.1% (1/32) of mock controls. Importantly, no visible rearrangements were observed in TAQed haploid clones (0%, 0/16; Supplementary Fig. 2a and Table 1), suggesting that yeast diploids underwent genome rearrangements much more efficiently than haploids. These results indicated that the TAQing system was capable of inducing large-scale genome rearrangements more efficiently in yeast diploids than in haploids. A possible explanation for this is that haploids are less tolerant of gross genome rearrangements than diploids because of the lower redundancy of their genetic information.
Rearrangements between homologous sequences and DSB ends. Of the 2307 colonies of TAQed cells, we selected 178 mutants which formed smaller colonies (<20% of the area). In total, 131 of these mutants showed altered cell morphology revealed by microscopic observation (1.56 ± 0.43-fold larger cell area, mean ± SD; Supplementary Fig. 2c,d). Additionally, 11/13 mutant isolates exhibited altered chromosomal size detectable by PFGE analysis (Fig. 1i), suggesting that the massive genome rearrangements frequently accompanied phenotypic changes. To identify the precise genomic alterations, 13 mutant isolates exhibiting altered morphology were further examined by combined long-read/short-read high-throughput sequencing and tiling-array analyses ( Supplementary Fig. 2c).
The fused parental diploid WT14 (MATa/a) strain shared 0.7% single nucleotide variations (SNVs) 17 , which enabled the identification of chromosomal-rearrangement sites in the mutants. Accordingly, the genome sequences of the parental strain were reassembled from the high-throughput resequencing data generated by short-read Illumina MiSeq (>500 coverage; Illumina, San Diego, CA, USA) and long-read PacBio RSII (>87 coverage; Pacific Biosciences, Menlo Park, CA, USA) sequencing.
Comprehensive resequencing of the 13 TAQed mutant isolates (MiSeq: 73-254 coverage) showed an average of 1.3 SNVs/strain ( Fig. 2a and Table 1; Supplementary Table 1), whereas SNVs were absent in the control diploid without TaqI activation. It should be noted that the frequency of TaqI-mediated canavanine-resistant colony formation, which supposedly reflects mutation frequency, was substantially reduced (~50%) in rev3Δ (error-prone DNA   (Fig. 2b). This suggested that these mutations, as reported previously 19 , were at least partly induced by error-prone DNA repair of TaqI-mediated DSBs that proceeded not via canonical non-homologous end-joining (C-NHEJ), but via HR or microhomology-mediated end-joining (MMEJ; or alternative NHEJ, A-NHEJ) associated with a rather long resection. We found that TaqI activation promoted various types of chromosomal aberrations (Fig. 2a, Table 1), none of which were detected in the mock control. We detected 40 non-reciprocal inter-homolog short gene conversions (SGCs; 0.7-30 kb; average length: 7.0 kb) ( Fig. 2a and Table 1), confirming that yeast cells underwent HR-based repair very efficiently. Notably, all six non-homologous translocation (TL) breakpoints were at TaqI-recognition sequences (Fig. 2d, e; Supplementary Table 1) without any deletions, suggesting that the cohesive two-nucleotide overhangs at TaqI-generated DSBs could be directly joined in vivo, possibly without any resections of DSB ends. Such a high incidence of unequal TLs after DSB formation in mitosis is distinct from their scarce occurrence after natural Spo11induced meiotic DSB formation in S. cerevisiae reversion assays 20 .
Two isolates harbored chromosomes with fused TL breakpoints within Ty-retrotransposable elements, and another isolate had a chromosome fusion within a repetitive ribosomal DNA (rDNA) region ( Fig. 2c and Table 1). This was consistent with previous studies reporting that chromosomal TLs in Saccharomyces-like species are often associated with Ty elements or repetitive sequences 21 . These results indicated that the TAQed yeast cells were susceptible to massive chromosomal rearrangements, which occurred between homologous sequences or unprocessed DSB ends.
We also examined the three-dimensional (3D) positioning of loci surrounding the TL breakpoints by referring to the chromosomal contact information revealed by a previous Hi-C study 22 . According to Hi-C data, the average distance of any random two-chromosome loci in S. cerevisiae was calculated at 0.86 µm. The distances for the four pairs of six chromosomal loci for C-NHEJ-joined unequal TLs were smaller than this average, whereas those for the remaining two pairs were near this average (Fig. 2f,g). The average distance of all six C-NHEJ-joined unequal translocated loci pairs was 0.57 µm. Notably, the pairs of chromosomal-breakpoint loci for TL-Chr4:13, TL-Chr5:15, and TL-S799Chr6:YPH499Chr6 were located particularly close to each other (Fig. 2f,g). These results suggested that C-NHEJmediated TLs occurred between loci in relatively close proximity. It should be also noted that two pairs of breakpoint loci for TL-Chr10:15 and TL-Chr4:16, which were assumed to result from HR events between Ty-repetitive sequences, were located at relatively distant chromosomal domains (Fig. 2f,g).
TAQed yeast genomes confer marked phenotypic diversity. In addition to variability in the cell size and morphology (see Fig. 3a; Supplementary Fig. 2d), we also detected substantial nonmorphological qualitative changes in the TAQed yeast diploids. For example, 6.2% (11/178) isolates exhibited hyper-flocculation or hypo-flocculation phenotypes (Fig. 3b,c; Supplementary  Fig. 2c). This TAQing-induced phenotypic diversity led us to further examine whether different traits derived from two strains could be stably combined, as the two strains are subjected to cell fusion, followed by TAQing treatment.
To investigate this, we fused a xylose-utilizable haploid strain (capable of growing on xylose-containing medium) and a thermotolerant haploid strain (which can grow at 40°C) to obtain a non-TAQed diploid cell-fusion strain. This fused non-TAQed strain initially exhibited both xylose-fermentation ability and thermotolerance; however, after multiple passages, one of such combined phenotypes was lost, as it ceased to grow on xylose-containing medium at 40°C.
We then heat treated the freshly prepared fused non-TAQed diploid to induce TAQing, followed by single-colony isolation on xylose-containing medium at 40°C. As a control, non-TAQed fused strains were subjected to the same procedure in the absence of TAQing (Supplementary Fig. 5a).
Notably, only the TAQed diploid continued to grow stably under the combined stress environment during selection (Fig. 3d), suggesting that the non-TAQed fusion lost one of the combined traits during preculture. It should be noted that the ethanolproduction efficiency of the TAQed mutant in xylose-containing medium was as effective as that in glucose-containing medium (Fig. 3e). We subsequently confirmed that such combined phenotypes could be stably inherited over 10 times of passages in the TAQed fusion strain, whereas the non-TAQed fused diploid easily lost one of the favorable phenotypes ( Supplementary  Fig. 5b-d). These results showed that genome rearrangements induced by the TAQing system substantially contributed to phenotypic diversification and stable inheritance of useful combined quantitative traits after at least 10 passages.

TAQing in plants.
To monitor the rearrangement process after multiple DSB formations in multicellular organisms, we tested the TAQing system using diploid A. thaliana plants harboring the TaqI-expression vector along with the constitutive CaMV 35S promoter [TaqI + (2n)]. To examine the effects of WGD, diploid   3 2 in Ty-like regions 1 in rDNA region plants were treated with colchicine to produce tetraploids, followed by transformation of the TaqI-expression vector [TaqI + (4n)]( Supplementary Fig. 6). These plants were incubated at 37°C for 24 h to activate TaqI (Fig. 4a). We observed an~10-fold upregulation in expression of the DSB repair gene AtRAD51 23 was upregulated~10-fold in TAQed diploid [TaqI + (2n)] and tetraploid [TaqI + (4n)] plants (Fig. 4b). A β-glucuronidase (GUS) recombination reporter assay 24 revealed a marked increase (>20fold) in HR events in cotyledons of the TAQed plants (Fig. 4c retardation, likely due to DNA damage-induced cell arrest 25 , most plants were fertile and produced viable seeds with slightly lower yields ( Supplementary Fig. 7). Some progenies of TAQed plants showed changes in both morphology and biomass ( Fig. 4d; Supplementary Fig. 8a). The stem lengths of TAQed mutants #13 and #19 were 30% higher ( Fig. 4d) than that observed in the mock control. Additionally, compared to wild-type plants, the TAQed mutants showed a 40% increase in biomass (dry weight of aerial parts) (Fig. 4e) 26 . Importantly, the relative distribution of stem lengths in the 81 TAQed tetraploid (TQ4) plants was significantly broader (P < 0.01, F-test) than that in the 40 TAQed diploid (TQ2) plants ( Supplementary Fig. 8b), with the variance in the stem-length distribution in the TQ4 plants (σ 2 = 6.65 × 10 −2 ) is 4.3-fold greater than that in TQ2 plants (σ 2 = 1.53 × 10 −2 ). Thus, TAQing-induced genome rearrangement leads to greater phenotypic diversification in tetraploids than diploids.
Notably, plants undergoing massive chromosomal rearrangements exhibited changes in leaf size (Fig. 5a) that were successfully transmitted to offspring (Fig. 5b). Specifically, in the case of TQ4_e progenies, the fusion of Chr1 and Chr4 might affect the plant size. These data indicated that the new quantitative traits acquired by the TAQing system were stably inherited and retained over at least three generations.
More highly complex rearrangements occur in tetraploids. We then conducted combined tiling-array analysis and short-read high-throughput resequencing of TAQed plants, detecting~25 mutations/genome in TQ2 and TQ4 plants (Fig. 6a, Supplementary Data 1), which was 2-4-fold higher than that observed in wild-type plants and mock controls 27 . We also observed increases in the number of insertions/deletions (InDels) in TAQed progenies (Fig. 6a).
TLs are often found in the vicinity of transposable elements. Chromosome-rearrangement sites were identified by sequencing the genomes of four TQ4 plants harboring large CNVs ( Fig. 6c; Supplementary Fig. 10 and Table 2). We detected eight interchromosomal TLs, one intrachromosomal TL, three large (0.27, 0.49, and 2.03 Mb) deletions, and one tandem duplication. Additionally, large-scale chromosomal rearrangements were frequently found in TaqI-recognition sequences (Fig. 6d), suggesting that TaqI-induced DSBs caused these genome rearrangements via cohesive two-nucleotide overhangs, as observed in S. cerevisiae (Fig. 2f). Moreover, 30.7% (4/13) of large CNV events contained microhomology regions (2-3 nucleotides) lacking the TaqI-cleavage site, indicative of error-prone MMEJ/A-NHEJ events (Fig. 6d). In the other cases, similarity between TL-breakpoint sequences was not detected (Fig. 6d). Furthermore, 15.4% (4/26) of TL breakpoints were detected within transposable elements, such as those of the RC/Helitron and LTR/Gypsy families (Supplementary Table 2). A previous study suggested that these transposable elements drive plants evolution by genome shuffling 28,29 .

Discussion
Our results demonstrated that one-time activation of genomewide multiple DSB formation is sufficient to efficiently induce large-scale genome rearrangements and phenotypic diversification. These findings suggested that simultaneous formation of multiple DSBs can promotes gross genome rearrangement. We observed that the degree of rearrangements was much more pronounced in cells with higher ploidy. In yeast, massive genome rearrangement was observed in diploids, whereas little chromosomal aberration was detected in haploids. In Arabidopsis, largescale CNVs and aneuploidies were not detected in 16 diploid TAQed plants, whereas tetraploid TAQed plants more frequently induced complex CNV-associated genome rearrangements (42 CNVs in 37 TQ4 plants; Fig. 6b; Table 2; Supplementary Fig. 9a). These data suggested that increased in ploidy allowed higher genomic plasticity 30 and promoted more complex genome rearrangements. Under the activation conditions used in this study, the TAQing system was assumed to generate nearly 200 simultaneous DSBs, which might not be optimal for the investigation of the DSB effects in cells with lower degrees of ploidy. Usage of different restriction enzymes or activation conditions for TAQing might address this problem.
We observed that WGD-enhanced phenotypic diversity in TAQed yeast and plant cells. For example, the stem-length divergence was 4.3-fold larger in tetraploid TAQed plants than in diploid TAQed plants ( Supplementary Fig. 8b). One possible reason for this is that the high redundancy in genetic information acquired following WGD allowed more complexed genome reorganization, whereas such rearrangements tend to cause lethal effects in cells with limited redundancy of genetic information. Therefore, haploids with rearrangements might be quickly eliminated. Another explanation is that WGD alone can promote phenotypic diversification. It is possible that combinatorial patterns of CNVs can be more divergent in cells with higher ploidy.
The increased allowance of phenotype heterogeneity following WGD is consistent with previous studies on natural-selection processes in selfing species. Inbred yeast and selfing plant genomes likely emerge from hybrid fusions or WGD in combination with  (black) and TAQed strains (S799, blue; YPH499, red) with a typical chromosomal rearrangement, the TL. Magnified views around the TL breakpoint is shown. e TL-breakpoint sequences in five TAQed strains. f Distance distribution (histogram and cumurative index curve) of randomly selected two alleles or loci pairs for TL events in the previous Hi-C data. Red and blue dots represent loci pairs for C-NHEJ-mediated and HR-mediated TLs, respectively. g 3D location of loci pairs associated with TL events. Red and blue dots linked with lines represent loci pairs for C-NHEJ-mediated and HR-mediated TLs, respectively genomic rearrangements. For example, pentaploid or hexaploid brewer's yeast (Saccharomyces pastrianus) was likely formed from the hybridization of S. cerevisiae and the wine yeast Saccharomyces eubayanus [31][32][33] . Moreover, polyploidization and segmented chromosomal duplication in S. cerevisiae cause phenotypic changes [34][35][36] .
Therefore, we speculated that the phenotypic diversification of A. thaliana [37][38][39] and Brassica rapa 11,12 acquired during natural selection might have involved similar mechanisms.
It is intriguing that some TLs occur via C-NHEJ of DSB ends or HR between repetitive sequences. TaqI generates DSB ends with two-bases cohesive 5′ overhangs. Our results indicated that in living yeast and plant cells, many of such DSB ends are directly joined independent of HR without any resection of the DSB ends. We speculated that the majority of TaqI-induced DSBs were promptly joined without causing TLs or any sequence changes; however, these events cannot be detected by DNA sequencing. This idea seems consistent with a recent observation that HR in yeast rDNA regions is dispensable for DSB repair at arrested forks, and that suppression of end resection is important for protecting DSBs from genome rearrangements 40 . It would be instructive to investigate whether the CRISPR-associated DNAcleaving enzyme Cpf1 41 , which generates staggered DNA ends, could induce TLs between chromosomal sites of interest. Some TL breakpoints are found in homologous and/or repetitive sequences, such as retrotransposons, transposons, and rDNA. These types of rearrangements are postulated to induce large-scale genomic evolution. This idea is supported by a previous observation that rearrangements among chromosomes of S. cerevisiae-related species are often associated with retrotransposons 42 . This tendency could be extended in more complex multicellular organisms with an abundance of repetitive sequences. In the somatic cells of multicellular organisms, repetitive sequences or microhomology is often found at breakpoints associated with deleterious chromosomal rearrangements, some of which promote cancer progression.
The TAQing system offers several practical benefits. For example, the breeding of fermentative microorganisms sometimes requires the generation of hybrids from strains with different useful traits. While such fused strains can easily lose one of the useful traits during multiple rounds of passage, rearrangements induced by the TAQing system are highly stable and enable longrange retention of multiple beneficial traits.
The reason for the stable inheritance of acquired traits in TAQed strains remains unknown. In strains obtained by simple cell fusion, chromosomes harboring genes "favorable" for the acquired traits might often be eliminated during multiple rounds of cell divisions, thereby causing trait instability. On the other hand, TAQed stains carry recombined chromosomes with increased copies of favorable genes as a consequence of CNVs, which ultimately reduces the risk of trait loss over multiple passages.
The TAQing system also allows researchers to breed yeast strains for industrial use, even sterile parental strains (e.g., brewery yeast and Candida utilis). Additionally, the TAQing system can overcome the bottlenecks of conventional breeding methods, such as mutagen treatments, which results in relatively limited genetic alterations (e.g., point mutations), and radiationbased mutagenesis, which has limitation of genomic rearrangements due to the lethal effects of irradiation.
Another important approach to artificially evolving the yeast genome is "SCRaMbLE" (synthetic chromosome rearrangement and modification by LoxPsym-mediated evolution) 43 . This method is based on the synthetic yeast genome-engineering project called Sc2.0, in which large segments of the S. cerevisiae genome are systematically replaced with synthetic DNA. In SCRaMbLE, LoxPsym sites are inserted into the 3′ untranslated region of each non-essential gene to permit various chromosomal rearrangements, such as translocations, inversions, and deletions, at each site. The SCRaMbLE-induced complex rearrangements enable artificial evolution and synthesis of the yeast genome at will.
Compared with SCRaMbLE, the TAQing system is not good at systematically altering the genome or specifically altering it between coding segments. However, this system enables the shuffling of TaqI-cleavage sites in DNA elements across multiple organisms in the absence of a requirement to artificially synthesize chromosomes with LoxPsym sites. In addition, the use of CRISPR-Cas9 or different restriction endonucleases that recognize other DNA sequences allows alteration of sites of interest and the extent of subsequent genome rearrangements. Both SCRaMbLE and TAQing represent powerful and complementary tools for artificial genome evolution.
In conclusion, our findings confirmed that genome rearrangements can be promoted after WGD to markedly enhance phenotypic diversification. These finding also raises new questions to be addressed in future studies, including the degree of genome rearrangements allowed after multiple rounds of TAQing, and the natural source of DSBs in adaptive evolution. Additionally, TAQing can be used as a new tool to study genome instability and genetic networks or as a methodology for metabolic engineering to generate new genomic sequences that improve pathway flux in many organisms.
Xylose-utilizable and thermotolerant yeast preparation. Expression plasmids were constructed to overexpress the following genes of the nonoxidative pentose-phosphate pathway: XKS1-encoding xylulokinase, RK11-encoding ribulose 5-phosphate isomerase, RPE1-encoding ribulose 5-phosphate epimerase, TKL1encoding transketolase, and TAL1-encoding transaldolase. The integration vector pXhisHph-HOR7p-ScXK which is targeted to the HIS3 locus in chromosome XV, was used for the construction of the XKS1 overexpression plasmid 45 . The integration vector pXAd3H-HOR7p-ScTAL1-ScTKL1, which is targeted to the upstream region of ADH3 in chromosome XIII, was used for the construction of TAL1-and TKL1-overexpression plasmids 45 . The integration vector pXGr3L-HOR7p-ScRPE1-ScRKI1, which is targeted to the GRE3 locus in chromosome VIII, was used for the construction of the RPE1-overexpression and RKI1-overexpression plasmids 46 . All the genes were expressed under the control of the HOR7 promoter and the CYC1 terminator.
A pRS524-HOR7p-PiXI vector that enabling multicopy chromosomal integration of the Piromyces sp. E2-derived XI gene (PiXI) onto a chromosome was constructed, with the vector containing the PiXI gene (GenBank: AJ249909) along with the HOR7 promoter added 5′ of the gene and a TDH3 terminator added to the 3′ region. Furthermore, to promote HR into yeast genome, R45 and R67 sequences homologous to the rRNA gene (rDNA), and as a marker, the gene sequence of a TRP1d marker and exhibiting low levels of expression through deletion of a portion of its promoter. These R45 and R67 sequences enable multicopy integration of the sequence containing PiXI to the rDNA locus on chromosome XII 47 .
TaqI activation and viability assay. Haploid cells harboring pTaqI or the vector backbone were cultured in SD/MSG or MM at 30°C. Upon confluency, 150 µM CuSO 4 was added to the medium and cultured for an additional 4 h to induce TaqI expression. Cells were washed with water, aliquoted, and incubated for the indicated times at 42°C. Viability was assessed by plating on YPD or SD-uracil agar. Diploid strains were cultured in SD at 30°C, and addition of CuSO 4 and heat treatment were performed as described.
Assessment of HR and loss of heterozygosity assays. The MJL1720 strain, a diploid derivative of SK1 with arg4-bgl/arg4-nsp heteroalleles, was used to monitor HR frequency. After TaqI activation, cells were spread on SD-arginine and YPD plates, which were examined for ARG4 + colony formation. Fused diploid WT14 cells harboring a heterozygous mutation in the ADE2 + locus were subsequently examined for HR. Loss of heterozygosity (loss of functional ADE2 + ) was assessed by the presence of red (ade2 − ) colonies on SD plates.
Canavanine-resistance assay. To investigate Rev3-dependent TAQing-induced mutation frequency, we employed a canavanine-resistance assay. The arginine analog canavanine is toxic to cells, and mutations in the arginine permease gene (CAN1) restores this toxicity. After TaqI activation, cells were spread on SDarginine plates in the presence or absence of 100 µg/mL canavanine. Resistant colonies were counted and compared to mock-treated controls.
TL rearrangement-distance estimation from Hi-C data. To determine the closeness of TL-junction sites are close in nuclear space, 3D-coordinate information of yeast chromosomes was obtained from a previous study 22 , and 10,000 pairs of points on the chromosomes were randomly selected using Matlab_R2017b (https://www.mathworks.com/) to estimate the approximate distances between any pairs of points and plotted in a histogram (considering ã 2 µm nuclear size for the diameter). The distances between pairs of breakpoints at the sites of TL events, including six non-homologous junctions and two Ty-like homologous junctions, were also calculated from the 3D coordinates. Based on the random distance distribution, the probability of a certain distance was estimated, and each TL-junction distance was plotted. The appearance of pairs of breakpoints in the 3D-coordinate plane was plotted in the 3D space using Matlab.
Growth-rate measurement and ethanol production mass. Cells from strains W700M2, N44D, non-TAQed strain SNT17, and the TAQed mutant SNT11-2×40 were pre-cultured in SD (6.7 g/L yeast nitrogen base with all amino acids and 20 g/ L glucose) at 30°C. Cells were then shifted to 5 mL of test media SD or SX (6.7 g/L yeast nitrogen base with all amino acids and 20 g/L xylose) at the very low concentrations in test tubes for culturing at 30 or 40°C while measuring the rate of transmitted light (950 nm) volume at 1-min intervals using an ODBox-C/ ODMonitor system (Taitec Corp., Saitama, Japan). The transmission rate was converted into optical density (OD), and the average OD over 20 min was calculated for each sample. At the time points of 8, 24, and 48 h from the medium change, 250 µL of the samples was harvested to measure ethanol concentration using an F kit-ethanol (J.K. International, Tokyo, Japan) according to the manufacturer instructions. Computational analysis. Analysis methods for variant detection was differed for S. cerevisiae and A. thaliana. For yeast, genomic sequences of parental haploid strains were determined as the reference sequences for mapping and variant detection. The sequenced reads of TAQed strain genome were mapped using a Burrows-Wheeler aligner (BWA) 51 , and small variants were called using the Genome Analysis ToolKit (GATK) 52 . For Arabidopsis, the genome sequence uploaded onto the TAIR10 database (https://www.arabidopsis.org/) was used as the reference sequence for mapping and variant detection. Furthermore, sequenced reads of TAQed plant genome were mapped using commercial software (CLC Genomics Workbench v.8.5; Qiagen). These variants were also called using this software, with variant-detection workflow shown in Supplementary Fig. 12.
Yeast reference-genome sequencing. To detect the rearrangements induced by the TAQing system, genomic sequences of parental haploid strains were analyzed. Bioinformatics workflow for reference-genome construction is shown in Supplementary Fig. 13. PacBio sequencing data for the two parental haploid wild-type strains were filtered and sub-reads were generated (Supplementary Fig. 13 and Supplementary Data 3) using SMRT Analysis v.2.3.0 (Pacific Biosciences) and smrtpipe.py (v1.87.139483) with the default parameters. De novo assembly was conducted using Sprai 53 (v.0.9.9.14) with the default parameters. Newly assembled contigs were roughly equivalent to those of the widely used laboratory strain S288C (previously reported chromosomes I-XVI and ChrM; obtained from the Saccharomyces Genome Database 54 ), except chromosome XII. Because chromosome XII harbors up to 200 copies of rDNA at~10 kb in length, the contigs with partial chromosome XII sequences were separated into three or more contigs [left arm, the right arm, and the rDNA region(s) in both strains]. These consensus contigs were partially corrected using the paired-end short-read sequencing data from MiSeq (Illumina). For error correction, the MiSeq data were mapped to the backbone of the assembled contigs using BWA, and then the SNPs and InDels in the backbone contigs were detected and modified by Picard (http://broadinstitute.github.io/ picard/), GATK, and perl scripts. The contigs were further improved using the gap-closing module in PBJelly 55 (PBSuite 15.8.24), and the sequences were corrected once again with the MiSeq data as described above. Each of the contigs from YPH499 and S799 were sorted and renamed after S288C annotation using BLAST 2.2.29+ 56 . For each chromosome, centromeres with sequences identical to those of S288C were marked using EMBOSS 57 . Two haploid genome references were combined for use as the reference genome of the "diploid" cell-fusion strains.
Variant detection in haploid genome references. SNPs and InDels between the S799 and YPH499, YPH499, and S288C strains were detected, and their distributions in the respective genomes, as well as between the two strains, were determined. SNPs and InDels between S799 and YPH499 were widely spread along the entire chromosome (average 6.5 SNPs/kb and 0.7 InDels/kb;~0.72% difference; Supplementary Fig. 14) and sufficient to distinguish two different alleles, except for those in the rDNA regions of chromosome XII and some telomeric regions, using 300-bp × 2 paired-end Illumina sequencing. YPH499 derived from S288C is highly conserved with its ancestor, S288C (a~0.05% difference; Supplementary Fig. 14). SNP and InDel detection was performed as follows. First, haploid MiSeq sequencing data for S799 was aligned with that of the YPH499 reference genome using BWA. Additionally, haploid YPH499 data was aligned with that of the S288C reference genome. Then PCR duplicates were then removed using Picard (http:// broadinstitute.github.io/picard/) tools, and raw variants were called using GATK  Fig. 15).
Mutation and rearrangement detection. MiSeq reads of the cell-fusion (control) strain and 13 TAQed strains were mapped using BWA. Both whole-genome sequencing and tiling-array data were used to detect large-scale genome rearrangements and aneuploidies ( Supplementary Fig. 16). Chromosome-wide CNVs detected by both sequencing and tiling array were defined as "aneuploidy". Chromosomal rearrangements with the junction of different segments of chromosomes were defined as "translocation (TL)." For the TL detection, the local CNVs of the segemnets were also confirmed using the tiling-array data. The corresponding breakpoints were detected by extracting chimeric reads around the junction sites, which were mostly non-homologous; however, three homologousmutation points were detected. These junction sequences were further analyzed by BLAST to determine whether they were derived from Ty-like repeat sequences or rDNA regions. These TL sites except for the TL between rDNA regions were verified by PCR (local amplification of~5 kb across the junctions). Chimeric chromosomes of homologous sequences from S799 and YPH499 and exhibiting changes in partial coverage or signal intensity in both sequencing and tiling-array data were defined as "Break Induced Repairs (BIRs)". Chimeric chromosomal regions (~0.7-30 kb) in S799 and YPH499 and exhibiting changes in partial coverage detected in the sequencing data were referred to as "Short Gene Conversions (SGCs)". To search for BIR and SGC, zero-coverage regions on one of the references were marked as candidate regions to test whether their homologous chromosome pair displayed double-coverage reads 58 . If each end of the chromosome was derived from both S799 and YPH499, the mutation was annotated as a BIR, and rearranged positions were further searched from the chimera sequences around the events using BLAST, samtools (http://samtools.sourceforge.net/), and perl/shell scripts. Partial chimeric sequences from the pair within a chromosome were annotated as SGCs. Because homologous TLs, BIRs, and SGCs represent homologous rearrangements, approximate rearrangement positions were notated. Small variants were detected using Picard and GATK. SNVs and InDels reported in AT-rich regions, rDNA regions, telomeres, and regions with too low-coverage (<50% of the average coverage) and that are often seen especially around the large chromosomal rearrangements were eliminated as false positives 59 . Common SNVs and InDels detected in all control and TAQed strains were also eliminated as false positives. Genome-rearrangement/mutation events are visualized in Figs. 2a and 6c as circular plots generated by the R package (ggbio; https://bioconductor.org/ packages/release/bioc/html/ggbio.html) 60 .  Detection of InDels and structural variants was performed on locally realigned read mappings using the "InDels and Structural Variants" tool in the CLC Genomics Workbench (v8.5; Qiagen). The optional parameters are shown below; P-value threshold = 0.0001, Maximum number of mismatches = 3, Minimum number of reads = 2. Based on InDel and structural-variant information, mapped reads were locally realigned using the "Local Realignment" tool in the CLC Genomics Workbench (v8.5; Qiagen). The following parameters were changed from default: Force realignment to guidance-variants = Yes. Detection of small variants, InDels, and structural variants was performed on a locally realigned read mapping using the "Basic Variant Detection" and the "InDels and Structural Variants" tools in the CLC Genomics Workbench (v8.5; Qiagen).
Verification of small variants. After local realignment, small variants were called using the Basic Variant Detection tool in CLC Genomics Workbench (v8.5; Qiagen). To narrow small-variant candidates in TAQed strains, those detected in control plants were excluded. Variants meeting the criteria of mutation frequency (mutated read count/total read count > 0.4) were selected and manually verified. For SNVs, regions harboring variants on only one locus were excluded. InDels were selected based on two criteria: (1) <2 reads containing InDels in control plants and (2) InDel uniqueness.
Validation of structural variants. To narrow the list of called structural variants, those detected in control strains were excluded. We adopted three-step strategies to verify structural variants ( Supplementary Fig. 17). In the first step, CNV mapping results were compared with tiling-array data to verify the CNV region, and if they were not included in the list of structural variants, they were added as candidates.
In the second step, soft-clipped sequences were extracted on the boundary line of CNVs (boundary region A) where partially unmapped reads (>10% reads of total reads were partially unmapped) were highly accumulated. The soft-clipped sequences were analyzed by BLAST against the TAIR10 database (https://www. arabidopsis.org/), and those at the site of the best (e < 1E−04) BLAST result were extracted (boundary region B). Extracted sequences were analyzed by BLAST against the reference genome, and if the site of the best BLAST result was region A, it was considered a reciprocal best hit. For the third step, SAM files were analyzed, and accumulated discordant pair-end reads surrounding regions A and B were mapped onto boundary regions A and B, respectively. Alignment start position for the pair reads was subsequently analyzed, with these results agreeing with BLAST searches of soft-clipped sequences.
Plant materials and growth conditions. The A. thaliana GU-US reporter (direct repeat-type) line 1406 has been previously described 21,61 . Ecotype Col was used as a control strain. Plants were grown in corner dishes on germination medium (MS medium containing 1% sucrose and 0.8% agar) for 3 weeks with a 16-h/8-h light/ dark cycle. To activate TaqI, 1-week-old plants were grown at 37°C for 24 h. All plant strains are listed in Supplementary Data 2.
Construction of transgenic plants. The 35S:TaqI-NLS (NLS; nuclear localization signal) plasmid was constructed by cloning PCR-amplified fragments. The primers BamHI-TaqI-F and TaqI-SacI-R were used to amplify the TaqI fragment while BamHI-TaqI-F and TaqI-NLS-SacI-R were used to add the NLS tag. The amplified sequence (TaqI-NLS) was ligated into the pCR2.1 vector with the TOPO TA Cloning Kit (Invitrogen). The TaqI-NLS sequence was inserted behind the 35S promoter in the pBI121 plasmid to generate the plant transformation vector, which was introduced into the Arabidopsis line 1406 by Agrobacterium-mediated transformation using the floral dip method 62 . TaqI-NLS transgenic plants were selecting by growing T1 seeds on plates containing MS medium supplemented with 50 mg/L kanamycin. T3 progeny of homozygous plants harboring a single copy of the transgene were selected [TaqI + (2n)]. The following primers were used for cloning: BamHI-TaqI-F (5′-AGGATCCCCGGGTGGTCAGTCCCTTATGGCCCCTACA-CAAGCCCA-3′), TaqI-SacI-R (5′-AGAGCTCTGTACCTCACGGGCCGGTGAG GGC-3′), and TaqI-NLS-SacI-R (5′-AGAGCTCCCCGGGCTATCCTCCAACCT TTCTCTTCTTCTTAGGCTGCAGACCTCCCGGGCCGGTGAGGGCTTCC-3′).
Generation of polyploid plants. Three-week-old plants [control and TaqI + (2n)] were dipped in a 0.01% colchicine solution for 3 min, and progeny seeds were obtained by self-propagation. The nuclear phase of progeny plants was analyzed by flow cytometry on a Cell Lab Quanta SC MPL (Beckman Coulter, Brea, CA, USA) using a mercury arc lamp (365 nm), a UG1 excitation filter, a 560 dichroic shortpass filter, and a 450/55 bandpass filter. Nuclear extraction and DNA staining of nuclear DNA from Arabidopsis tissue were executed with CyStain UV Precise P (Sysmex Corp., Kobe, Japan). Tetraploid control plants (1406P4) and tetraploid 35S:TaqI plants [TaqI + (4n)] were selected.
Recombination assays. For recombination assays, strains 1406 and 1406P4 served as controls, and histochemical GUS staining was performed as described previously 63 . The number of GUS blue spots (indicative of a recombination event) on each cotyledon was determined visually under an optical microscope (Olympus). For each line, 30 to 100 cotyledons were analyzed.