Synthetic hybrids of six yeast species

Allopolyploidy generates diversity by increasing the number of copies and sources of chromosomes. Many of the best-known evolutionary radiations, crops, and industrial organisms are ancient or recent allopolyploids. Allopolyploidy promotes differentiation and facilitates adaptation to new environments, but the tools to test its limits are lacking. Here we develop an iterative method of Hybrid Production (iHyPr) to combine the genomes of multiple budding yeast species, generating Saccharomyces allopolyploids of at least six species. When making synthetic hybrids, chromosomal instability and cell size increase dramatically as additional copies of the genome are added. The six-species hybrids initially grow slowly, but they rapidly regain fitness and adapt, even as they retain traits from multiple species. These new synthetic yeast hybrids and the iHyPr method have potential applications for the study of polyploidy, genome stability, chromosome segregation, and bioenergy.

P olyploidy generates diversity by increasing the number of copies of each chromosome 1 . Allopolyploidy instantly adds chromosomal variation from multiple species through hybridization, while autopolyploidy leads to variation as gene copies from a single species diverge during evolution. Allopolyploidy facilitates differentiation and adaptation to new environments 2 . Many plants, animals, fungi, and other eukaryotes are ancient or recent allopolyploids, including some of the best-known industrial organisms, crops, and evolutionary radiations 3,4 .
Phylogenomic analyses support an ancient allopolyploid origin for the baker's yeast Saccharomyces cerevisiae 5 . S. cerevisiae has been one of the most important model organisms to study polyploidy in the context of evolution 6 , its effects on mutation rate 7 , and as a model of how cancer progresses as clonal populations adapt through driver mutations 8 . Despite the decreased fitness of newly generated polyploids 9 , experimental evolution assays in S. cerevisiae and comparisons of the genomes of industrial Saccharomyces interspecies hybrids have shown that they return to high fitness through the generation of aneuploidies, chromosomal rearrangements, and loss-of-heterozygosity 10,11 .
Genome rearrangements are common in Saccharomyces allopolyploids used to make fermented beverages 12,13 , but experimental tools to test the limits of polyploidy and genome rearrangements are lacking. Random chromosomal aberrations can be easily generated by using techniques from synthetic biology, such as SCRaMbLE 14 . However, SCRaMbLE is currently only available in single, partly synthetic S. cerevisiae strain, limiting the genomic diversity that can be explored.
Saccharomyces species have similar genome content, identical numbers of chromosomes (n = 16), and genomes that are mostly syntenic 15 . Since they have limited pre-zygotic barriers, interspecies hybrids can be generated easily when haploid strains of opposite mating types encounter each other. Much more rarely, diploid yeast cells can become competent to mate by inactivating or losing one MAT idiomorph or undergoing gene conversion at the MAT locus 16 . To attempt the generation of allododecaploid (base ploidy of 12n) hybrids of six species, here we develop an iterative Hybrid Production (iHyPr) method. iHyPr combines traits from multiple species, such as temperature tolerance, and through adaptive laboratory evolution (ALE), facilitates rapid adaptation to new environments. This method enables basic research on polyploidy and chromosome biology. iHyPr is applicable to research on bioenergy and synthetic biology where it can harness genomic diversity to generate more efficient strains that produce new bioproducts 17 or to combine industrially useful traits from multiple species 18,19 .

Results
Hybrids of six yeast species can be generated with iHyPr. iHyPr allowed us to experimentally test the limits of chromosome biology and allopolyploidy by constructing a series of higherorder interspecies hybrids ( Supplementary Fig. 1). First, we used two differentially marked HyPr plasmids, which each encode a drug-inducible HO gene (homothallic switching endonuclease) that promotes mating-type switching, to efficiently generate and select for two-species hybrids as done previously 20 . Next, using two newly created, differentially marked HyPr plasmids, we crossed these two-species hybrids to construct three-species and four-species hybrids. The construction of higher-order synthetic hybrids has not been reported previously. Finally, we constructed six-species hybrids using three different crossing schemes (Fig. 1,  Supplementary Fig. 2).
In all three schemes, diploid genomes were successfully introduced from each of the six parent species (Fig. 2b, Supplementary Fig. 3).
During hybrid construction, as more and more genomes were introduced, the frequency of successful matings decreased (Spearman rank-sum test R = −0.89, p value = 1.1 × 10 −5 , Fig. 3a, Supplementary Data 3), and the fitness of synthetic hybrids declined (Spearman rank-sum test R = −0.77, p value = 7.6 × 10 −4 , Fig. 3c). The fitness decrease may be due to the increased cell area, which was correlated with the increased genome size (Spearman rank-sum test R = 0.97, p value = 1.4 × 10 −5 , Fig. 3b). One model proposes that increased ploidy prolongs spindle assembly and results in more frequent spindle checkpoint-associated mitotic arrest 21 . Genetic incompatibilities could also play a role in the decreased fitnesses of interspecies hybrids.
Genome size and stability limits. The largest synthetic hybrid expanded its genome size 3.3 times (from 24 Mb to~80 Mbp) ( Supplementary Fig. 5A, Supplementary Data 2), and its cell area was 2.3 times larger than a diploid cell (Fig. 3b). The increases to cell volume, estimated from cell area, closely matched the increases to genome size. For example, 3.3-fold increase in genome size of the largest synthetic hybrid corresponded to a 3.4-fold increase in cell volume (Supplementary Data 2). These data fit well with the long-held observation that cell size is correlated with genome size within taxa and cell types 22 , as well as recent work suggesting that the DNA:cytoplasm ratio imposes the primary limitation on cell size due to fitness defects 23 .
Some species contributed many fewer chromosomes than others to the synthetic hybrids of six species (defined as the ancestor hybrids) (Fig. 2b). During construction, chromosome losses were widespread and outnumbered gains (two-sided t-test, t = −3.4408, d.f. = 6, p value < 1.37 × 10 −2 , Supplementary Data 2). These aneuploidies rose dramatically as the number of species donating genomes increased (linear regression r 2 = 0.79, p value = 3.01 × 10 −6 ) (Fig. 2a). Complete chromosomal aneuploidies were much more common than aneuploidies caused by unbalanced translocations or deletions (97.31% versus 2.69% of total detected chromosomal aberrations). Aneuploidies involving chromosome III, where 88.9% of translocations or deletions in this chromosome were unbalanced (Fig. 2b, Supplementary Fig. 3, Supplementary Note 1), were especially common because it contains the MAT locus being cut by the Ho endonuclease during iHyPr. These results suggest that, in addition to the expected gene conversion events, iHyPr generated mating-competent cells via partial chromosome losses.
Although our goal was to generate true allododecaploid (12n) six-species hybrids, a ploidy level acquired by only a few organisms 24,25 , we obtained at most eight copies of some chromosomes. The maximum ploidy reached was~7n, and none six-species hybrids were euploid (Fig. 2b). Due to massive chromosome loss, we inferred the six-species hybrid with the largest genome had an average of~7 copies of each chromosome (i.e. 12n −88) when estimated using genome sequencing and bioinformatic tools (visual inspection of sppIDer plots) or an average of~8 copies of each chromosome (i.e. 12n−64) when total DNA content was estimated using flow cytometry (Fig. 2b Mitochondrial inheritance affects genotype and phenotype. During interspecific hybridization, hybrids can inherit one of the two parent mitotypes or a recombinant version depending on the budding location 26 . In general, one of the parent mitotypes was quickly fixed during the generation of our hybrids, except for three cases: the allotetraploid Saccharomyces kudriavzevii × Saccharomyces mikatae yHRWh4, the allotetraploid S. cerevisiae × Saccharomyces uvarum yHRWh10, and the six-species hybrid yHRWh36, which were all heteroplasmic ( Supplementary Fig. 4). In rich medium at 20°C, for strains with similar numbers of hybridized species, hybrids with a S. cerevisiae mitochondrial genome (mtDNA) grew 7-15% faster than the hybrids with the mtDNA of another species (Fig. 3c, d). mtDNA inheritance was also significantly correlated with nuclear genome retention (ANOVA multifactor F value = 19.9, d.f. = 1, p value = 7.77 × 10 −4 ), with the mtDNA donor tending to contribute more nuclear chromosomes (Fig. 4, Supplementary Fig. 4). These results are consistent with recent observations in hybrids used in the fermented beverage industry 12 .
Trait combination, adaptive evolution, and genome stability. Higher-order synthetic hybrids allow investigators to rapidly combine traits from many different parents, such as differences in sugar consumption and temperature preferences. To determine if the inherent chromosomal instability of these six-species hybrids could be harnessed as a diversity generator, we tested how these new six-species hybrids altered their kinetic parameters during ALE. ALE was performed for an estimated 80 generations in a medium containing glucose (as a control) or xylose, a sugar poorly metabolized by most Saccharomyces species 27 . To provide baseline xylose metabolic capability upon which to improve, we chose a S. cerevisiae parent strain that had been engineered by inserting xylose utilization genes into Chromosome IV 28,29 . Ancestor six-species hybrids grew slowly, and despite differing from each other in chromosomal composition (Figs. 2b and 4a), single-colony isolates of all 12 ALE replicates (three replicates for the two ancestor hybrids retaining the chromosome IV in two ALE conditions) outperformed their ancestors in culturing conditions identical to the ALE (one-sided Wilcoxon rank-sum test, p value = 3.51 × 10 −4 ). Many evolved strains even outperformed the S. cerevisiae reference strain (Fig. 5a), but we detected no significant differences between hybrids evolved in YPD and YPX (one-sided Wilcoxon rank-sum test, p value > 9.39 × 10 −2 ). However, in microtiter plate culturing conditions where more replicates could be achieved, evolved hybrid populations grew as much as 71% faster on xylose than the reference S. cerevisiae strain, and populations evolved on xylose outperformed those evolved on glucose (one-sided Wilcoxon rank-sum test, p value = 1.29 × 10 −2 ) ( Supplementary Fig. 6, Supplementary Data 6). Importantly, all our evolved hybrids grew well at low-temperature conditions (4°C) where the S. cerevisiae parent could not grow (Fig. 5b), demonstrating that the cold tolerance of the other parents 30,31 had been retained through hybridization and ALE.
Since maximum growth rate on xylose improved considerably regardless of whether hybrids were evolved on xylose or glucose, we hypothesized several factors that could be responsible, such as xylose cassette amplification or genome stabilization. Neither chromosome IV nor the xylose utilization genes themselves were selectively amplified in either condition (Fig. 2b, Supplementary  Fig. 7, Supplementary Data 7). Evolved hybrids with more reduced genome sizes tended to have slightly higher fitness among ALE replicates, but the correlation was not significant (Spearman ranksum test, p value 0.054) ( Supplementary Fig. 8). These results suggest that the regions of the genome that are lost or amplified matter more for adaptation than total size. Although genome instability increased after each step during the construction of the six-species hybrids (Fig. 2a), genome instability decreased after 80 generations of ALE (Fig. 6a). Nonetheless, genome sequencing of a random selection of colonies from one of the evolved six-species hybrids demonstrated that genomic diversity was still being generated at a prodigious rate (Fig. 6b, c) determine whether and when the stabilization of the hybrid genomes would be sufficient to apply to industry.

Discussion
iHyPr is a powerful addition to the yeast synthetic biology toolkit that can be used to generate and select for genome diversity, while combining industrially relevant traits from multiple parents.
Here, we used iHyPr to combine xylose utilization from a biofuel strain of S. cerevisiae with cold tolerance, a trait critical for the production of many fermented beverages [32][33][34] . Previous efforts to generate higher ploidy Saccharomyces cells were arduous. A documented autohexaploid S. cerevisiae strain was produced by using a complex combination of auxotrophic intermediates 9 , and allotetraploids of S. cerevisiae × S. kudriavzevii have been generated by using protoplast fusion and raremating 35 . Recently, a CRISPR/Cas9 system was developed to switch mating types and generate tetraploid yeast cells in a manner similar to HyPr 36 . We show here that iHyPr can be used to produce higher-order hybrids iteratively without additional transformations. Our six-species hybrids were unstable and quickly lost chromosomes, a result nicely modeled with polyploid yeasts. As the number of chromosomes increases, the time required for spindle assembly increases, generating chromosomes without attached spindles, which results in chromosome loss during anaphase 21 .
The levels of allopolyploidy reached in our study will facilitate the understanding of the cellular fitness consequences in eukaryotes. The increase in ploidy was associated with a short-term fitness defect. However, ALE rapidly improved fitness, while allowing multiple parent traits, such as cold tolerance and xylose utilization, to be retained. Since both ALE selection regimes improved performance on xylose, most improvements were likely not in direct response to the environment. Instead, they may have been driven by endogenous factors, including possibly genome  yHRWh82  yHRWh83  yHRWh84  yHRWh85  yHRWh86  yHRWh87  yHRWh88  yHRWh89  yHRWh90  yHRWh91  yHRWh92  yHRWh93   I II III  IV V  VI  VI   I  VI  II IX  X XI  XI   I  XI   II  XI   V  XV  XV I  m  tD  N  A  I II III  IV V  VI  VI   I  VI  II IX X XI  XI   I  XI   II  XI   V  XV  XV I  m  tD  N  A  I II III  IV V  VI  VI   I  VI  II IX  X XI  XI   I  XI   II  XI   V  XV  XV I  m  tD  N  A  I II III  IV V  VI  VI   I  VI  II IX X XI  XI   I  XI   II  XI   V  XV  XV I  m  tD  N  A  I II III  IV V  VI  VI   I  VI  II IX  X XI  XI   I  XI   II  XI   V  XV  XV I  m  tD  N  A I II III IV V VI VI  I  VI  II IX X XI XI  I XI  II  XI  V  XV XV I   m  tD  N  A  I  II  III  IV  V  VI  VI   I  VI   II  IX  X Fig. 3), which were corrected based on flow cytometry ploidy estimations. a The number of chromosomal aberrations was inferred for each synthetic hybrid as new translocation, gain, and loss events not seen in the preceding hybrid ( Supplementary Fig. 3). Chromosomal aberrations involving parts of chromosomes were conservatively counted only in cases of clear fusion of entire arms, whereas smaller loss-of-heterozygosity events were not counted. The synthetic hybrids generated from each independent scheme are represented with different shapes. Color points are colored according to the number of species genomes contributing to the strain. A LOESS regression line and the 95% confidence interval of the fit are represented with a discontinuous black color and gray shadow, respectively. b Chromosome content is colored according to the species donor. Mitochondrial inheritance was inferred using mitosppIDer ( Supplementary Fig. 4). The numbers of chromosomes for each species are colored according to the left heatmap legend. Incomplete and recombinant mtDNA are colored in gray. Total number of chromosomes is shown in the right part, which is colored according to the right legend. Ploidy estimates based on de novo genome assemblies, which correlates with flow cytometry (Spearman rank-sum test R = 0.88, p value = 7.5 × 10 −8 , Supplementary Fig. 5c), are indicated at the right side. Synthetic hybrids are reported in the order constructed ( Supplementary  Fig. 2). yHRW134 and yHRWh4 are shown multiple times because of their use in multiple crossing schemes. Evolved hybrids are grouped based on the conditions in which they were evolved, and they are colored according to their ancestor hybrid. Red squares highlight chromosomes that were retained or lost in all hybrids evolved in the same condition when compared to their siblings evolved in the other condition. S. cerevisiae chromosome IV, where the xylose utilization genes were inserted, is indicated by the black square. Considerable karyotypic diversity continued to be generated during 80 generations of ALE (Fig. 6), but each evolved strain is easily recognized as more similar to its ancestor six-species hybrid. Source data are provided in the Source Data file and at http://bit.ly/2v1rq1T. stabilization or the removal of interspecies genetic incompatibilities. The capacity of this approach to generate extensive karyotypic and phenotypic diversity will be of great interest for many industrial applications.
Mitochondrial inheritance also greatly influenced the genotypes and phenotypes of our synthetic hybrids. Even though a homoplasmic mtDNA state was quickly reached in most cases, a heteroplasmic state was detected in three exceptions that were all part of the same crossing scheme, and we offer a set of related possible explanations. The presence of selfish elements, such as homing endonucleases, could explain why multiple mitotypes were retained in yHRWh4. In this case, a portion of S. mikatae COX1, a gene with a high number of introns invaded by homing endonuclease genes 37 , seems to have been introduced into the S. kudriavzevii mtDNA ( Supplementary Fig. 4A, C). An even more intriguing result occurred while generating the yHRWh10 hybrid, which remained in a heteroplasmic state and retained most of the mtDNAs of both parents (S. cerevisiae and S. uvarum) (Supplementary Fig. 4A). We recently demonstrated that, during the formation of S. cerevisiae × S. uvarum hybrids, the frequency of strains without a functional mtDNA was higher when the hybrid inherited a S. uvarum mtDNA, but introgression of the F-SceIII homing endonuclease gene restored normal mitochondrial retention 38 . Therefore, the absence of F-SceIII in yHRWh10 may have influenced the loss of mtDNA in its descendants, such as the six-species hybrid yHRWh36, which retained only small regions   Step of hybridization   Fig. 2). yHRWh4 is shown multiple times because of its use in two crossing schemes. We did not expect 100% genome contribution for each Saccharomyces species, even for recently created hybrids, because some genomic regions (e.g. repeats) are not unambiguously detectable with Illumina sequencing data. Genome size bars are colored according to each species' contribution. The strain names are colored based on the mtDNA inheritance inferred from mitosppIDer ( Supplementary  Fig. 4), with two or more mtDNAs or regions shown as a gradient. b The nuclear compositions of the S. cerevisiae parent, synthetic hybrids, and evolved hybrids are plotted according to mtDNA inheritance (n = 1 sequenced strain). Hybrids with S. cerevisiae mtDNA or with other mtDNA are colored in red and light blue, respectively. Source data are in the Source Data file and at http://bit.ly/2v1rq1T.  of Saccharomyces arboricola and S. uvarum mtDNAs (Supplementary Fig. 4C). In another recent study, mtDNA inheritance was dominated by one parent due to nuclear-mitochondrial interactions, rather than occurring stochastically 39 . The loss of mtDNAs in particular hybrid combinations, as well as the unusually high or low coverage of specific regions in others, merits further study (Supplementary Note 2). In summary, we generated and extensively characterized two-, three-, four-, and six-species synthetic hybrids, using our tool for yeast synthetic biology, iHyPr. We also improved the fitness of evolved strains, where polyploidy drives genome instability and evolution. Our higher-order allopolyploids acquired genome aberrations involving multiple species as they rapidly adapted to new environmental conditions. This technology pushes the budding yeast cell toward its limits in pursuit of basic research questions in chromosome biology and evolutionary genetics, as well as potential industrial applications.

Methods
Yeast strains and maintenance. The reference strain chosen for improvement was GLBRCY101, a haploid derivative of the Saccharomyces cerevisiae GLBRCY73 strain, which had been engineered with xylose utilization genes from Scheffersomyces (Pichia) stipitis and aerobically evolved for the consumption of xylose [27][28][29] . Representative strains were selected from five additional Saccharomyces species based on published nuclear and mtDNAs (Supplementary Data 1). These six parent strains were used to generate the six-species hybrids. Yeast strains were stored in cryotubes with YPD (1% yeast extract, 2% peptone, and 2% glucose) and 15% glycerol at −80°C. Routine cultures were maintained in YPD plus 2% agar plates at 24°C.
Two new Hybrid Production (HyPr) plasmids. We previously published two HyPr plasmids with natMX (pHCT2) and hphMX (pHMK34) resistance cassettes 20 . We amplified the ble (Zeocin resistance) and nptII (G418 resistance) coding regions for marker swaps to generate pHRW32 and pHRW40 plasmids, respectively (Supplementary Data 8). The new HyPr plasmids enabled complex, iterative crossing schemes without adding extra steps to remove one of the two HyPr plasmids between the hybridization steps ( Supplementary Fig. 1).
Saccharomyces transformation with HyPr plasmids. Before transforming GLBRCY101 with a HyPr plasmid, we removed its nuclear kanMX cassette by swapping the kanMX marker to tkMX 40 . Next, we transformed this strain using a short DNA fragment designed to allow the tkMX gene to be removed via homologous recombination and selecting for successful marker loss on synthetic complete (SC) + FUdR medium (0.17% yeast nitrogen base, 0.5% ammonium sulfate, 0.2% complete drop out mix, 2% glucose, and 50 µg/ml 5-fluorodeoxyuridine). S. cerevisiae yHWA85 and representative strains of Saccharomyces paradoxus, S. mikatae, S. kudriavzevii, S. arboricola, and S. uvarum were transformed with one of the four HyPr plasmid versions (Supplementary Data 1). The diploid parent strains contain levels of heterozygosity lower than 0.037% (Supplementary Data 1).
Transformation of yeast strains was done using the lithium acetate/PEG-4000/ carrier DNA method 41 , adjusting for the temperature tolerances of species 20 . S. cerevisiae yHWA85 was first diploidized using the HyPr plasmid pHRW40, creating yHRW134 for subsequent crosses. The generation of this diploid strain occurred in one step, which was confirmed by polymerase chain reaction (PCR) amplification of the MAT loci (see below). The experimental reference strain yHRW135 was derived from yHRW134 by screening for spontaneous plasmid loss.   Fig. 6 Synthetic hybrids as a tool to study genome instability. a Box plots of the number of chromosomal aberrations inferred for ancestor and evolved synthetic hybrids (Fig. 2b, Supplementary Fig. 3 The iHyPr method. Following the HyPr method to facilitate mating-type switch 20 , we pre-cultured strains with differentially marked HyPr plasmids in the presence of doxycycline to express the endonuclease encoded by HO, which is under a Tet-ON promoter ( Supplementary Fig. 1); each plasmid also contains the full machinery for inducible expression of the promoter. To generate the six-species hybrids yHRWh36 and yHRWh39, we first hybridized three separate pairs of species, generating two-species hybrids (Supplementary Fig. 2A, B). In each case, once the three two-species hybrids were generated, two of those two-species hybrids were themselves hybridized to create a four-species hybrid, which finally was hybridized with the last two-species hybrid to generate the six-species hybrid. To generate the six-species hybrid yHRWh56, two two-species hybrids were separately crossed with diploid Saccharomyces strains from other species to create two separate threespecies hybrids, which were then mated to generate the six-species hybrid (Supplementary Fig. 2C). Before each cross, parent strains were transformed with differentially marked HyPr plasmids ( Supplementary Data 1 and 8, Supplementary  Figs. 1 and 2) and treated with doxycycline in YPD at room temperature, except for S. cerevisiae which was incubated at 30°C. The doxycycline triggers the expression of the Ho endonuclease 20 , which cuts one or more MATa/MATα idiomorphs and generates mating-compatible strains that behave as either MATa or MATα. A sample of each culture was combined in a 1-ml Eppendorf tube and patched on a YPD plate. After 2-3 days, a sample was taken with a toothpick and streaked on a YPD plate supplemented with the corresponding drugs to select for successful matings. In contrast to the original HyPr method, we pre-cultured the new hybrid in YPD with one of the two drugs used during the selective medium step, and that hybrid was then crossed with another strain containing one or two of the other HyPr plasmids not used previously (Supplementary Figs. 1 and 2). During these subsequent steps, we expected (and phenotypically verified) the loss of the HyPr plasmid containing the drug-resistance cassette not under selection. This approach and the additional HyPr plasmids made for this study facilitated the iterative crosses required to make six-species hybrids by avoiding the steps of plasmid removal and minimizing the number of generations between crosses (Supplementary Fig. 1). The frequency of successful two-, three-, four-, and six-species hybrid generation were quantified in duplicates (n = 2) (Supplementary Data 3). The patch of co-culture was diluted in sterile H 2 O, and a sample was spread onto both YPD plates and YPD supplemented with the appropriate drugs. The frequency of successful matings was calculated as the ratio between the number of colonies observed in YPD supplemented with the corresponding drugs and the number of colonies observed in YPD.
Mating-type and PCR-RFLP confirmation of strains. Diploidization of the S. cerevisiae strain was confirmed by PCR at the mating-type locus. Hybrid statuses were confirmed by restriction fragment length polymorphism (RFLP) analysis. We used the Standard Taq Polymerase (New England Biolabs, Ipswich, MA) and the primers listed in Supplementary Data 9. Genomic DNA was extracted using the phenol:chloroform method on a strain grown from pre-culture to saturation in YPD. Aliquots of 700 µl of saturated culture were located in 1.5 ml microcentrifuge tubes that contained acid-washed beads. Each tube was centrifuged at maximum speed (21,130 × G) for 5 min, and the supernatant was removed. Two hundred microliters of buffer EB (10 mM Tris-Cl, pH 8.0), 200 µl of DNA lysis buffer (10 mM Tris pH 8.0, 1 mM EDTA, 100 mM NaCl, 1% SDS, 2% Triton X-100), and 200 µl of phenol:chloroform were added to each tube. Vigorous vortexing was performed for 3-4 min, followed by 5 min of centrifugation at maximum speed. The top aqueous layer was transferred to 1 ml 100% EtOH. After an inversion mixture, DNA was precipitated at −80°C for at least 10-15 min. A second centrifugation at maximum speed was performed, and the supernatant was discarded. We washed the pellet with 700 µl of 70% EtOH, and we centrifuged again to remove any residue or trace of the supernatant. The pellet was dried and resuspended in 100 µl of EB at 50-60°C for 30 min. To remove RNA, we incubated the solution with 0.5 µl of 10 mg/ml RNase A for 30 min at 37°C. DNA was quantified with a Qubit 2.0 Fluorometer (ThermoFisher Scientific). For PCR-RFLP, resulting PCR products were digested with a restriction enzyme or a combination of multiple restriction enzyme assays able to discriminate among Saccharomyces species (New England Biolabs, Ipswich, MA). An extended PCR-RFLP pattern, developed in previous publications 20,42 and this study, is detailed in Supplementary Data 10. Undigested PCR products were visualized on a 1.5% agarose gel, while digested PCR products were visualized on a 3% agarose gel.
Ploidy estimation by flow cytometry. Both asynchronous and hydroxyureaarrested (G1/S arrested) mid-log cultures were prepared for each strain. Hydroxyurea-arrested strains were prepared to assist in the identification of G1 peaks in samples with broad and undefined cell cycle peaks. Briefly, cultures were grown to saturation and then diluted back 1:200. Back-diluted cultures were grown to a 0.4-0.6 optical density at 600 nm (OD 600 ). For each strain, 1 ml mid-log culture was transferred into 200 μl 1 M hydroxyurea and incubated on a room temperature culture wheel for approximately half of the time that the respective strain took to grow from back-dilution to an OD 600 of 0.4-0.6. This ranged between 3 and 12 h. At the same time, 1 ml of asynchronous mid-log culture was harvested for fixation. All samples were fixed in 70% ethanol overnight, treated with RNase and Proteinase K, and finally stained with SYTOX Green dye (Molecular Probes) 43 . Stained cell suspensions were sonicated before flow cytometry. Fluorescence was measured with a BL1 laser (488 nm) on an Attune NxT (Invitrogen) flow cytometer at the lowest available flow rate. To accommodate for extremes in ploidy and cell size, voltage was adjusted to 250 for FITC dump channel (BL1) and FSC (Forward SCatter). All samples were run at the same voltage.
Flow cytometry data files were processed in FlowJo v10.4.2 (ref. 44 ). Samples were first gated on SSC (Side SCatter) and FSC to remove debris ( Supplementary Fig. 9). Doublets were then removed by gating on BL1-A and FSC-A. A histogram of BL1-A values were then generated for remaining cells. Hydroxyurea peaks were identified and gated manually. Asynchronous G1 and G2 peaks were identified by applying a Watson (Pragmatic) Cell Cycle model and identifying G1 and G2 means. When cell cycle models did not fit the asynchronous sample data automatically, hydroxyurea samples were used to identify G1 peaks and these were manually gated to constrain G1 in asynchrounous samples. Ploidy estimation were performed by comparing with fluorescence values of a haploid laboratory reference S. cerevisiae strain, S288C (Supplementary Data 1).
Cell size estimation. A subset of strains was used for microscopy analysis of cell size. Each strain was spotted from frozen stock onto YPD agar plates and grown at room temperature for 4 days. Water-cell suspensions were prepared for each strain, which were bright-field-imaged on an EVOS FL Auto 2.0 (Invitrogen) imaging system at ×400. Cell area was analyzed in FIJI v2.0.0-rc-34/1.5a 45 using the Analyze Particles tool. Cell volume was estimated by assuming spherical cells and calculating radii (r) from cell area measurements.
Fitness quantification of the newly generated hybrids. To measure the impact of genome size increases on fitness, we performed a growth test in a rich medium. All parent species and the two-, three-, four-, and six-species hybrids were precultured in 3 ml of YPD at room temperature. After pre-culture, 10 µl of saturated culture was inoculated into a 96-well plate (Nunc, Roskilde, Denmark) containing 240 µl of YPD. Spaces between the wells in the plates were filled with sterile H 2 O to maintain the humidity of the plates and limit culture evaporation.
To monitor the growth of strains and populations in the different media conditions, the inoculated 96-well plate was placed in a BMG FLUOstar Omega (Ortenberg, Germany) at 20°C. Absorbance at 595 nm was monitored every 15 min for 4 days. Background absorbance was subtracted from the average of nine negative controls containing the uninoculated medium being tested. Kinetic parameters for each condition were calculated in GCAT v6.3 (ref. 46 ). Median and standard deviations from six independent biological replicates, except yHRWh36 and yHRWh56 from which we obtained three replicates, were calculated in R (ref. 47

) (Supplementary Data 4).
Genome sequencing and chromosome composition analyses. Genomic DNA (gDNA) samples from the diploidized S. cerevisiae strain and the two-, three-, four-, and six-species hybrids were submitted to the DOE Joint Genome Institute for pairedend Illumina sequencing. Evolved six-species hybrids and six individual colonies from yHRWh88 (see ALE section) were also submitted for sequencing. Libraries were constructed according to the manufacturer's instructions. Sequencing of the flow cell was performed on an Illumina MiSeq using MiSeq Reagent kits, following a 2×150 nucleotide, indexed run recipe. Data were collected with Illumina RTA 1.18.54 and converted to fastq with Bcl2Fastq 2.20.0. Curated raw reads were submitted to the SRA database as Bioproject PRJNA476226 (Supplementary Data 11).
Genomic characterization was performed with sppIDer v1 (ref. 48 ). Our combined nuclear reference genome was built with the genome assemblies of S. cerevisiae GLBRCY22-3 (ref. 49 ), which is a close relative of the biofuel reference strain used here; S. paradoxus CBS432; S. arboricola CBS10644 50 ; S. mikatae IFO1815; S. kudriavzevii ZP591; S. uvarum CBS7001 18 ; and Saccharomyces eubayanus FM1318 (ref. 51 ). Our combined mitochondrial reference genome was built with the mitochondrial assemblies of the aforementioned strains 50-52 , except for CBS7001, whose mtDNA is still not completely assembled 38 . Instead, we used the mtDNA of a close relative, S. uvarum CBS395 52 . Raw Illumina paired-end reads and the combined reference genomes were the input data of sppIDer, which is a wrapper that runs published tools to map the short reads to the combined reference genomes and creates several colorful and visually intuitive outputs 48 . Here, we show depth of coverage plots from those species contributing genomes.
For each strain, the number of chromosomes and the ploidy were estimated from the sppIDer plots. This approximation gave a significant positive correlation with the ploidy estimated by flow cytometry (Spearman rank test r = 0.91, p value = 3.2 × 10 −6 ) ( Supplementary Fig. 5C). The number of chromosomal aberrations was based on the number of gains, losses, or unbalanced translocations detected in the sppIDer plots (Supplementary Data 2). One chromosomal gain, loss, or unbalanced translocation was counted as one aberration. Aberrations observed in one hybrid and maintained in the offspring of subsequent crosses were not counted again; only new aberrations for each cross were reported in the aberration plot (Figs. 2a and 6a). Chromosomal aberrations involving parts of chromosomes were conservatively counted only in cases where there were clear fusions of entire chromosome arms.
Genome size and ploidy quantification. Two different approaches were performed to quantify the genome size of the sequenced strains. In the first approach, genome assemblies were performed using the collection of assemblers included in iWGS v1.1 (ref. 53 ). The assembly with the best assembly stats reported by iWGS was selected, and the genome size was reported (Supplementary Data 2). In the second approach, sppIDer coverage outputs (StrainName_winAvgDepth-d.txt) were parsed to quantify the percentage of each Saccharomyces nuclear genome retained in the hybrid, which was calculated as follows: where Pspp is the percentage for one of the parent species; Ct is the number of windows with a coverage mean value above 2; Ws is the window size; and Gs is the reference genome size for that parent species. These two calculations yielded a good approximation of the increased genome size, but both generated estimates that assumed the highly homozygous genome donated by each parent was haploid; iWGS and sppIDer plots were significantly correlated (Spearman rank test r = 0.95, p value = 2.2 × 10 −16 , Supplementary Fig. 5D).
To get a better approximation of the genome size of each allopolyploid, we first determined the total number of copies of each chromosome contributed by each species, as quantified by sppIDer. Genome size was then calculated by multiplying the number of copies of each chromosome by its length and adding all these values together. Genome size and flow cytometry fluorescence were correlated (Spearman rank test r = 0.93, p value = 1.1 × 10 −7 , Supplementary  Fig. 5B).
Quantification of the number of copies of the xylose utilization cassette. Illumina reads were extracted using the xylose utilization cassette sequence (8.7 kbp) as bait for HybPiper v1.2 (ref. 54 ). The generated bam files were viewed and sorted with samtools v1.4 (ref. 55 ), and the coverage for each nucleotide was quantified with genomeCoverageBed, which is included in bedtools v2.2.27 (ref. 56 ). The mean coverage of the coding sequence of the three engineered xylose utilization genes (XYL1, XYL2, and XYL3) (3.9 kbp) was calculated from the genomeCoverageBed output. For the chromosome IV, mean coverage values for windows of 3.9 kbp were calculated from the genomeCoverageBed output generated by sppIDer. The cassette value and chromosome distributions for each strain were compared by a one-side Wilcoxon rank-sum test for a significant deviation from the expected ratio 1:1 (one copy of the cassette to one copy of chromosome IV) (Supplementary Data 7).
ALE and colony selection. Two of the three six-species hybrids (during construction, the third lost S. cerevisiae chromosome IV, where Sch. stipitis xylose utilization genes had been inserted) were evolved in triplicate at room temperature in tubes with two independent media conditions: 3.0 ml YPD or 3.0 ml YPX (1% yeast extract, 2% peptone, and 2% xylose). Three to five days of fermentation were performed to allow cells to consume the sugars, and an aliquot of each replicate was transferred at of 0.1 OD 600 to a fresh medium until it reached approximately 80 generations. A colony from each independent ALE experiment, regardless of whether they were evolved in glucose or xylose, was selected on YPX plates (1% yeast extract, 2% peptone, 2% xylose, and 2% agar) and cryopreserved.
Microtiter plate growth curves. We compared the growth kinetics of the S. cerevisiae reference strain yHRW135, the ancestors of the two six-species hybrids retaining the chromosome IV (yHRWh39, yHRWh56), and populations of the evolved hybrids. Growth was tested in YPD and YPX at room temperature. Strains or populations were pre-cultured in 3.0 ml YPD or YPX, depending of the medium tested. After pre-culture, 10 µl of saturated culture was inoculated into a 96-well plate (Nunc, Roskilde, Denmark) containing 240 µl of identical medium as the preculture. Spaces between the wells in the plates were filled with sterile H 2 O to maintain the humidity of the plates. The reference strain was cross-inoculated in all conditions; for example, yHRW135 pre-cultured in YPX was tested in both YPD and in YPX.
To monitor the growth of strains and populations in the different media, we inoculated 96-well plates and placed them in a BMG FLUOstar Omega at 20°C. Absorbance at 595 nm was monitored every 15 min for 5 days. Background absorbance was subtracted from the average of three negative controls containing the uninoculated medium being tested. Kinetic parameters for each condition were calculated in GCAT v6.3 (ref. 46 ). Median and standard deviations from three independent biological replicates were calculated in R 47 (Supplementary Data 6). For each medium, parameters were normalized against the data generated by the reference strain yHRW135 when it was pre-cultured and grown in the medium tested.
Cold tolerance spot test. Temperature growth profiles are well known to vary among Saccharomyces species 30,31 . In particular, S. uvarum and S. kudriavzevii are able to grow at low temperatures where S. cerevisiae cannot grow. To test if some phenotypic traits might be retained independently of the media regime, we performed spot tests in rich medium at different temperatures (22, 10, and 4°C). The S. cerevisiae reference strain (yHRW135) and the evolved six-species hybrids were compared. All strains were pre-cultured in liquid YPD medium at room temperature to saturation. Cultures were subjected to a series of 10-fold dilutions in YPD. Five microliters of each dilution was spotted onto three YPD agar plates identically. Plates were incubated in sealed plastic bags to keep them from drying out at the temperatures mentioned above. Each plate was photographed when most strains exhibited significant growth (4 days for 22°C, 11 days for 10°C, and 38 days for 4°C).
Culture wheel growth curves. Strains isolated from single colonies from evolved hybrids, ancestor hybrids, and the reference strain (yHRW135) were pre-cultured in YPX and inoculated at an initial OD 600 of 0.1 into 3 ml glass tubes containing YPX. Growth was monitored by measuring OD 600 . Kinetic parameters were calculated as above. Median and standard deviations from six independent biological replicates were calculated as above. These experimental conditions most closely matched the conditions in which the strains were evolved, and they are reported in Fig. 5a and Supplementary Data 5.
Statistical analyses. Data analyses and plots were performed in R v3.6.1 (ref. 47 Fig. 2a using the geom_smooth option in the R package ggplot2. For aberration data (Fig. 2a), r 2 and significance of regression were calculated with summary(lm(y~x)), where x was the number of species, and y was the number of observed aberrations. Correlations for ploidy and assembly comparisons were calculated in R using the ggpubr v0.2.3 package to apply a Spearman rank-sum test (Fig. 3, Supplementary Figs. 5 and 8), and plots were generated using ggplot2.
The impact of mitochondrial inheritance (Fig. 4b) in the retention of the nuclear genome of those hybrids involving S. cerevisiae was tested using a multifactor ANOVA in R, using summary(aov(P~M * C)), where P is the percentage retained of the S. cerevisiae nuclear genome; M is the mtDNA, which was encoded as a binary character (either as the S. cerevisiae mtDNA or that of another species); and C is the type of strain (i.e. classified as the S. cerevisiae parent; two-, three-, four-, or ancestor six-species hybrid; and evolved six-species hybrid). t-tests for significant differences between frequency of chromosome gains and losses and Wilcoxon rank-sum tests for significant differences in the kinetic parameters shown in Figs. 3d and 5a and Supplementary Fig. 6, respectively, were performed in R.
Flow cytometry data were analyzed and plotted in R.
Correlations were tested in R using a Spearman rank-sum test and plotted using ggplot2.
The quantified numbers of chromosomes and mitochondrial inheritance inferred from sppIDer were represented by heatmaps using the R package pheatmap v1.0.12.
Statistics and reproducibility. iHyPr was tested three times using different schemes. The rest of experiments were tested once with biologically independent replicates according to the legend descriptions in each figure.
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.