Contrasting genome dynamics between domesticated and wild yeasts

Structural rearrangements have long been recognized as an important source of genetic variation with implications in phenotypic diversity and disease, yet their evolutionary dynamics are difficult to characterize with short-read sequencing. Here, we report long-read sequencing for 12 strains representing major subpopulations of the partially domesticated yeast Saccharomyces cerevisiae and its wild relative Saccharomyces paradoxus. Complete genome assemblies and annotations generate population-level reference genomes and allow for the first explicit definition of chromosome partitioning into cores, subtelomeres and chromosome-ends. High-resolution view of structural dynamics uncovers that, in chromosomal cores, S. paradoxus exhibits higher accumulation rate of balanced structural rearrangements (inversions, translocations and transpositions) whereas S. cerevisiae accumulates unbalanced rearrangements (large insertions, deletions and duplications) more rapidly. In subtelomeres, recurrent interchromosomal reshuffling was found in both species, with higher rate in S. cerevisiae. Such striking contrasts between wild and domesticated yeasts reveal the influence of human activities on structural genome evolution.

rearrangements are needed to understand the evolutionary balance between genome stability and 17 plasticity as well as its functional implications in adaptation and disease. 18 The major challenge of studying structural rearrangements at the genomic level lies in the limited power 19 and resolution of detecting rearrangement variants with traditional capillary and short-read sequencing 20 data. Moreover, many structural rearrangement are embedded in complex genomic regions that are 21 repetitive and highly dynamic, which further exacerbate this problem. For example, subtelomeric regions 22 are known hotspots of rampant interchromosomal reshuffling which significantly contribute to inter-23 individual genetic variation and phenotypic diversity of eukaryotic organisms (Pryde et al. 1997; Mefford 24 and Trask 2002; Eichler and Sankoff 2003;Dujon 2010). However, the detailed landscape and 25 on the ancestral chromosomal identity of the chromosomal cores that they are attached to, which accounts 1 for the large-scale interchromosomal rearrangements that have occurred in some strains (Table S12). For 2 example, we named the subtelomere located on the right arm of chrXI in UWOPS03-461.4 as the "chr07-3 L subtelomere" since this subtelomere, together with its flanking chromosomal core, came from the left 4 arm of the ancestral chromosome 7 (chr07-L) ( Figure S6). Such accurately assigned subtelomere 5 orthology together with our explicit chromosome partitioning allows us to treat each chromosomal 6 domain separately and to have in-depth examination of their respective evolutionary dynamics. 7 8 Contrasting patterns of structural rearrangements between the two species in 9 chromosomal cores 10 Structural rearrangements can be balanced (e.g. inversion, reciprocal translocation, and transposition) or 11 unbalanced (e.g. large novel insertion, deletion, and duplication) depending on whether the actual amount 12 of genetic material is affected (Feuk et al. 2006). We systematically identified both types of structural 13 rearrangements in the chromosomal cores of the 12 strains, with a focus on events in which protein -14 coding genes are involved. The identified structural rearrangements were further mapped back to the 15 strain phylogeny to reconstruct their evolutionary history. 16 We identified 35  paradoxus. The interchromosomal rearrangement found in the Malaysian S. cerevisiae UWOPS03-461. 4 7 is particularly striking, in which chrVII, chrVIII, chrX, chrXI, and chrXIII were completely reshuffled,  Finally, as previously observed in yeasts with larger divergence scales (Fischer et al. 2000;Kellis et al. 17 2003), the breakpoints of these balanced rearrangements are clearly associated with tRNAs and Tys, 18 highlighting the roles of these elements in triggering genome instability. 19 Considering unbalanced structural rearrangements, we identified eight novel insertions, 19 deletions, four 20 dispersed duplications and at least seven tandem duplications ( Figure 3A and Supplementary data 4). 21 There are another two cases (one dispersed duplication and one insertion/deletion) of which the 22 evolutionary history cannot be confidently determined due to potentially multiple independent origins or 23 secondary deletions (Supplementary data 4). Although these numbers can be slightly underestimated 24 given that we only count unambiguous cases in our analysis, our identified unbalanced structural 25 rearrangements clearly outnumbered those balanced ones, as recently found in Lachancea yeasts 26 We found 31 (41.3%) of these unique pairs to be shared between strains or even between species with 1 highly dynamic strain-sharing patterns, most (87.1%) of which cannot be explained by the strain 2 phylogeny ( Figure 5B and Supplementary data 6). This suggests a constant gain and loss process of 3 subtelomeric duplication throughout evolutionary history. 4 Given such highly dynamic nature of subtelomeric reshuffling, we investigated to what extent those 5 orthologous subtelomeres could reflect the intra-species phylogeny. We measured the proportion of 6 conserved orthologous subtelomeres in all strain pairs within the same species and performed hierarchical 7 clustering accordingly ( Figure 5C). While the clustering on S. paradoxus strains correctly recapitulated 8 their true phylogeny, our parallel analysis in S. cerevisiae turned out to be very noisy, with only the 9 relationship of the most recently diversified strain pair (DBVPG6044 vs. SK1) being correctly recovered. 10 The fact that the distantly related European/Wine (DBVPG6765) and Sake (Y12) S. cerevisiae strains 11 were clustered together indicates likely convergent subtelomere evolution during their respective 12 domestication. The proportion of conserved orthologous subtelomeres between S. cerevisiae strains 13 (56.3%-81.3%) is comparable to that between S. paradoxus strains (50.0%-81.3%), despite the much 14 smaller diversification timescales of S. cerevisiae. Therefore, the contrasting result between the two 15 species from our clustering analysis implies potentially more rapid subtelomeric reshuffling in S. 16 cerevisiae than in S. paradoxus during their respective diversifications. Indeed, we found a 4.3-fold rate 17 difference in subtelomere reshuffling between the two species (Wilcoxon rank sum test, p-value = 2.52E-18 4; see Methods for details about rate estimation) ( Figure 5D), which explains the substantial erosion of 19 true phylogenetic signals in the subtelomere evolution of S. cerevisiae. Interestingly, the most recently 20 diversified strain pairs in both species (DBVPG6044 vs. SK1 in S. cerevisiae and YPS138 vs. 21 UFRJ50816 in S. paradoxus) showed exceptionally high subtelomeric reshuffling rates compared with 22 the other strain pairs within the same species (the outliers in Figure 5D), suggesting possible accelerated 23 subtelomere reshuffling in the incipient phase of strain diversification. The frequent reshuffling of 24 subtelomeric sequences can have drastic impacts on subtelomeric gene content both qualitatively and 25 quantitatively. For example, four genes (PAU3, ADH7, RDS1, and AAD3) were lost in the Sake S. 26 cerevisiae (Y12) due to a single chr08-L to chr03-R subtelomeric duplication event ( Figure S8). 1 Therefore, the more rapid subtelomere reshuffling in S. cerevisiae could have important functional 2 implications. 3 4 Non-canonical chromosome-end structures at native telomeres 5 S. cerevisiae chromosome-ends are characterized by two telomere associated sequences: the core X-and 6 Y'-elements (Louis 1995). The core X-element is present in almost all chromosome-ends, whereas the Y'-7 element is highly variable in terms of both presence/absence and copy numbers across different 8 chromosome-ends and strains. The typical structure of S. cerevisiae chromosome-ends can be 9 summarized into two general types: 1) with a single core X-element but no Y'-elements and 2) with a 10 single core X-element followed by one or several distal Y'-elements (Louis 1995). S. paradoxus 11 chromosome-ends also contain core X-and Y'-elements (Liti et al. 2009b), but their detailed structures 12 have not been systematically characterized in a genome-wide fashion due to the lack of complete 13 assemblies. Across our 12 strains, most (~85%) chromosome-ends have one of the two typical structures 14 previously characterized in S. cerevisiae but we also discovered several non-canonical structures that 15 have not been described before (Table S13). For example, we found several examples of tandem 16 duplications of the core X-element in both species. Such core X-element duplications are unlikely to be 17 assembly artifacts given that we also detected them in the S. cerevisiae reference genome (chrVIII-L and 18 chrXVI-R) with degenerated proximal copies. In most cases, the proximal duplicated copies of the core-19 X element were degenerated but we also found two examples where intact duplicated copies were 20 retained: the chrXII-R end in the Sake S. cerevisiae Y12 and the chrIII-L end in the European S. 21 paradoxus CBS432. The latter case is especially striking, where six copies (including three complete 22 ones) of the core X-element were tandemly arranged. Even more surprisingly, we discovered five 23 chromosome-ends consisting of only Y'-element (one or more copies) but no core X-element, despite the 24 importance of the core X-element in maintaining genome stability (Marvin et al. 2009a(Marvin et al. , 2009b. For 25 example, the chrV-L ends in S. cerevisiae DBVPG6044 and SK1 have one and three Y'-elements 1 respectively without any trace of the core X-elements. The discoveries of these non-canonical 2 chromosome-end structures offer a new paradigm to investigate the functional role of the core X-3 elements. The mitochondrial genome constitutes a natural genetic compartment that is replicated and transmitted 8 independently from the nuclear genome. Despite its pivotal evolutionary and functional importance, 9 sequencing and assembling the yeast mitochondrial genome has always been challenging due to its highly 10 repetitive and AT-rich genome composition. Obtaining complete mitochondrial genome assemblies from 11 our long-read sequencing gave us a great opportunity to investigate the structural dynamics of the 12 mitochondrial genome with high resolution and accuracy. We found a high degree of collinearity in S. 13 cerevisiae mitochondrial genomes for all pairwise comparisons, even between the most distantly related 14 strains (e.g. DBVPG6044 vs. S288c) ( Figure 6A). In contrast, the S. paradoxus mitochondrial genomes 15 show lineages-specific structural rearrangements in several strains. The two Eurasian strains (CBS432 16 and N44) share a transposition of the entire COX3-rnpB-rns segment, in which rns was further inverted 17 either before or after the transposition ( Figure 6B-D). Independent tandem duplications of the OLI1-18 VAR1 segment and rnpB were found in the South American (UFRJ50816) and Hawaiian (UWOPS91-19 917.1) S. paradoxus respectively ( Figure 6E and Table S9). In addition, it seems that the COB gene was 20 recently transposed to its current mitochondrial genomic position in S. cerevisiae and S. paradoxus prior 21 to the divergence of the two species given the gene orders in the two outgroups. 22 The phylogenetic tree inferred from the six one-to-one mitochondrial orthologs (ATP6, ATP8, COB, 23 COX1, COX2 and COX3) deviates from the phylogeny based on nuclear genes, although the relationships 24 between those most closely related strains (e.g. DBVPG6044 vs. SK1 and DBVPG6765 vs. S288c in S. 25 cerevisiae as well as CBS432 vs. N44 in S. paradoxus) are the same. The low topology consensus across 1 different gene loci (normalized quartet score = 0.594) suggests a highly heterogeneous phylogenetic 2 history of mitochondrial genes across different loci. Together with the drastically dynamic 3 presence/absence pattern of mitochondrial group I and group II introns (Table S10), this supports the idea 4 of extensive cross-strain recombination in yeast mitochondrial evolution (Wu et al. 2015a). According to 5 this mitochondrial phylogeny, the Eurasian S. paradoxus lineage (CBS432 and N44) was clustered 6 together with the seven S. cerevisiae strains before joining with the other S. paradoxus strains, which 7 reinforces the argument for mitochondrial introgression from S. cerevisiae to the Eurasian S. paradoxus 8 lineage (Wu and Hao 2015) ( Figure 6E). In addition, we noticed that the COX3 gene in the South 9 American S. paradoxus UFRJ50816 started with GTG rather than the typical ATG start codon in our 10 assembly, which was further confirmed by the Illumina reads. This suggests either an adoption of an 11 alternative nearby ATG start codon (e.g. the one 45 bp downstream) or a rare case of near-cognate start Fully resolved structural rearrangements illuminate complex phenotypic traits 16 Structural rearrangements are expected to account for a substantial fraction of phenotypic variation but 17 the lack of complete assemblies have prevented a deep understanding of structural variation-phenotype 18 associations. Here we used the CUP1 locus and ARR cluster as case studies to illustrate how the fully 19 resolved structural rearrangements based on complete assemblies can illuminate complex phenotypic 20 traits. The CUP1 gene encodes a copper scavenging short metallothionein that keeps the intracellular 21 level of free copper extremely low and mediates copper tolerance. Across our 12 strains, this gene was 22 tandemly amplified into four, seven, and 11 copies in three S. cerevisiae strains (DBVPG6765, Y12 and 23 S288c respectively) while maintaining the ancestral single copy configuration in all the other strains 24  Figure 7B). In general, higher copy number of CUP1 translates into faster growth (i.e. shorter generation 2 time) in copper, although Y12 with seven copies appears to grow slightly faster than S288c with 11 3 copies (including one pseudogene copy), which could be explained by differences in the amplification 4 segment and/or the overall genetic backgrounds. The ARR cluster contains three consecutive subtelomeric 5 genes (ARR1, ARR2 and ARR3) that function collectively to provide arsenic resistance. Despite their 6 tricky genomic locations (only a few kb from the core-X element), we successfully characterized the 7 exact genomic arrangement of the ARR cluster in all the 12 strains ( Figure 7C). Consistent with our 8 previous estimates based on read mapping coverage (Bergström et al. 2014), the ARR cluster was 9 duplicated in the European S. paradoxus CBS432 while completely lost in two S. cerevisiae (SK1 and 10 UWOPS03-461.4) and two S. paradoxus (N44 and UWOPS91-917.1) strains ( Figure 7C). Our growth 11 rate assay confirmed the link between ARR cluster loss and extreme susceptibility to arsenic (3 mM 12 arsenite, As[III]) ( Figure 7D). Despite having two copies of the ARR cluster, CBS432 grew poorly in 13 arsenic. Since no strongly deleterious mutation was detected in either gene or copy, the arsenic sensitivity 14 of CBS432 should derive from its genetic background. The As[III] sensitivity of the South American S. 15 paradoxus UFRJ50816 could potentially be explained by the pseudogenization of its ARR2, although it is 16 only known to protect against pentavalent arsenic, As[V] (Mukhopadhyay et al. 2000). The genes 17 immediately proximal to the ARR cluster are located at the chr16-R subtelomere in all the 12 strains, 18 implying that this subtelomere should be the ancestral location for the ARR cluster. In the West African S. 19 cerevisiae DBVPG6044, ARR became relocated to the chr03-R subtelomere. In the European S. 20 paradoxus CBS432, the two duplicated ARR clusters were redistributed to the chr02-R and chr07-R 21 subtelomeres respectively. In the two American S. paradoxus (YPS138 and UFRJ50816), the ARR cluster 22 was shifted to the ch13-L subtelomere with an inverted orientation. 23 To quantify how different genomic arrangements of the ARR cluster can affect fitness in arsenic (As[III]), 24 we performed quantitative trait locus (QTL) mapping using 826 phased outbred lines (POLs) derived 25 from an advanced intercross of the North American (YPS128) and West African (DBVPG6044) S. 26 cerevisiae strains (see Methods for details). The linkage analysis accurately mapped a large-effect QTL at 1 the chr03-R subtelomere (the location of ARR in DBVPG6044), but no contribution to arsenic variation 2 from the YPS128 ARR on the chr16-R subtelomere ( Figure 7E). This profile is consistent with the 3 relocation of an active ARR cluster to the chr03-R subtelomere in DBVPG6044 and the presence of The landscape of genetic variation is shaped by multiple evolutionary processes, including mutation, 10 drift, recombination, gene flow, natural selection and demographic history. The combined effect of these 11 different factors can vary considerably both across the genome and between species, resulting in different 12 patterns of evolutionary dynamics. The complete genome assemblies that we generated for multiple 13 representative strains from both domesticated and wild yeasts provide a valuable dataset to explore such 14 patterns with unprecedented resolution. 15 Considering the dynamics across the genome, it has been observed in many organisms (e.g. fruitfly 16  2006). All currently identified S. paradoxus strains were isolated from natural habitats with no evident 25 human interference, which provides an ideal control relative to S. cerevisiae to examine the influence of 26 human activities in shaping genome evolution. Here, we summarized the major differences in the 1 evolutionary dynamics of these two species during their respective diversification ( Figure 8). In nuclear 2 chromosomal cores, S. cerevisiae strains show much lower rate of accumulating balanced structural 3 rearrangements compared with S. paradoxus strains. This pattern is likely explained by the mixture and 4 crossbreeding between different S. cerevisiae subpopulations during their recent association with human 5 activities, which would considerably impede the fixation of balanced structural rearrangements in 6 different subpopulations. In contrast, the geographical isolation of different S. paradoxus subpopulations 7 would be favored for the fixation of balanced rearrangements in different subpopulations (Leducq et al. 8 2016). As for unbalanced rearrangements in chromosomal cores, we observed an opposite pattern, in 9 which the S. cerevisiae strains exhibit higher rate of accumulating such changes than their S. paradoxus 10 counterparts. A strong association was further found between genes affected by unbalanced 11 rearrangements and the cellular adaptation to metal ion stress, which collectively indicates a significant 12 role of selection in shaping this pattern. Compared with S. paradoxus, S. cerevisiae strains are evolving 13 under a much wider spectrum of selection regimes and thus will likely favor accumulating more adaptive 14 changes in strains from different ecological niches. Consistent with this notion, the more rapid 15 interchromosomal reshuffling in S. cerevisiae than in S. paradoxus is probably also a consequence of 16 selection given the functional importance of subtelomeric genes in promoting adaptation. In both core 17 and subtelomeres, we observed consistently higher rates of molecular evolution and CNV accumulation 18 in S. cerevisiae strains, which provide further supports to this argument ( Figure S5). In addition, we 19 found that the mitochondrial genomes of the S. cerevisiae strains maintained high degrees of collinearity, 20 whereas those of the S. paradoxus strains showed lineage-specific structural rearrangements and 21 introgression, suggesting distinct mitochondrial evolution between the two species. Taken together, many 22 of these observed differences between S. cerevisiae and S. paradoxus reveal the influence of human 23 activities on structural genome evolution. lineages. Our strain sampling also includes the reference strains for S. cerevisiae (S288c) and S. 5 paradoxus (CBS432) as well as another popular S. cerevisiae lab strain (SK1), which were also used for 6 quality control of our sequencing, assembly, annotation and downstream analysis. 7 All the strains were taken from our strain collection stored at -80°C and cultured on YPD plates. Single 8 colony for each strain was picked and cultured in 5 mL YPD liquid at 30°C 220 rpm overnight. The DNA 9 extraction was carried out using the MasterPure™ Yeast DNA Purification Kit (Epicentre, WI, USA) 10 following the manufacturer's protocol. technology. The raw PacBio reads were generated by the PacBio RS II platform with the P6-C4 16 chemistry and were processed and assembled using the standard SMRT analysis pipeline (v2.3.0). The de 17 novo assembly was carried out following the standard hierarchical genome-assembly process (HGAP) 18 assembly protocol with Quiver polishing (Chin et al. 2013). 19 20 Assembly evaluation and manual refinement 21 We retrieved the reference genome assemblies (including both the nuclear and mitochondrial genome) of 22 S. cerevisiae (The Saccharomyces Genome Database (SGD) version of strain S288c) from SGD 23 (http://downloads.yeastgenome.org/sequence/S288C_reference/) (version R64-1-1). We obtained the 1 reference nuclear genome assembly of S. paradoxus (strain CBS432) from the Saccharomyces Genome 2

Resequencing
Project (SGRP) data depository 3 (ftp://ftp.sanger.ac.uk/pub/users/dmc/yeast/latest/misc/para2/ref/genome.fa) and its mitochondrial 4 genome assembly from NCBI Genbank (accession number: JQ862335). chromosome. For the cases where assembly gaps occurred in the middle of chromosomes, we performed 10 manual gap closing by referring to the assemblies that we generated in the pilot phase of this project. The 11 only gap that we were unable to close is the highly repetitive rDNA array (usually consisting 100-200 12 tandem copies) on chrXII. The S. cerevisiae reference genome used a 17,357 bp sequence of two 13 tandemly arranged rDNA copies to represent this complex region. For our assemblies, we trimmed off the 14 partially assembled rDNAs at this gap and re-linked the two contigs with 17,357 bp Ns to keep 15

consistency. 16
The mitochondrial genomes of the 12 strains were recovered by single contigs in the raw HGAP 17 assemblies. We further circularized them and reset their starting position as the ATP6 gene using The percent coverage of these annotation features by different assemblies was summarized accordingly.    As for mitochondrial genomes, we performed the annotation by using MFannot 20 (v17.03.2015) was subsequently used to process the RepeatMasker output with options "-g -k <clustalw> 5 -f <fuzzy_file1> -d 10000 -t " for Ty defragmentation. We defined the fuzzy file to treat Ty1-LTR and 6 Ty2-LTR equivalently in the defragmentation process due to their high sequence identity and frequent 7 recombination. The identified full-length Ty1 and Ty2 were manually curated based on their sequence 8 alignment. We performed this Ty annotation in a two-pass manner, in which the internal sequences and 9 LTRs of the representative full-length S. paradoxus Tys annotated in the first pass were further added 10 into our Ty library before initiating the second pass. All the truncated Tys and soloLTRs were further 11 curated based on the blastn search against our Ty library (cutoffs: identity >= 70%, aln_length >= 100 12 bp). 13

14
Annotation of the core X-elements 15 We retrieved core X-element sequences for the S. cerevisiae reference genome according to the 16 annotation from SGD and aligned them using MUSCLE (v3.8.31) (Edgar 2004). Based on the alignment, 17 we built an HMM profile for the core X-element using the hmmbuild program (option: --dna) from the 18 HMMER package (v3.1b2) (Eddy 1998). This HMM profile was searched against our PacBio assemblies 19 by nhmer (from the same package) (options: -E 1e-3 --tblout) to identify the core X-element. 20 21 Annotation of the Y'-elements 22 We retrieved the Y'-element sequences of the S. cerevisiae reference genome based on the feature 23 annotation from SGD. There are two major classes of Y'-element for S. cerevisiae, the short version and 24 the long version, differed by several large indels (Louis and Haber 1992). We aligned all the retrieved Y'-1 element sequences by MUSCLE (v3.8.31) (Edgar 2004). Based on this alignment, we selected the chrIX-2 L Y'-element as the representative query for our search. The search was performed by BLAT ( Kent 3 2002) (option: -maxIntron=1000) with subsequent filtering by pslCDnaFilter (options: -minId=0.9 -4 minAlnSize=1000 -bestOverlap -filterWeirdOverlapped). Relative rate test 13 To test the rate heterogeneity between S. cerevisiae and S. paradoxus in molecular evolution, we 14 Conserved synteny blocks identification 6 We used the SynChro program from the CHROnicle package (version: January 2015) (Drillon et al. 7 2013(Drillon et al. 7 , 2014 to identify conserved synteny blocks. We prepared the input files for SynChro with custom 8 Perl scripts to provide information about various annotated features (centromere, protein-coding genes, 9 tRNAs, and Tys) together with the genome assembly and proteome sequences. SynChro subsequently 10 performed all possible pairwise comparisons to identify synteny blocks shared in the given strain pair. 11 Multiple plots were also generated by SynChro for easy visualization of the identified synteny blocks. ends. However, this definition seems arbitrary in a sense that it treats all subtelomeres indiscriminately. 17 In this study, we defined yeast subtelomeres based on the change of gene synteny conservation profile 18 across the 12 strains. For each chromosome arm, we examined the syntenic blocks shared across all the 19 12 strains and used the most distal syntenic blocks to define the distal boundary for the chromosomal 20 cores (Table S11). In parallel, we defined the proximal boundary of the chromosome-end region for this 21 chromosome arm based on the first occurrence of yeast telomere associated sequences (i.e. core X-and 22 Y'-element). The region between these two boundaries was defined as the corresponding subtelomere 23 with 400 bp interstitial transition zones on both sides ( Figure S3). Each defined subtelomere was named 24 according to the ancestral chromosome identity of the core region that it attaches to. 1 2 Identification of balanced and unbalanced structural rearrangements 3 We identified balanced genome rearrangements (inversion, translocation, and transposition) by the 4 ReChro from the CHROnicle package (version: January 2015) (Drillon et al. 2013(Drillon et al. , 2014. We set the 5 synteny block stringency parameter delta=1 for the main analysis. A complementary run was performed 6 with delta=0 to identify single gene inversions. As for unbalanced genome rearrangements (large 7 insertion, deletion and duplication), we first generated whole genome alignment for every strain pair by rearrangements, their corresponding orthologs in the SGD reference gene set (when available) was used 2 to compile a non-redundant test gene set. We performed Fisher's exact test (Fisher 1922) to detect 3 significantly enriched GO terms of our test gene set relative to the genome-wide background. False 4 discovery rate (FDR) (cutoff: 0.05) (Benjamini and Hochberg 1995) was used for multiple correction. (CNV), we calculated the proportion of genes involved in CNV (i.e. those are not one-to-one orthologs) 12 in the two compared strains. We denoted this measurement as P CNV , a quantity analogous to the P-13 distance in sequence comparison. The Poisson distance correction was further applied to account for 14 multiple changes at the same gene loci. The Poisson corrected distance D CNV can be given as !"# = 15 −ln (1 − !"# ). Then the CNV accumulation rate (R CNV ) can be calculated as !"# = !"# /2 , in 16 which T is the diversification time of the two compared strains obtained from our molecular dating 17 analysis. The calculation values for dN/dS, CNV proportion, and CNV accumulation rate were further 18 summarized by "core genes" and "subtelomeric genes" based on our genome partitioning described 19 above. 20 21 Subtelomeric homology search 22 For each defined subtelomeric region, we hard masked all the Ty-related features (full-length Ty, 23 truncated Ty and Ty soloLTRs) involved and then used the masked sequence to search against all the 24 other subtelomeric regions to detect shared sequence homology. The search was performed by BLAT 1 (Kent 2002) (options: -noHead -stepSize=5 -repMatch=2253 -minIdentity=80 -t=dna -q=dna -2 mask=lower -qMask=lower). We further used pslCDnaFilter (options: -minId=0.9 -minAlnSize=1000 -3 bestOverlap -filterWeirdOverlapped) to filter out trivial signals and used pslScore to calculate sequence 4 alignment score for those filtered BLAT matches. Since the alignment scores for a given subtelomere pair 5 are not exactly symmetrical, we considered the average score between the two ordered pairs in such 6 cases. Such subtelomeric homology search was carried out for both within-strain and cross-strain 7 comparison and subtelomere pairs with strong homology (BLAT alignment score >= 5000 and sequence 8 identity >= 90%) were considered. 9

10
Hierarchical clustering analysis and reshuffling rate calculation for orthologous 11 subtelomeres 12 For all the strains within the same species, we performed pairwise comparisons of their subtelomeric 13 regions to identify conserved orthologous subtelomeres in any given strain pairs based on homology 14 search described above. For each strain pair, the proportion of conserved orthologous subtelomeres was 15 calculated as a measurement of the overall subtelomere conservation between the two compared strains. 16 Such measurements were converted into a distance matrix, based on which the hierarchical clustering 17 analysis was further performed by R (v3.1) (R Developement Core Team 2015). We measured the 18 reshuffling rate of orthologous subtelomeres (R reshuffling ) similarly to how we calculated the CNV 19 accumulation rate (R CNV ). For any given strain pair, we first calculated the proportion of the non-20 conserved orthologous subtelomeres in this strain pair as P reshuffling , then the subtelomeric reshuffling rate 21 R reshuffling can be calculated as !"#!!""#$%& = −ln (1 − !"#!!""#$%& )/2 , in which T is the diversification 22 time of the two compared strains.   In this illustrated example, we partitioned the left arm of chromosome 1 (chrI-L) into the core (light green), subtelomere (light yellow) and chromosome-end (pink) based on synteny conservation and yeast telomere associated sequences (the core X-and Y'-elements). The cladogram (left side) depicts the phylogenetic relationship of the 12 strains, while gene maps (right side) illustrate syntenic conservation in the core region with gene names within syntenic block underlined. Genes are colored in light orange, Ty-related features (all soloLTRs in the plotted region) in light purple, core X-elements, Y'-elements and TG 1-3 -repeats are shown in red, yellow and green respectively.  Dotplot for the chr15-R subtelomere comparison between the S. cerevisiae DBVPG6765 and S288c. The much longer DBVPG6765 chr15-R subtelomere is explained by a previously reported 65-kb eukaryote-to-eukaryote horizontal gene transfer (HGT) event. (C) Dotplot for the chr07-R subtelomere comparison between the S. paradoxus CBS432 and N44. The much longer chr07-R subtelomere in CBS432 is explained by a series of tandem duplications of the MAL31-like and MAL33-like genes and an addition of the ARR-containing segment from the ancestral chr16-R subtelomere. (D) Dotplot for the chr15-L subtelomere comparison between the S. paradoxus UFRJ50816 and YPS138. The much longer chr15-L subtelomere in UFRJ50816 is explained by the relocated subtelomeric segments from the ancestral chr10-L and chr03-R subtelomeres respectively. (E) Dotplot for the chr03-R subtelomere comparison between the S. paradoxus UFRJ50816 and YPS138 reveals an inversion occurred at the HMR locus in UFRJ50816. (F) Dotplot for the chr03-R subtelomere comparison between the S. paradoxus CBS432 and N44 reveals an inversion occurred at an MAL11-like locus in CBS432. Please note that the region coordinates for B-F are based on the extracted subtelomeric regions rather than the full chromosomes.         Table S13. Numbers in parenthesis denotes the copy number of X-or Y'-elements that the corresponding chromosome-end structure could have. Figure S5. Pronounced contrasts between cores and subtelomeres in evolutionary dynamics. Nonsynonymous to synonymous substitution rate ratios (dN/dS), proportions of genes involved in copy number variation (CNV), and accumulation rates of CNV in both cores and subtelomeres were shown in A-C. Three comparison scales were examined: within S. cerevisiae (Sc-Sc), within S. paradoxus (Sp-Sp) and between the two species (Sc-Sp). The y-axes for proportions of genes involved in CNV (B) and accumulation rates of CNV (C) are in log-10 scales. Figure S6. Naming rationale of the subtelomeric region based on its ancestral identity to account for large interchromosomal rearrangements occurred in the Malaysian S. cerevisiae UWOPS03-461.4 and the South American and Hawaiian S. paradoxus (UFRJ50816 and UWOPS91-917.1 respectively). In this example, the current chrXI (named based its centromere) of UWOPS03-461.4 contains material from both ancestral chr07 and chr11 due to a large interchromosomal rearrangement. We used grey blocks to denote the homologous relationship of different section of the current UWOPS03-461.4 chrXI relative to the ancestral chr07 and chr11. Figure S8. Gene loss due to a subtelomeric duplication event in the chr03-R subtelomere of the S. cerevisiae strain Y12. Four genes (PAU3, ADH7, RDS1, and AAD3) (highlighted in red) in the ancestral chr03-R (displayed here based on S288c) were lost in Y12 due to the subtelomeric duplication event from chr08-L subtelomere. Homologous regions with sequence identity >90% were highlighted in grey blocks. Table S1: Strain sampling for this study.