Protein innovation through template switching in the Saccharomyces cerevisiae lineage

Abraham, May; Hazkani-Covo, Einat

doi:10.1038/s41598-021-01736-y

Published: 19 November 2021

Scientific Reports volume 11, Article number: 22558 (2021)

Abstract

DNA polymerase template switching between short, non-identical inverted repeats (IRs) is a genetic mechanism that leads to the homogenization of IR arms and to IR spacer inversion, which cause multinucleotide mutations (MNMs). It is unknown if and how template switching affects gene evolution. In this study, we performed a phylogenetic analysis to determine the effect of template switching between IR arms on coding DNA of Saccharomyces cerevisiae. To achieve this, perfect IRs that co-occurred with MNMs between a strain and its parental node were identified in S. cerevisiae strains. We determined that template switching introduced MNMs into 39 protein-coding genes through S. cerevisiae evolution, resulting in both arm homogenization and inversion of the IR spacer. These events in turn resulted in nonsynonymous substitutions and up to five neighboring amino acid replacements in a single gene. The study demonstrates that template switching is a powerful generator of multiple substitutions within codons. Additionally, some template switching events occurred more than once during S. cerevisiae evolution. Our findings suggest that template switching constitutes a general mutagenic mechanism that results in both nonsynonymous substitutions and parallel evolution, which are traditionally considered as evidence for positive selection, without the need for adaptive explanations.

Introduction

Inverted repeats (IRs) are sequences with two copies of a DNA sequence in a reverse-complement orientation (e.g., 5′-ATGTGxxxxCACAT-3′). IRs have internal symmetry, enabling them to switch between inter-strand and intra-strand base-pairing, resulting in non-canonical DNA structures such as cruciforms and hairpins. Long IRs lead to genome instability^1,2, either because they are processed to create a double-strand break or because they block the replication fork^1,2,3,4,5,6. The resulting genomic instability events are diverse, and include gene amplification^7,8,9, translocations^10,11,12,13, insertions¹⁴, and deletions². Despite the potential of IRs to destabilize genomes, short IRs have several functions in organisms throughout the tree of life, e.g., IRs found in promoters enable the binding of homodimer transcription factors¹⁵. IRs also have a functional role in viral origins of replication¹⁶, the CRISPR immune system¹⁷, alternative termination of bacterial genes¹⁸, and immunoglobulin V(D)J rearrangement¹⁹.

A striking characteristic of IRs is their tendency to undergo homogenization, abolishing variation between the two arms of the IR. Numerous examples have been reported of long-IR^{20,21,22,23,24,25} and short-IR homogenization^26,27,28. From a mechanistic standpoint, DNA polymerase template switching can eliminate variations as short as a few bases between IR arms²⁸. This occurs when one arm of the IR serves as a template for the synthesis of the second arm. Template switching, first suggested by Ripley (1982) in bacteriophages, requires two hops of DNA polymerase between templates and can occur either intramolecularly or intermolecularly^{26,29,30,31,32,33}.

Since its identification, template switching has been reported throughout the tree of life^{33,34,35,36,37}. Short-IR homogenization via template switching is known to occur in numerous organisms^26,27,35,36 and is associated with mutational hotspots^27,34. The extent of template switching is affected by the directionality of the replication fork^38,39,40, the level of transcription⁴⁰, and the local sequence context⁴¹. There are several genotypes in which template switching is more common. For example, we have previously shown that template switching occurs in yeast strains lacking Rad27p, a key player in Okazaki fragment maturation⁴².

While the evolutionary consequences of homogenization of long IRs are well-studied^{21,43,44,45,46,47}, the impact of short IR homogenization on genome evolution has received little attention. Our previous evolutionary analysis of non-coding regions with short IRs in proteobacteria orthologs⁴⁸ indicated that these regions are more conserved than their immediate surrounding. This suggests that repeated template switching between IR arms is common during the evolution of proteobacteria. Template switching between IR arms was also recently shown to be abundant in humans^49,50.

To date, there have been only a few reports of template switching in genes, and these have usually been considered in a mutagenic context. Template switching was identified in the context of loss of function in the bacteriophage T4 rII gene⁵¹, in the E. coli rpsL⁵² and thyA³⁴ genes, and in the S. cerevisiae CYC1 gene³⁵. It was suggested that template switching contributes to the spectrum of mutations that affect the TP53 gene in human cancers³⁶. Template switching was also shown to be involved in certain mutagenic processes that lead to several genetic diseases, such as hereditary angioneurotic edema³⁷. In our previous analysis of rad27 mutants in S. cerevisiae, we identified nine template switching events in coding genes⁴². It is unknown if and how template switching affects genes during evolution. The present work studied the effect of template switching on coding regions from an evolutionary perspective.

Template switching between short IR arms can cause mutation clusters through arm homogenization⁵³. Multinucleotide mutations (MNMs), which comprise ~ 1% of single nucleotide polymorphisms in genomes^54,55,56,57, form one type of mutation cluster, in which mutations appear at adjacent sites. MNMs in codons can be the outcome of two entirely different scenarios: mutational mechanisms that simultaneously affect nearby nucleotides or multiple changes that occur via adaptive evolution. Ignoring the contribution of mechanisms that simultaneously affect nearby nucleotides in codons may lead to false identifications of positive selection^54,55,57. This is because positive selection tests determine nonsynonymous to synonymous ratios while assuming that mutations are independent. Identifying the mechanisms that cause nonsynonymous replacements through MNMs is essential to the understanding of protein evolution. Template switching between short IR arms is a potential mechanism of MNM formation in genes.

Here, protein-coding genes of 50 closely related wild type Saccharomyces cerevisiae strains were analyzed to identify MNMs arising from template switching between IR arms. Such events were identified in 39 yeast proteins and were responsible for nonsynonymous substitutions and, thus, for amino acid replacements. While template switching primarily introduced single amino acid changes, events simultaneously affecting up to five nearby amino acids were also recorded. The presented results indicate that template switching is an important mechanism in protein evolution.

Results

IRs are associated with MNMs on IR arms in wild type yeast

To identify the effect of template switching between IR arms on coding genes, we sought out IRs associated with MNMs. To classify IRs as associated with MNMs, we first identified perfect IRs in S. cerevisiae strains. Next, based on the reconstructed phylogenetic tree, we identified MNMs that occurred between a strain and its parental node (see “Methods” section). Finally, we looked for cases of MNMs with coordinates overlapping IR arms that mapped to the terminal branch leading to the same strain with the IR (Fig. 1a). Identification of MNMs on a specific branch is based on ancestral sequence reconstruction. To increase the reliability of the analysis and avoid uncertainty resulting from ancestral sequence reconstruction, we focused on MNMs associated with IRs on external branches only. IRs with an arm length of ≥ 7 bp and a spacer of ≤ 70 bp that are associated with MNMs were identified in 68 genes (Table 1).

Table 1 Number of genes that include IRs with an arm length of ≥ 7 bp and are associated with multinucleotide mutations.

Next, we sought to determine whether the identified events differ from what could be expected when no special mechanism for IR homogenization exists. If no specific mechanism acts on IRs, then IR regions are expected to evolve under the same mutation-selection regime as non-IR regions in the gene. To test the significance of the association between IRs and MNMs in S. cerevisiae genes and whether it can be ascribed to template switching between IR arms, we simulated the multiple sequence alignment (MSA) of each gene until we obtained 100 simulations with a number of IRs similar to that in the real data. All evolutionary parameters used for the simulations mirrored those of the real gene MSA: tree topology, branch lengths, phylogeny model, and proportion of invariant positions. An IR score was computed for each real gene and for each of its 100 simulations, and represented the enrichment of MNMs presumably formed by IR homogenization. To account for MNM variation not associated with IRs, the score was calculated by dividing the number of IRs associated with MNMs by the number of non-IRs associated with MNMs (see “Methods” section). The empirical distribution of IR scores in the 100 simulations served as a null distribution to which the score of the real gene was compared. If the real IR score fell within the top 5% of the IR scores of its 100 simulations, the gene was considered to have more MNMs associated with IRs than its simulations. Genes with a statistically significant association between MNMs and IRs were considered to have undergone template switching.
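The comparison against the null distribution can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names, the one-sided rule (counting simulated scores at least as large as the real score), and the handling of ties are assumptions.

```python
def empirical_p_value(real_score, simulated_scores):
    """Fraction of simulated IR scores that are at least as large as the real score."""
    if not simulated_scores:
        raise ValueError("at least one simulated score is required")
    n_at_least = sum(1 for s in simulated_scores if s >= real_score)
    return n_at_least / len(simulated_scores)


def is_significant(real_score, simulated_scores, alpha=0.05):
    """Assumed rule: the real score lies in the top `alpha` fraction of the null."""
    return empirical_p_value(real_score, simulated_scores) <= alpha


# Hypothetical null distribution: 97 low scores and 3 high scores.
null_scores = [1.0] * 97 + [5.0] * 3
p_high = empirical_p_value(4.0, null_scores)   # only the three 5.0 values reach 4.0
p_low = empirical_p_value(1.0, null_scores)    # every simulated score reaches 1.0
```

With 100 simulations, the smallest non-zero empirical p-value this rule can report is 0.01.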

Our simulation revealed that IRs were significantly associated with MNMs in 30 out of 68 yeast genes (Table 1, Supplementary Tables 1, 2). The longer the IR arm, the higher the fraction of genes with a significant real IR score. For IRs with an arm length of 7 bp, only 19% (8/42) of the genes had a higher IR score than their null distribution; while for IRs with an arm length of 8 bp, 69% of the genes (11/16) had a higher IR score than their null distribution. All IRs with arm lengths of 9–11 bp (10/10) had a higher IR score than randomly expected (Table 1). Due to the inability to simulate sufficient sequences with an IR arm of length of 16 bp, no statistical evaluation was performed for this event. However, since the association between IR arm length and recent MNMs was stronger for longer IR arms, the association of a MNM with IR arm length 16 bp is likely real. In conclusion, template switching between IR arms formed MNMs and modified protein-coding DNA during the evolution of 30 S. cerevisiae genes.

The gene UTP5 in the S. cerevisiae strain BC187 has a perfect IR with an arm length of 11 bp, while at this locus S288c and other S. cerevisiae strains have an imperfect IR, with a continuous arm length of only 4 bp (Fig. 1b). Herein, we describe the template switching event that formed the perfect IR in BC187 from the ancestral form present in S288c (Fig. 1b–d). The process comprised two switches. The first switch moved the polymerase from its nascent template to the other IR arm, via either an intramolecular or an intermolecular mechanism. In the intramolecular mechanism (Fig. 1b), the nascent strand folds upon itself through arm base pairing, so that the first arm is used as a template for replicating the second arm. In the intermolecular mechanism (Fig. 1c), the first switch is achieved when the nascent strand replicating one arm invades the template of the other sister chromatid. In both scenarios, the first switch is followed by a second switch, whereupon the polymerase returns to use the original strand as a template (Fig. 1d). The fork then resolves, leaving one daughter cell with an imperfect IR and one daughter cell with a perfect IR. The template switching in UTP5 resulted in six base substitutions and two amino acid replacements (T38V, S39H) in strain BC187 (Fig. 1e).
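The net sequence outcome of arm homogenization can be illustrated with a toy example. The sketch below does not model the polymerase or the replication fork; it only rewrites the second arm as the reverse complement of the first, which is the end result described above. The sequence is invented for illustration and is not the UTP5 locus, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def homogenize_second_arm(seq, first_arm_start, arm_len, spacer_len):
    """Overwrite the second IR arm with the reverse complement of the first arm."""
    first_arm = seq[first_arm_start:first_arm_start + arm_len]
    second_arm_start = first_arm_start + arm_len + spacer_len
    second_arm_end = second_arm_start + arm_len
    return seq[:second_arm_start] + reverse_complement(first_arm) + seq[second_arm_end:]


# Toy imperfect IR: flank GG, first arm ATGTG, spacer TTTT, second arm CAGAT, flank CC.
ancestral = "GG" + "ATGTG" + "TTTT" + "CAGAT" + "CC"
derived = homogenize_second_arm(ancestral, first_arm_start=2, arm_len=5, spacer_len=4)
n_substitutions = sum(a != b for a, b in zip(ancestral, derived))
```

In this toy case a single mismatch between the arms is removed; when the arms differ at adjacent positions, the same operation produces an MNM.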

Out of the 30 genes with IRs significantly associated with MNMs, 17 occurred uniquely in a single strain. The additional 13 genes included IRs that appeared in more than one S. cerevisiae strain, of which six had multiple IRs associated with MNMs on terminal branches (Supplementary Table 1). For example, in 14 strains, the MSH4 gene had an IR with an arm length of 8 bp formed by an AA → TT MNM, which resulted in two amino acid changes, L394F and I395F (Fig. 2a). IR homogenization arose on three external branches of the S. cerevisiae phylogeny, leading to the strains YJM269, RedStar, and EC9-8. The ancestral sequence reconstruction revealed that one event had also occurred on one internal branch.

MNMs on IR arms formed by template switching caused nonsynonymous substitutions in 28 out of 30 genes. Twenty-five of these led to a single amino acid replacement, while two amino acid changes were observed in three genes. Thus, template switching is a source of parallel events and genetic innovation in proteins.

Template switching causes spacer inversions and parallel evolution

Template switching between IR arms not only homogenizes the arms but can also invert IR spacers^58,59. This occurs when the first switch takes place through intermolecular template switching, during the replication of the first arm (Fig. 3). A search for IRs in S. cerevisiae strains carrying MNMs on the spacer identified cases in which the MNM on the spacer arose from a complete spacer inversion. Ten IR spacer inversion events were identified within the coding sequences of wild type yeasts (Table 2, Supplementary Table 3). All events were statistically significant: none of the 100 simulations of the genes with spacer inversion showed inversions. Spacer inversions were observed only on IRs with arms longer than 9 bp. Spacer inversions introduced between 2 and 5 bp substitutions and resulted in up to two amino acid replacements.

Table 2 Genes with inverted repeats and spacer inversion.

In nine out of the 10 genes with a spacer inversion, the inversion occurred in an IR whose arms were already perfect (only the spacer sequence changed). For example, SYG1 has a 4 bp spacer inversion (Fig. 3). In this gene, a perfect IR arm of 10 bp was observed both in the derived JAY291 strain and in the ancestral form represented by S288c. The derived JAY291 strain had an inversion on the spacer, forming a 4 bp MNM. First, an intermolecular template switch occurring during the synthesis of the left arm caused the 4 bp spacer inversion (Fig. 3b). Next, the nascent strand switched back to the original template (Fig. 3c). This inversion resulted in a Y46G replacement in the JAY291 strain (Fig. 3d).

In only one event (REG2) did a spacer inversion occur together with conversion of an arm to form a perfect IR (Fig. 4). In this example, the ancestral form is represented by S288c, which has an IR with a 5 bp arm and a 5 bp spacer. This form evolved into a perfect IR with an arm length of 16 bp in the YJM339 strain. This event occurred through an intermolecular mechanism, similar to that presented in Fig. 3, and resulted in seven point mutations. Only two of these mutations were part of a continuous MNM, while the others were in a mutation cluster, 2 bp apart from each other (Fig. 4b, c). A total of five amino acid replacements were observed (Fig. 4d), the highest number of amino acid replacements resulting from template switching between IR arms that we identified in S. cerevisiae strains. Three amino acid replacements occurred on the arms (L208Q, D209G, P210R) and two occurred on the spacer (K205D, S206F).

In four of the genes, recent inversion of the spacer occurred independently in several strains, resulting in parallel evolution at the affected amino acid position. Spacer inversion occurred twice in AYR1, three times in ICT1 and SPO75, and five times in YBZ1. In SPO75 and YBZ1, the spacer flipped back and forth between two forms, each encoding a different amino acid.

The two-base IR spacer appearing in SPO75, where perfect IRs with 11 bp arms appear in all S. cerevisiae strains (Fig. 5a), displays two forms: either the ancestral form AA (red), or the derived form TT (blue) in positions 1 and 2 of the alternative codons AAA (K) and TTA (L), respectively. As a result, Spo75 has one of two forms: K or L on codon 409. Three transitions were identified on the terminal branches (two from L to K, and one from K to L), two of which are on highly supported branches. In addition, ancestral sequence reconstruction indicated that one K-to-L reversal event occurred on an internal branch with a high support, and another with a low support (Fig. 5b). We concluded that spacer inversion is an event that can change forms within a short evolutionary time, usually on the basis of a perfect IR.
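The AA/TT alternation described for SPO75 can be reproduced with a small sketch in which spacer inversion is modeled as replacing the spacer with its reverse complement. The 5 bp arms below are invented for illustration (the real SPO75 arms are 11 bp and are not given in the text), the function names are hypothetical, and only the two codons named above are translated.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def invert_spacer(seq, spacer_start, spacer_len):
    """Replace the spacer with its reverse complement, leaving both arms untouched."""
    spacer_end = spacer_start + spacer_len
    return seq[:spacer_start] + reverse_complement(seq[spacer_start:spacer_end]) + seq[spacer_end:]


# Only the two codons mentioned in the text: AAA (K) and TTA (L).
CODON_TO_AA = {"AAA": "K", "TTA": "L"}

# Toy perfect IR: left arm GGCAT, spacer AA, right arm ATGCC.
# The codon of interest covers the spacer plus the first base of the right arm.
ancestral = "GGCAT" + "AA" + "ATGCC"
derived = invert_spacer(ancestral, spacer_start=5, spacer_len=2)
reverted = invert_spacer(derived, spacer_start=5, spacer_len=2)

ancestral_aa = CODON_TO_AA[ancestral[5:8]]
derived_aa = CODON_TO_AA[derived[5:8]]
```

Because the arms are untouched, the IR remains perfect after each inversion, which is consistent with the same site flipping repeatedly between the two forms.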

Spacer length is a key player in the formation of DNA structure and thus influences the frequency of template switching^60,61. We examined whether spacer length of IRs associated with template switching differs from that of IRs that are not associated with template switching (Supplementary Fig. 1). The spacer length of IRs associated with MNMs with significant IR scores is shorter than that of IRs associated with MNMs with non-significant IR scores (one-tail Wilcoxon rank sum test p < 0.0011). Similarly, spacer length of IRs associated with MNMs with significant IR scores is also shorter than that of IRs with only single mutations on IR arms (one-tail Wilcoxon rank sum test p < 4.4171 × 10⁻⁴). Thus, spacers of IRs that undergo template switching to form MNMs are shorter than other IRs in coding genes.

Nonsynonymous substitutions resulting from template switching

MNMs associated with IRs can be located on a single codon or on two neighboring codons. Multiple differences in a single codon will change the amino acid (except for the rare case of serine). MNMs located on two codons will appear on the third position of the first codon and the first position of the second codon, and can result in a change between zero and two amino acids. Out of the 39 events associated with template switching, two resulted in synonymous substitutions only. In contrast, in 37 cases, at least one nonsynonymous substitution occurred (Fig. 6a). Out of these 37 events, in 24 genes, MNMs spanned a single codon and in 13 they spanned two codons. The number of transitions between the strains and their parental nodes in the template switching regions was higher than the number of transversions. The transition/transversion ratio was, however, smaller in template switching regions than in non-IR regions. As previously reported⁵⁷, these nonsynonymous substitutions are prone to misidentification as positive selection sites. Indeed, arm and spacer MNMs in 26 out of the 37 genes were inaccurately estimated to have undergone positive selection, 15 of which showed very strong support (Supplementary Table 1).
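The effect of an MNM on the encoded amino acids can be checked by translating the ancestral and derived codons with the standard genetic code. The sketch below is illustrative only; the function name is hypothetical, and the two-codon example uses invented codons chosen to be consistent with the L-to-F and I-to-F replacements reported for MSH4, since the actual MSH4 codons are not given in the text.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def amino_acid_changes(ancestral, derived):
    """List (codon_index, ancestral_aa, derived_aa) for in-frame sequences of equal length."""
    if len(ancestral) != len(derived) or len(ancestral) % 3 != 0:
        raise ValueError("sequences must be in frame and of equal length")
    changes = []
    for pos in range(0, len(ancestral), 3):
        old_aa = CODON_TABLE[ancestral[pos:pos + 3]]
        new_aa = CODON_TABLE[derived[pos:pos + 3]]
        if old_aa != new_aa:
            changes.append((pos // 3 + 1, old_aa, new_aa))
    return changes


# MNM within one codon: two adjacent changes, one amino acid replacement.
single_codon = amino_acid_changes("AAA", "TTA")
# MNM spanning two codons (third position of codon 1, first position of codon 2).
two_codons = amino_acid_changes("TTAATT", "TTTTTT")
# The serine exception: two changes in one codon that remain synonymous.
serine_case = amino_acid_changes("TCT", "AGT")
```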

Next, we sought to determine whether the amino acid replacements have the potential to alter the structure or function of proteins. Figure 6b presents all the amino acid changes we observed, against the background of a Grantham's physicochemical distances table⁶². This table is based on amino acid properties such as composition, polarity, and molecular volume. While most amino acid changes displayed low Grantham physicochemical distances, representing similar amino acid properties, 13 amino acid replacements displayed high Grantham's physicochemical distances (above 120).

We then used PredictSNP⁶³, a classifier that combines several prediction tools, to identify the effect of mutations on protein structure and function. PredictSNP predicted that most amino acid changes had a neutral effect on protein function, while changes in six nonessential genes were considered non-neutral. These changes were L208Q in REG2, D98C in DLD3, G191K in RTT10, P50F in OM45, W127Y in RHO5, and G166Y in CCS1. The EggNOG database⁶⁴ was then used to determine whether these novel amino acids are represented in other species during Saccharomycotina evolution. While the divergence time of S. cerevisiae and S. paradoxus is 4.0–5.8 mya, the origin of the budding yeast subphylum Saccharomycotina is 317–523 mya⁶⁵. Five of the six genes had an orthologous group in EggNOG. In these genes, the alternative amino acid did not appear in Saccharomycotina (OM45, RHO5, CCS1, DLD3, and RTT10). In RTT10, the novel amino acid was, however, common when considering the entire Ascomycota phylum. Only two proteins, those encoded by RHO5 and DLD3, had an amino acid replacement in residues that show conservation throughout evolution. In conclusion, template switching introduced new nonsynonymous mutations into protein-coding genes during S. cerevisiae evolution. Most of the selected positions had physicochemical properties that were similar to those of their ancestral counterparts. In addition, amino acids with physicochemical properties distant from those of their ancestors rarely occurred in conserved protein regions.

Discussion

Template switching events have been previously reported in the context of loss of function in genes^{34,35,36,37,51,52}. The question of how this affects normal gene evolution, however, has not been addressed. Given that template switching events involve multiple substitutions, our identification of template switching mediated by IRs in 39 wild type S. cerevisiae coding genes (~ 1% of the analyzed genes) was surprising. In most cases we identified in S. cerevisiae strains, template switching events yielded nonsynonymous substitutions. Most template switching events resulted in a single nonsynonymous substitution, although one extreme circumstance of five nearby amino acid replacements was also identified (Fig. 4). The influence of template switching on coding genes is probably even stronger than reported here, as we ignored cluster mutations that are not MNMs, indels, and MNMs occurring on internal branches.

This work focused on the effect that template switching between IR arms has on gene evolution. What we learned about the genetic mechanism of template switching was heavily affected by selection. We were, however, able to show that IRs involved in template switching have shorter spacers than other IRs, as previously reported^60,61 (Supplementary Fig. 1). In our evolutionary data, our findings were also influenced by the selection for short spacers in intermolecular template switching with spacer inversion (Table 2). This is because long inversions in proteins are unlikely to be neutral. There was a preference for spacer inversion when long IR arms were involved (Table 2, Figs. 3, 4).

In contrast to previously published data from bacteria, in which template switching favored transversions over transitions⁶⁶, here we observed more transitions than transversions. This is probably due to selection over evolutionary time. Based on the structure of the genetic code, transitions are more likely than transversions to be synonymous, making transitions less often selected against than transversions^67,68. IRs are unique since homogenization can occur when the nascent DNA strand folds and is used as a template. It is also of interest to further study the effect of other forms of repeats on protein evolution. Our preliminary results suggest that direct repeats can also promote the formation of MNMs.
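For readers unfamiliar with the distinction, transitions exchange a purine for a purine (A, G) or a pyrimidine for a pyrimidine (C, T), and transversions exchange one class for the other. A minimal way to count both between an ancestral and a derived sequence is sketched below; the function names and the example sequences are illustrative only.

```python
PURINES = {"A", "G"}


def is_transition(ancestral_base, derived_base):
    """True for A<->G or C<->T changes; False for transversions and identical bases."""
    if ancestral_base == derived_base:
        return False
    return (ancestral_base in PURINES) == (derived_base in PURINES)


def count_ts_tv(ancestral, derived):
    """Return (transitions, transversions) between two aligned, gap-free sequences."""
    transitions = transversions = 0
    for a, d in zip(ancestral, derived):
        if a == d:
            continue
        if is_transition(a, d):
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions


# Invented example: A>G (transition), A>T (transversion), C unchanged, T>C (transition).
counts = count_ts_tv("AACT", "GTCC")
```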

The discovery that template switching between IR arms results in multiple events through evolution and, hence, in cryptic hotspots for parallel evolution, is an important finding. We note that identifying multiple template switching events (Figs. 2, 5) relies on the accuracy of the S. cerevisiae tree and the assumption that all genes within the S. cerevisiae lineage have the same evolutionary history. The reconstructed concatenated tree had a high bootstrap support.

Mutational hotspots have been previously suggested to be associated with parallel evolution at the amino acid level. For example, an elevated mutation rate at CpGs was shown to underlie hemoglobin adaptation in high-altitude Andean house wrens⁶⁹. Template switching, however, was not previously considered a mechanism that contributes to parallel evolution in proteins. Here, we identified the reoccurrence of template switching over a short evolutionary period, yielding both arm homogenization and IR spacer inversion. Spacer inversions were reversible through sequential template switching events.

Codons with multiple changes between similar species can be the outcome of either a mechanism that simultaneously affects adjacent nucleotides, or of positive selection. Because substitutions are assumed to be independent, MNMs are sometimes considered evidence of multistep adaptive changes. Similarly, parallel evolution is usually considered evidence of adaptive selection. Such interpretations can be incorrect when the mutations occur together^54,55,57. Indeed, when tested, 70% of what we recognized as template switching events on arms were inaccurately estimated as positive selection (Supplementary Table 1). Thus, our results suggest that template switching is a general mutagenic mechanism that causes MNMs, as well as parallel evolution, eliminating the need for adaptive explanations. Identification of the complex mechanisms that cause MNMs, such as template switching, error-prone translesion DNA synthesis⁷⁰, gene conversion⁷¹, and probably other yet to be discovered processes, is essential in order to prevent overestimation of adaptive selection.

Although adaptive processes are not needed to explain MNMs caused by template switching and other complex mechanisms, this does not rule out the option that mutations formed by template switching can be a target for positive selection⁵⁴. Most nonsynonymous mutations are eliminated by purifying selection and those that are fixed are usually replaced by physicochemically similar ones. However, nonsynonymous substitutions are still the ones with a small chance of improving function. By causing multiple nonsynonymous substitutions, template switching can, therefore, enable hopping between adaptive peaks without the crossing of low-fitness valleys in the adaptive landscape^54,72,73. Based on the reported effect of synonymous mutations on RNA stability and protein translation efficiency⁷⁴, even a combination of synonymous and nonsynonymous substitutions (Fig. 6) can promote a similar outcome.

Methods

Data collection and IR detection

The coding DNA and protein sequences of 6569 orthologous sets from 50 wild type S. cerevisiae strains were downloaded from the Saccharomyces Genome Database⁷⁵. The Saccharomyces paradoxus ortholog of each gene was assigned based on the Fungal Orthogroups Repository⁷⁶. Sequences with more than 20 Ns and sequences found in fewer than four strains were removed from the orthologous sets. These screenings resulted in a total of 4304 genes. For each orthologous group, we searched for IRs with an arm length of at least 7 bp and a spacer of up to 70 bp, in each of the available strains, using the EMBOSS palindrome package⁷⁷. Each IR arm length was analyzed separately.
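To make the search criteria concrete, the following naive scan reports perfect IRs with a given arm length and a bounded spacer. It is a stand-in written for illustration, not a reimplementation of EMBOSS palindrome, and its function names, zero-based coordinates, and output format are assumptions. The test sequence is invented.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_perfect_irs(seq, arm_len, max_spacer=70):
    """Return (left_arm_start, right_arm_start, spacer_len) for each perfect IR.

    Coordinates are zero-based. Arms of exactly `arm_len` are compared, so a
    longer perfect IR is also reported through its shorter sub-arms.
    """
    hits = []
    for left_start in range(len(seq) - 2 * arm_len + 1):
        target = reverse_complement(seq[left_start:left_start + arm_len])
        for spacer_len in range(max_spacer + 1):
            right_start = left_start + arm_len + spacer_len
            if right_start + arm_len > len(seq):
                break
            if seq[right_start:right_start + arm_len] == target:
                hits.append((left_start, right_start, spacer_len))
    return hits


# Invented sequence: flank CC, left arm ATGTGCA, spacer GGGG, right arm TGCACAT, flank CC.
toy_seq = "CC" + "ATGTGCA" + "GGGG" + "TGCACAT" + "CC"
toy_hits = find_perfect_irs(toy_seq, arm_len=7)
```

The scan is quadratic in the worst case, which is acceptable for single genes but would be slow genome-wide.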

Phylogeny

Each of the 4304 orthologous sets was aligned using MAFFT V3.705⁷⁸ with default parameters. DNA MSAs were concatenated, and the best maximum likelihood (ML) tree was reconstructed by RAxML version 8.2.11⁷⁹ under the GTR replacement matrix⁸⁰, with among-site-rate-variation accounted for by assuming a discrete gamma distribution⁸¹ and with rapid bootstraps. This was the species tree used in this study.

Ancestral tree reconstruction

Each orthologous set was also aligned by codon alignment. The phylogenetic species tree was pruned and used together with each orthologous codon MSA to estimate branch lengths and reconstruct ancestral codon sequences using FASTML⁸² with the M5 codon model⁸³. In this step, orthologous sets with premature stop codons were eliminated, resulting in the 4252 orthologous sets that were used in this study.

Identifying IRs that overlap MNMs

An MNM was defined as two or more adjacent substitutions observed between a S. cerevisiae strain and its immediate parent node, as determined from the FASTML ancestral reconstruction output. Thus, neighboring mutations mapping to two different branches were not identified as MNMs⁵⁴. Insertions and deletions (indels) were not considered for MNM classification. IRs that mapped to a S. cerevisiae strain with overlapping MNMs in the terminal branch leading to this strain were further analyzed. For each IR arm length, overlapping IRs on the same strain were excluded from the analysis. However, when an IR was fully nested in another IR, they were both analyzed.
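The MNM definition and the overlap test can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the sequences are assumed to be aligned and of equal length, gap characters are written as "-", intervals are zero-based and half-open, and the function names are hypothetical.

```python
def find_mnms(parent, child):
    """Return (start, end) half-open intervals of two or more adjacent substitutions.

    Positions where either sequence has a gap are not counted as substitutions.
    """
    if len(parent) != len(child):
        raise ValueError("aligned sequences must have equal length")
    mnms = []
    run_start = None
    for pos, (p, c) in enumerate(zip(parent, child)):
        is_substitution = p != c and p != "-" and c != "-"
        if is_substitution:
            if run_start is None:
                run_start = pos
        else:
            if run_start is not None and pos - run_start >= 2:
                mnms.append((run_start, pos))
            run_start = None
    if run_start is not None and len(parent) - run_start >= 2:
        mnms.append((run_start, len(parent)))
    return mnms


def overlaps(interval_a, interval_b):
    """True if two half-open intervals share at least one position."""
    return interval_a[0] < interval_b[1] and interval_b[0] < interval_a[1]


# Invented parent/child pair: one isolated substitution and one 2 bp MNM.
parent_seq = "ACGTACGTAC"
child_seq = "ACTTACAAAC"
mnm_intervals = find_mnms(parent_seq, child_seq)
```

An MNM would then be assigned to an IR when its interval overlaps either arm interval on the terminal branch of the strain carrying that IR.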

Elimination of false MSAs

To ensure MSA accuracy and avoid false MNMs, we used GUIDANCE2⁸⁴ to score alignment regions with IRs in a codon model, using default parameters. We retained IR regions with a score higher than 0.95. In addition, a codon MSA might not give the best alignment if insertions/deletions (indels) are not a multiple of three. We therefore used the similarity between nucleotide and codon comparisons in the IR region to identify high-quality MSAs. Codon and nucleotide pairwise alignments of the IR with an MNM and its sister taxa (plus a tail of 50 bp) were scored with a scoring matrix of match = 1, mismatch = − 1, and gap = 0. IRs were ignored if the difference between nucleotide and codon MSA scores was higher than 15, indicating a problematic codon MSA. Moreover, IR regions that included indels between the S. cerevisiae strain and its immediate parent node were ignored. Finally, to avoid mutation saturation, IRs corresponding to branch lengths longer than 0.2 were ignored.
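The scoring scheme stated above (match = 1, mismatch = −1, gap = 0) and the cutoff of 15 can be expressed in a few lines. The sketch is illustrative; the function names are hypothetical, and taking the absolute difference between the two scores is an assumption, since the text does not state the direction of the comparison.

```python
def pairwise_score(seq_a, seq_b, match=1, mismatch=-1, gap=0):
    """Score two aligned sequences column by column; '-' marks a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            score += gap
        elif a == b:
            score += match
        else:
            score += mismatch
    return score


def codon_alignment_is_suspect(nucleotide_score, codon_score, max_difference=15):
    """Assumed rule: flag the region when the two scores differ by more than the cutoff."""
    return abs(nucleotide_score - codon_score) > max_difference


# Invented aligned pair: four matches, one mismatch, one gap column.
example_score = pairwise_score("ACGT-A", "ACCTTA")
```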

Simulation

To determine the significance of the association between IRs and MNMs, sequences were simulated along rooted phylogenetic trees using INDELible⁸⁵ with the M5 model⁸³. Each orthologous group codon MSA was simulated according to the FASTML phylogenetic tree using the M5 evolutionary model parameters (kappa: transition/transversion ratio, and omega: dN/dS ratio) inferred with PAML codeml⁸⁶. In each simulation, the root was set to S. paradoxus. The sequence length for the INDELible simulation was set to four to five times the S. paradoxus sequence length in order to yield sufficient IRs with the exact arm length in each simulation. The length factor needed was selected based on the empirical evaluation of simulations. We continued the simulations until we obtained 100 MSAs in which the number of IRs in each was equal to or greater than the number of IRs in the real MSA. For each simulation in each orthologous set, the analysis performed for the real codon MSA was repeated. Each IR length was simulated separately.

Control sequences

To correct for MNM enrichment that was not associated with IRs in the real MSA compared to the simulation, we searched for IR-less alignment segments of the same length as the IR. A control sequence was matched for each IR arm of the same length, in a non-IR region with the closest proximity to the IR. Thus, the number of IRs and control regions for each orthologous group was the same. If identical IRs were identified in multiple strains, the control regions were also chosen in the same positions and the same strains. Exact control analysis was also carried out for all simulations of the orthologous group.
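One way to pick a control segment of the same length as an IR arm, as close as possible to it and free of any IR, is sketched below. The text does not specify how ties or sequence ends were handled, so the search order (upstream before downstream at equal distance), the half-open zero-based coordinates, and the function name are assumptions.

```python
def nearest_control_start(arm_start, arm_len, seq_len, ir_intervals):
    """Return the start of the closest window of `arm_len` that overlaps no IR interval.

    `ir_intervals` holds half-open (start, end) pairs for every IR arm in the
    sequence, including the arm being matched. Returns None if no window fits.
    """
    for offset in range(1, seq_len):
        for start in (arm_start - offset, arm_start + offset):
            end = start + arm_len
            if start < 0 or end > seq_len:
                continue
            if all(end <= ir_start or start >= ir_end for ir_start, ir_end in ir_intervals):
                return start
    return None


# Invented layout: two 7 bp IR arms at 40-47 and 60-67 in a 100 bp sequence.
arms = [(40, 47), (60, 67)]
control_for_left_arm = nearest_control_start(40, 7, 100, arms)
control_at_sequence_start = nearest_control_start(0, 7, 20, [(0, 7)])
```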

IR score

For each orthologous group and for each of its 100 simulations, the IR score was calculated as follows:

$$\text{IR score} = \left. \frac{\text{IRs with MNMs} + 1}{\text{IRs without MNMs} + 1} \middle/ \frac{\text{Controls with MNMs} + 1}{\text{Controls without MNMs} + 1} \right.$$

Namely, we calculated the ratio of IRs with MNMs to IRs without MNMs, divided by the corresponding ratio of controls with MNMs to controls without MNMs. A pseudocount of 1 was added to all elements to prevent division by zero. The analysis was carried out separately for each IR arm length.
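As a worked illustration of the IR score, the following minimal sketch computes the ratio of ratios with the pseudocount of 1. The function and argument names are hypothetical; only the formula itself comes from the text.

```python
def ir_score(irs_with_mnm, irs_without_mnm,
             controls_with_mnm, controls_without_mnm, pseudocount=1):
    """Ratio of (IRs with/without MNMs) to (controls with/without MNMs)."""
    ir_ratio = (irs_with_mnm + pseudocount) / (irs_without_mnm + pseudocount)
    control_ratio = (controls_with_mnm + pseudocount) / (controls_without_mnm + pseudocount)
    return ir_ratio / control_ratio


# Example: 3 of 10 IRs and 1 of 10 controls carry an MNM.
# (3 + 1) / (7 + 1) = 0.5 and (1 + 1) / (9 + 1) = 0.2, so the score is 2.5.
example = ir_score(3, 7, 1, 9)
```

A score above 1 indicates that MNMs are more frequent in IRs than in the matched controls; with all counts equal to zero, the pseudocount gives a score of exactly 1.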

Estimation of positive selection

Positive selection was estimated with PAML codeml⁸⁶. MNMs that overlapped IRs were considered to be under positive selection if the Bayes empirical Bayes⁸⁷ posterior probability under the positive selection model was above 0.5. Support for positive selection was considered strong if the posterior probability under the positive selection model was above 0.95.
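The two posterior probability cutoffs can be expressed as a small classification helper. This is a sketch only: the function name and the returned labels are hypothetical, and the posterior is assumed to have been extracted beforehand from the codeml output.

```python
def classify_beb_support(posterior):
    """Classify a Bayes empirical Bayes posterior probability for a site."""
    if not 0.0 <= posterior <= 1.0:
        raise ValueError("posterior probability must lie between 0 and 1")
    if posterior > 0.95:
        return "strong"      # strong support for positive selection
    if posterior > 0.5:
        return "positive"    # estimated to be under positive selection
    return "none"
```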

Novelty of amino acid replacements

The PredictSNP⁶³ classifier with default parameters was used to identify amino acid replacements with a potential effect on protein structure. Proteins with a predicted effect on structure were manually inspected in the EggNOG database of orthology relationships⁶⁴. For each protein, we determined whether the replaced amino acid was represented in the MSA of Saccharomycotina⁶⁴.

References

Gordenin, D. A. et al. Inverted DNA repeats: A source of eukaryotic genomic instability. Mol. Cell Biol. 13, 5315–5322 (1993).
Lobachev, K. S. et al. Factors affecting inverted repeat stimulation of recombination and deletion in Saccharomyces cerevisiae. Genetics 148, 1507–1524 (1998).
Connelly, J. C. & Leach, D. R. The sbcC and sbcD genes of Escherichia coli encode a nuclease involved in palindrome inviability and genetic recombination. Genes Cells 1, 285–291 (1996).
Leach, D. R. Long DNA palindromes, cruciform structures, genetic instability and secondary structure repair. BioEssays 16, 893–900. https://doi.org/10.1002/bies.950161207 (1994).
Leach, D. R., Okely, E. A. & Pinder, D. J. Repair by recombination of DNA containing a palindromic sequence. Mol. Microbiol. 26, 597–606 (1997).
Lewis, S. M. Palindromy is eliminated through a structure-specific recombination process in rodent cells. Nucl. Acids Res. 27, 2521–2528 (1999).
Narayanan, V., Mieczkowski, P. A., Kim, H. M., Petes, T. D. & Lobachev, K. S. The pattern of gene amplification is determined by the chromosomal location of hairpin-capped breaks. Cell 125, 1283–1296. https://doi.org/10.1016/j.cell.2006.04.042 (2006).
Tanaka, H. & Yao, M. C. Palindromic gene amplification—An evolutionarily conserved role for DNA inverted repeats in the genome. Nat. Rev. Cancer 9, 216–224. https://doi.org/10.1038/nrc2591 (2009).
Tanaka, H. et al. Intrastrand annealing leads to the formation of a large DNA palindrome and determines the boundaries of genomic amplification in human cancer. Mol. Cell Biol. 27, 1993–2002. https://doi.org/10.1128/MCB.01313-06 (2007).
Zackai, E. H. & Emanuel, B. S. Site-specific reciprocal translocation, t(11;22) (q23;q11), in several unrelated families with 3:1 meiotic disjunction. Am. J. Med. Genet. 7, 507–521. https://doi.org/10.1002/ajmg.1320070412 (1980).
Mizuno, K., Miyabe, I., Schalbetter, S. A., Carr, A. M. & Murray, J. M. Recombination-restarted replication makes inverted chromosome fusions at inverted repeats. Nature 493, 246–249. https://doi.org/10.1038/nature11676 (2013).
Kurahashi, H. et al. Palindrome-mediated chromosomal translocations in humans. DNA Repair (Amst.) 5, 1136–1145. https://doi.org/10.1016/j.dnarep.2006.05.035 (2006).
Bzymek, M. & Lovett, S. T. Instability of repetitive DNA sequences: The role of replication in multiple mechanisms. Proc. Natl. Acad. Sci. U. S. A. 98, 8319–8325. https://doi.org/10.1073/pnas.111008398 (2001).
Montgomery, S. B. et al. The origin, evolution, and functional impact of short insertion–deletion variants identified in 179 human genomes. Genome Res. 23, 749–761. https://doi.org/10.1101/gr.148718.112 (2013).
Weingarten-Gabbay, S. & Segal, E. The grammar of transcriptional regulation. Hum. Genet. 133, 701–711. https://doi.org/10.1007/s00439-013-1413-1 (2014).
Leung, M. Y., Choi, K. P., Xia, A. & Chen, L. H. Nonrandom clusters of palindromes in herpesvirus genomes. J. Comput. Biol. 12, 331–354. https://doi.org/10.1089/cmb.2005.12.331 (2005).
Horvath, P. & Barrangou, R. CRISPR/Cas, the immune system of bacteria and archaea. Science 327, 167–170. https://doi.org/10.1126/science.1179555 (2010).
Li, X., Lindahl, L., Sha, Y. & Zengel, J. M. Analysis of the Bacillus subtilis S10 ribosomal protein gene cluster identifies two promoters that may be responsible for transcription of the entire 15-kilobase S10-spc-alpha cluster. J. Bacteriol. 179, 7046–7054 (1997).
Cuomo, C. A., Mundy, C. L. & Oettinger, M. A. DNA sequence and structure requirements for cleavage of V(D)J recombination signal sequences. Mol. Cell Biol. 16, 5683–5690 (1996).
Willis, K. K. & Klein, H. L. Intrachromosomal recombination in Saccharomyces cerevisiae: Reciprocal exchange in an inverted repeat and associated gene conversion. Genetics 117, 633–643 (1987).
Rozen, S. et al. Abundant gene conversion between arms of palindromes in human and ape Y chromosomes. Nature 423, 873–876. https://doi.org/10.1038/nature01723 (2003).
Kolodner, R. & Tewari, K. K. Inverted repeats in chloroplast DNA from higher plants. Proc. Natl. Acad. Sci. U. S. A. 76, 41–45. https://doi.org/10.1073/pnas.76.1.41 (1979).
Rattray, A. J. & Symington, L. S. Use of a chromosomal inverted repeat to demonstrate that the RAD51 and RAD52 genes of Saccharomyces cerevisiae have different roles in mitotic recombination. Genetics 138, 587–595 (1994).
Waldman, A. S., Tran, H., Goldsmith, E. C. & Resnick, M. A. Long inverted repeats are an at-risk motif for recombination in mammalian cells. Genetics 153, 1873–1883 (1999).
Tran, H., Degtyareva, N., Gordenin, D. & Resnick, M. A. Altered replication and inverted repeats induce mismatch repair-independent recombination between highly diverged DNAs in yeast. Mol. Cell Biol. 17, 1027–1036. https://doi.org/10.1128/mcb.17.2.1027 (1997).
Ripley, L. S. Model for the participation of quasi-palindromic DNA sequences in frameshift mutation. Proc. Natl. Acad. Sci. U. S. A. 79, 4128–4132 (1982).
Rosche, W. A., Ripley, L. S. & Sinden, R. R. Primer-template misalignments during leading strand DNA synthesis account for the most frequent spontaneous mutations in a quasipalindromic region in Escherichia coli. J. Mol. Biol. 284, 633–646. https://doi.org/10.1006/jmbi.1998.2193 (1998).
Lovett, S. T. Template-switching during replication fork repair in bacteria. DNA Repair (Amst.) 56, 118–128. https://doi.org/10.1016/j.dnarep.2017.06.014 (2017).
Strawbridge, E. M., Benson, G., Gelfand, Y. & Benham, C. J. The distribution of inverted repeat sequences in the Saccharomyces cerevisiae genome. Curr. Genet. 56, 321–340. https://doi.org/10.1007/s00294-010-0302-6 (2010).
Lisnić, B., Svetec, I. K., Sarić, H., Nikolić, I. & Zgaga, Z. Palindrome content of the yeast Saccharomyces cerevisiae genome. Curr. Genet. 47, 289–297. https://doi.org/10.1007/s00294-005-0573-5 (2005).
van Noort, V., Worning, P., Ussery, D. W., Rosche, W. A. & Sinden, R. R. Strand misalignments lead to quasipalindrome correction. Trends Genet. 19, 365–369 (2003).
Bzymek, M. & Lovett, S. T. Evidence for two mechanisms of palindrome-stimulated deletion in Escherichia coli: Single-strand annealing and replication slipped mispairing. Genetics 158, 527–540 (2001).
Rosche, W. A., Trinh, T. Q. & Sinden, R. R. Leading strand specific spontaneous mutation corrects a quasipalindrome by an intermolecular strand switch mechanism. J. Mol. Biol. 269, 176–187. https://doi.org/10.1006/jmbi.1997.1034 (1997).
Viswanathan, M., Lacirignola, J. J., Hurley, R. L. & Lovett, S. T. A novel mutational hotspot in a natural quasipalindrome in Escherichia coli. J. Mol. Biol. 302, 553–564. https://doi.org/10.1006/jmbi.2000.4088 (2000).
Hampsey, D. M., Ernst, J. F., Stewart, J. W. & Sherman, F. Multiple base-pair mutations in yeast. J. Mol. Biol. 201, 471–486. https://doi.org/10.1016/0022-2836(88)90629-8 (1988).
Greenblatt, M. S., Grollman, A. P. & Harris, C. C. Deletions and insertions in the p53 tumor suppressor gene in human cancers: Confirmation of the DNA polymerase slippage/misalignment model. Cancer Res. 56, 2130–2136 (1996).
Bissler, J. J. DNA inverted repeats and human disease. Front. Biosci. 3, d408-418 (1998).
Seier, T. et al. Insights into mutagenesis using Escherichia coli chromosomal lacZ strains that enable detection of a wide spectrum of mutational events. Genetics 188, 247–262. https://doi.org/10.1534/genetics.111.127746 (2011).
Yoshiyama, K., Higuchi, K., Matsumura, H. & Maki, H. Directionality of DNA replication fork movement strongly affects the generation of spontaneous mutations in Escherichia coli. J. Mol. Biol. 307, 1195–1206. https://doi.org/10.1006/jmbi.2001.4557 (2001).
Kim, N., Cho, J. E., Li, Y. C. & Jinks-Robertson, S. RNA:DNA hybrids initiate quasi-palindrome-associated mutations in highly transcribed yeast DNA. PLoS Genet. 9, e1003924. https://doi.org/10.1371/journal.pgen.1003924 (2013).
Schultz, G. E. & Drake, J. W. Templated mutagenesis in bacteriophage T4 involving imperfect direct or indirect sequence repeats. Genetics 178, 661–673. https://doi.org/10.1534/genetics.107.083444 (2008).
Omer, S., Lavi, B., Mieczkowski, P. A., Covo, S. & Hazkani-Covo, E. Whole genome sequence analysis of mutations accumulated in. G3 (Bethesda) 7, 3775–3787. https://doi.org/10.1534/g3.117.300262 (2017).
Zhao, G., Chang, K. Y., Varley, K. & Stormo, G. D. Evidence for active maintenance of inverted repeat structures identified by a comparative genomic approach. PLoS ONE 2, e262. https://doi.org/10.1371/journal.pone.0000262 (2007).
Warburton, P. E., Giordano, J., Cheung, F., Gelfand, Y. & Benson, G. Inverted repeat structure of the human genome: The X-chromosome contains a preponderance of large, highly homologous inverted repeats that contain testes genes. Genome Res. 14, 1861–1869. https://doi.org/10.1101/gr.2542904 (2004).
Batzer, M. A. & Deininger, P. L. Alu repeats and human genomic diversity. Nat. Rev. Genet. 3, 370–379. https://doi.org/10.1038/nrg798 (2002).
Aygun, N. Correlations between long inverted repeat (LIR) features, deletion size and distance from breakpoint in human gross gene deletions. Sci. Rep. 5, 8300. https://doi.org/10.1038/srep08300 (2015).
Cook, G. W. et al. Alu pair exclusions in the human genome. Mob. DNA 2, 10. https://doi.org/10.1186/1759-8753-2-10 (2011).
Lavi, B., Levy Karin, E., Pupko, T. & Hazkani-Covo, E. The prevalence and evolutionary conservation of inverted repeats in proteobacteria. Genome Biol. Evol. 10, 918–927. https://doi.org/10.1093/gbe/evy044 (2018).
Löytynoja, A. & Goldman, N. Short template switch events explain mutation clusters in the human genome. Genome Res. 27, 1039–1049. https://doi.org/10.1101/gr.214973.116 (2017).
Walker, C. R., Scally, A., De Maio, N. & Goldman, N. Short-range template switching in great ape genomes explored using pair hidden Markov models. PLoS Genet. 17, e1009221. https://doi.org/10.1371/journal.pgen.1009221 (2021).
de Boer, J. G. & Ripley, L. S. Demonstration of the production of frameshift and base-substitution mutations by quasipalindromic DNA sequences. Proc. Natl. Acad. Sci. U. S. A. 81, 5528–5531. https://doi.org/10.1073/pnas.81.17.5528 (1984).
Mo, J. Y., Maki, H. & Sekiguchi, M. Mutational specificity of the dnaE173 mutator associated with a defect in the catalytic subunit of DNA polymerase III of Escherichia coli. J. Mol. Biol. 222, 925–936. https://doi.org/10.1016/0022-2836(91)90586-u (1991).
Chan, K. & Gordenin, D. A. Clusters of multiple mutations: Incidence and molecular mechanisms. Annu. Rev. Genet. 49, 243–267. https://doi.org/10.1146/annurev-genet-112414-054714 (2015).
Schrider, D. R., Hourmozdi, J. N. & Hahn, M. W. Pervasive multinucleotide mutational events in eukaryotes. Curr. Biol. 21, 1051–1054. https://doi.org/10.1016/j.cub.2011.05.013 (2011).
Besenbacher, S. et al. Multi-nucleotide de novo mutations in humans. PLoS Genet. 12, e1006315. https://doi.org/10.1371/journal.pgen.1006315 (2016).
Hodgkinson, A. & Eyre-Walker, A. Human triallelic sites: Evidence for a new mutational mechanism?. Genetics 184, 233–241. https://doi.org/10.1534/genetics.109.110510 (2010).
Venkat, A., Hahn, M. W. & Thornton, J. W. Multinucleotide mutations cause false inferences of lineage-specific positive selection. Nat. Ecol. Evol. 2, 1280–1288. https://doi.org/10.1038/s41559-018-0584-5 (2018).
Schofield, M. A., Agbunag, R. & Miller, J. H. DNA inversions between short inverted repeats in Escherichia coli. Genetics 132, 295–302 (1992).
Lovett, S. T. Encoded errors: Mutations and rearrangements mediated by misalignment at repetitive DNA sequences. Mol. Microbiol. 52, 1243–1253. https://doi.org/10.1111/j.1365-2958.2004.04076.x (2004).
Voineagu, I., Narayanan, V., Lobachev, K. S. & Mirkin, S. M. Replication stalling at unstable inverted repeats: Interplay between DNA hairpins and fork stabilizing proteins. Proc. Natl. Acad. Sci. U. S. A. 105, 9936–9941. https://doi.org/10.1073/pnas.0804510105 (2008).
Sinden, R. R., Zheng, G. X., Brankamp, R. G. & Allen, K. N. On the deletion of inverted repeated DNA in Escherichia coli: Effects of length, thermal stability, and cruciform formation in vivo. Genetics 129, 991–1005 (1991).
Grantham, R. Amino acid difference formula to help explain protein evolution. Science 185, 862–864. https://doi.org/10.1126/science.185.4154.862 (1974).
Bendl, J. et al. PredictSNP: Robust and accurate consensus classifier for prediction of disease-related mutations. PLoS Comput. Biol. 10, e1003440. https://doi.org/10.1371/journal.pcbi.1003440 (2014).
Huerta-Cepas, J. et al. eggNOG 5.0: A hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucl. Acids Res. 47, D309–D314. https://doi.org/10.1093/nar/gky1085 (2019).
Shen, X. X. et al. Tempo and mode of genome evolution in the budding yeast subphylum. Cell 175, 1533.e1520-1545.e1520. https://doi.org/10.1016/j.cell.2018.10.023 (2018).
Dutra, B. E. & Lovett, S. T. Cis and trans-acting effects on a mutational hotspot involving a replication template switch. J. Mol. Biol. 356, 300–311. https://doi.org/10.1016/j.jmb.2005.11.071 (2006).
Zhang, J. Rates of conservative and radical nonsynonymous nucleotide substitutions in mammalian nuclear genes. J. Mol. Evol. 50, 56–68. https://doi.org/10.1007/s002399910007 (2000).
Schrider, D. R., Houle, D., Lynch, M. & Hahn, M. W. Rates and genomic consequences of spontaneous mutational events in Drosophila melanogaster. Genetics 194, 937–954. https://doi.org/10.1534/genetics.113.151670 (2013).
Galen, S. C. et al. Contribution of a mutational hot spot to hemoglobin adaptation in high-altitude Andean house wrens. Proc. Natl. Acad. Sci. U. S. A. 112, 13958–13963. https://doi.org/10.1073/pnas.1507300112 (2015).
Harris, K. & Nielsen, R. Error-prone polymerase activity causes multinucleotide mutations in humans. Genome Res. 24, 1445–1454. https://doi.org/10.1101/gr.170696.113 (2014).
Ji, X., Griffing, A. & Thorne, J. L. A phylogenetic approach finds abundant interlocus gene conversion in yeast. Mol. Biol. Evol. 33, 2469–2476. https://doi.org/10.1093/molbev/msw114 (2016).
Weinreich, D. M., Delaney, N. F., Depristo, M. A. & Hartl, D. L. Darwinian evolution can follow only very few mutational paths to fitter proteins. Science 312, 111–114. https://doi.org/10.1126/science.1123539 (2006).
Wright, S. The roles of mutation, inbreeding, crossbreeding and selection in evolution. Proc. Sixth Int. Congr. Genet. 1, 356–366 (1932).
Sauna, Z. E. & Kimchi-Sarfaty, C. Understanding the contribution of synonymous mutations to human disease. Nat. Rev. Genet. 12, 683–691. https://doi.org/10.1038/nrg3051 (2011).
Cherry, J. M. et al. Saccharomyces genome database: The genomics resource of budding yeast. Nucl. Acids Res. 40, D700–D705. https://doi.org/10.1093/nar/gkr1029 (2012).
Wapinski, I., Pfeffer, A., Friedman, N. & Regev, A. Natural history and evolutionary principles of gene duplication in fungi. Nature 449, 54–61. https://doi.org/10.1038/nature06107 (2007).
Rice, P., Longden, I. & Bleasby, A. EMBOSS: the European molecular biology open software suite. Trends Genet. 16, 276–277 (2000).
Katoh, K., Asimenos, G. & Toh, H. Multiple alignment of DNA sequences with MAFFT. Methods Mol. Biol. 537, 39–64. https://doi.org/10.1007/978-1-59745-251-9_3 (2009).
Stamatakis, A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312–1313. https://doi.org/10.1093/bioinformatics/btu033 (2014).
Le, S. Q. & Gascuel, O. An improved general amino acid replacement matrix. Mol. Biol. Evol. 25, 1307–1320. https://doi.org/10.1093/molbev/msn067 (2008).
Yang, Z. Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods. J. Mol. Evol. 39, 306–314. https://doi.org/10.1007/BF00160154 (1994).
Ashkenazy, H. et al. FastML: A web server for probabilistic reconstruction of ancestral sequences. Nucl. Acids Res. 40, W580–W584. https://doi.org/10.1093/nar/gks498 (2012).
Yang, Z., Nielsen, R., Goldman, N. & Pedersen, A. M. Codon-substitution models for heterogeneous selection pressure at amino acid sites. Genetics 155, 431–449 (2000).
Sela, I., Ashkenazy, H., Katoh, K. & Pupko, T. GUIDANCE2: Accurate detection of unreliable alignment regions accounting for the uncertainty of multiple parameters. Nucl. Acids Res. 43, W7–W14. https://doi.org/10.1093/nar/gkv318 (2015).
Fletcher, W. & Yang, Z. INDELible: a flexible simulator of biological sequence evolution. Mol. Biol. Evol. 26, 1879–1888. https://doi.org/10.1093/molbev/msp098 (2009).
Yang, Z. PAML 4: Phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, 1586–1591. https://doi.org/10.1093/molbev/msm088 (2007).
Yang, Z., Wong, W. S. & Nielsen, R. Bayes empirical bayes inference of amino acid sites under positive selection. Mol. Biol. Evol. 22, 1107–1118. https://doi.org/10.1093/molbev/msi097 (2005).

Acknowledgements

We thank Tal Pupko and Haim Ashkenazy for discussions. E.H-C. is supported by the Israel Science Foundation Grant 605/20 and by the Open University of Israel Research Fund.

Author information

Authors and Affiliations

Department of Natural and Life Sciences, The Open University of Israel, Ra’anana, Israel
May Abraham & Einat Hazkani-Covo

Authors

May Abraham
Einat Hazkani-Covo

Contributions

E.H.-C. conceived the study. E.H.-C. and M.A. designed the analysis. M.A. performed analysis. E.H.-C. finalized the manuscript. E.H.-C. managed the project and recruited the funding. All authors reviewed the manuscript.

Corresponding author

Correspondence to Einat Hazkani-Covo.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Abraham, M., Hazkani-Covo, E. Protein innovation through template switching in the Saccharomyces cerevisiae lineage. Sci Rep 11, 22558 (2021). https://doi.org/10.1038/s41598-021-01736-y

Received: 20 June 2021
Accepted: 27 October 2021
Published: 19 November 2021
DOI: https://doi.org/10.1038/s41598-021-01736-y
