A central question in genetics and evolution is the extent to which the outcomes of mutations change depending on the genetic context in which they occur1,2,3. Pairwise interactions between mutations have been systematically mapped within4,5,6,7,8,9,10,11,12,13,14,15,16,17,18 and between19 genes, and have been shown to contribute substantially to phenotypic variation among individuals20. However, the extent to which genetic interactions themselves are stable or dynamic across genotypes is unclear21, 22. Here we quantify more than 45,000 genetic interactions between the same 87 pairs of mutations across more than 500 closely related genotypes of a yeast tRNA. Notably, all pairs of mutations interacted in at least 9% of genetic backgrounds and all pairs switched from interacting positively to interacting negatively in different genotypes (false discovery rate < 0.1). Higher-order interactions are also abundant and dynamic across genotypes. The epistasis in this tRNA means that all individual mutations switch from detrimental to beneficial, even in closely related genotypes. As a consequence, accurate genetic prediction requires mutation effects to be measured across different genetic backgrounds and the use of higher-order epistatic terms.
We thank J. Schmiedel for statistical guidance. This work was supported by a European Research Council Consolidator grant (616434), the Spanish Ministry of Economy and Competitiveness (BFU2011-26206 and SEV-2012-0208), the AXA Research Fund, the Bettencourt Schueller Foundation, Agència de Gestió d’Ajuts Universitaris i de Recerca (AGAUR), the EMBL-CRG Systems Biology Program, and the CERCA Program/Generalitat de Catalunya. Deep sequencing was performed in the EMBL Heidelberg GeneCore Genomics Core Facility.
Reviewer information
Nature thanks Z. Blount, D. Marks, A. Wagner and the other anonymous reviewer(s) for their contribution to the peer review of this work.
Extended data figures and tables
a, Maximum growth rate (measured in a plate reader using spectrophotometry) of tRNA-Arg(CCU) (HSX1) deletion strain carrying either an empty plasmid (red) or a single-copy plasmid expressing wild-type tRNA-Arg(CCU) (blue) at high temperature, high salt, and high temperature with high salt (n = 3 independent colonies from the plasmid transformation). b, Distribution of number of mutations per genotype in the library relative to the sequence of the tRNA from each species. c, Genotype network of the 4,176 tRNA-Arg(CCU) variants. Each node is one genotype. Colour indicates the ln(fitness) relative to S. cerevisiae. Edges connect genotypes differing by a single substitution; acquisition of a U2C mutation is highlighted in yellow as an example. Genotypes are arranged in concentric circles according to the total number of substitutions (one to ten) from the S. cerevisiae tRNA, which is the central node. Highlighted nodes indicate the genotypes of the seven extant species. d, Table showing the possible number of mutation combinations from order one to eight, with or without a complete genotype space (whether all intermediate genotypes are measured in the library or not) when using S. cerevisiae as a reference or any other background (the effect of a given combination of mutations can be measured from at least one genetic background). The total number of unique backgrounds is also indicated, together with the minimum, median and maximum number of backgrounds in which these mutations can be found.
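The network construction described in panel c can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `hamming` and `build_network` and the three-nucleotide toy sequences are hypothetical, and sequences are assumed to be aligned and of equal length.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def build_network(genotypes, reference):
    """Return (edges, shells).

    edges  : pairs of genotypes that differ by exactly one substitution
    shells : genotypes grouped by number of substitutions from the reference,
             i.e. the concentric circles of the layout
    """
    edges = [(g, h) for g, h in combinations(genotypes, 2) if hamming(g, h) == 1]
    shells = {}
    for g in genotypes:
        shells.setdefault(hamming(g, reference), []).append(g)
    return edges, shells


# Toy three-nucleotide sequences (hypothetical, not real tRNA data).
toy = ["GUC", "GCC", "GUA", "GCA"]
edges, shells = build_network(toy, reference="GUC")
```

The all-pairs comparison is quadratic in the number of genotypes, which is manageable for a library of a few thousand variants; a larger library would call for indexing genotypes by their single-substitution neighbours instead.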
a, Single mutations (columns) have effects that differ significantly between genetic backgrounds from different species (rows). Paired two-sided t-test between fitness effects of mutations of tRNAs from different species (145 tests of n = 6). Significant differences in fitness effects (FDR < 0.1) are shown in blue (positive) or red (negative); non-significant differences (FDR ≥ 0.1) are coloured in white. Mutations that were not shared are coloured in grey (that is, a substitution that would result in a mutation in one species but is part of the wild-type background in another). Bar plots show the percentage (absolute numbers on top) of species comparisons or shared mutations between species in which the effect of the mutation significantly changes in magnitude (light grey) or switches sign (dark grey). b, Proportion of genetic backgrounds in which each mutation has a beneficial (blue) or detrimental (red) fitness effect at different FDRs for backgrounds with −0.3 < ln(fitness) < −0.15 (left), backgrounds with −0.15 < ln(fitness) < 0.15 (middle left), genotypes with no more than four mutations from the S. cerevisiae sequence (middle right) and genotypes with average input read counts of more than 100 (right). q values were obtained after adjusting for FDR across the total number of single mutations with unique backgrounds after filtering (n = 10,746, 6,129, 3,568, 6,338 tests respectively). c, Fitness effect of single mutations plotted against the ln(fitness) of the backgrounds in which the mutations are made; for all genetic backgrounds (left), backgrounds with −0.3 < ln(fitness) < −0.15 (middle) and backgrounds with −0.15 < ln(fitness) < 0.15 (right).
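The caption does not name the FDR adjustment that was used. As an illustration only, the sketch below applies the Benjamini-Hochberg procedure, one common choice, to a list of P values; the function name `benjamini_hochberg` and the example P values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the original order.

    Assumption: BH is used here only as an example of an FDR adjustment;
    the source does not state which procedure produced its q values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotonic.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical P values from four tests.
pvals = [0.001, 0.04, 0.03, 0.5]
qvals = benjamini_hochberg(pvals)
significant = [q < 0.1 for q in qvals]  # threshold matching FDR < 0.1
```

Because the adjustment depends on the total number of tests, restricting the analysis to a subset of backgrounds, as in the four panels of b, changes the adjusted values even for tests that appear in every subset.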
a, Comparison of epistasis scores for species pairs not shown in Fig. 3c. Pairs of species that share fewer than three mutations are not shown. b, Decline in the correlation of epistasis scores with increasing Hamming distance between the tRNA genotypes from different species (inset). The left plot shows how this negative correlation holds when restricting the minimum number of shared pairs of mutations between the two species to compute the correlation.
Extended Data Fig. 4 Changes in pairwise epistasis between mutations across the seven extant species.
a, Comparison of pairwise epistasis (rows) between different species (columns) (1,000 paired two-sided t-tests of n = 6). Differences in epistasis are only shown for comparisons with FDR < 0.1 in orange or green for positive or negative differences respectively. Comparisons with FDR ≥ 0.1 are coloured in white. Pairs of mutations that are not shared between species are coloured in grey. Bar plots show the percentage of species comparisons (right) or shared pairs of mutations between species (top) that significantly change (light grey) or switch (dark grey). b, Interaction networks of four extant species not shown in Fig. 3b. Colours indicate epistasis sign (orange for positive, green for negative and grey for not significant at FDR < 0.1) and edge width indicates epistasis magnitude.
a, Epistasis scores between pairs of mutations plotted against the ln(fitness) of the genetic background. Scatter plots are divided into double mutants that restore WCBPs (left, n = 1,883), other double mutants in which both mutations are in facing base pair positions (middle left, n = 1,739), in base pair positions but not facing each other (middle right, n = 28,622), and the rest (right, n = 17,144). b, Proportion of genetic backgrounds in which each pair of mutations interacts with positive (orange) or negative (green) epistasis at different FDRs restricted to genetic backgrounds with −0.3 < fitness < −0.15 (top), with −0.15 < fitness < 0.15 (top middle), with additive expected fitness outcome greater than −0.2 and less than 0.1 (middle bottom) or when excluding all genotypes with average input counts less than 100 (bottom). 23,128, 23,652, 29,628 and 15,306 one-sample two-sided t-tests (n = 6). c, A small fraction of tRNA-Arg(CCU) from other eukaryotic species have lost the base pairing in positions 1–71, 2–70 and 6–66 of the tRNA (multiple sequence alignment (MSA) across 1,614 species was taken from previously published work27; sequences with indels were excluded). d, Number of positive, negative or not significant pairwise interactions at FDR < 0.1 within the acceptor stem of the tRNA (n = 23,237) when both mutations are found in the same helix strand or when each mutation is located in a different strand (n = 13,615). log2 odds ratio shown below together with two-sided Fisher’s exact test P values. e, Number of positive, negative and non-significant background-averaged pairwise interactions between pairs of mutations in the acceptor stem that are found in the same RNA strand and between mutations that are in positions that base pair with each other. log2 odds ratio and two-sided Fisher’s exact test P values are shown below. f, Distribution of pairwise epistasis values of mutation pairs that restore a canonical WCBP depending on the location of their background mutations in the acceptor stem (P values from Welch’s two-sided t-test, n = 263 or n = 1,368 when more than one background mutation is in the same strand or not, respectively). The same result is obtained when epistasis values are corrected for the ln(fitness) of the background (residuals of a linear model using background ln(fitness) to predict epistasis, data not shown).
Extended Data Fig. 6 Changes in base pairing partially explain the consequences on fitness of single mutations.
a, A single mutation can either disrupt or restore a canonical WCBP depending on the background context. b, Percentage of deleterious or beneficial single mutations (at FDR < 0.1) that restore or disrupt a canonical WCBP in any base pairing position of the tRNA. From a total of 4,300 mutations that restore WCBP, 721 are beneficial and 498 deleterious. 13,195 mutations result in the loss of a canonical pair (n = 6,806 mutations that create a wobble base pair and n = 6,389 that completely break the base pair interaction); of these, 3,030 and 721 have significant deleterious and beneficial effects, respectively. WC, Watson–Crick, W, wobble and L, lost base pair. c, Same as b but split by mutation identity. d, Mutations in the tRNA acceptor stem that break a base pair (left, distribution of effects; n = 1,356 single mutations with background fitness higher than −0.15) have more deleterious effects when the neighbouring base-pairing positions contain one or more wobble interactions (n = 921) than when they are all canonical WCBPs (n = 435; average fitness effect difference = 0.028; Welch’s two-sided t-test P value shown). The right plot illustrates the base-pairing context of the stem.
a, The most significant background-averaged third-order interactions (8 out of 74, FDR < 0.1, n = 3,691 tests for all interactions across all orders). The first three plots of each row show how the distribution of pairwise epistasis of two mutations across different genetic backgrounds (each double mutation can be found in a median of 506 different genetic backgrounds) changes in the presence or absence of a third mutation. The paired differences between pairwise interactions in those three cases correspond to third-order epistatic coefficients. Distributions of third-order epistasis for the same three mutations are shown to the right. Horizontal lines correspond to the background-averaged third-order epistatic term, coloured by sign (orange or green for positive or negative respectively). b, Number of significantly positive and negative background-averaged epistatic interactions of order one to eight (at FDR < 0.1). c, Distribution of the absolute magnitude of averaged third-order interactions plotted against the mean nucleotide distance between the three mutations (n = 316 triple mutations). Welch’s two-sided t-test P values for differences between the groups are shown. Significant interactions (one-sample two-sided t-test at FDR < 0.1) are coloured in orange or green for positive or negative epistasis respectively. d, Top, number of positive, negative or non-significant background-averaged third-order interactions (FDR < 0.1) within the acceptor stem of the tRNA when all three mutations are found in the same helix strand or not (n = 129). Bottom, the log2 odds ratios (when all three mutations are found in the same strand of the tRNA acceptor stem) of significantly positive interactions versus others (negative or not significant interactions) and significantly negative interactions versus other double mutants. P values reported from the two-sided Fisher’s exact test.
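The relationship in panel a between pairwise and third-order epistasis can be made concrete with a small worked example. The sketch assumes the standard additive definition on ln(fitness), without any normalization factor, which the caption does not spell out; the function names and the ln(fitness) values are hypothetical.

```python
def pairwise_epistasis(lnf, background, a, b):
    """Deviation of the double mutant from additivity in one background.

    lnf maps a frozenset of mutation names to ln(fitness).
    Assumed definition: e_ab = f(bg+a+b) - f(bg+a) - f(bg+b) + f(bg).
    """
    bg = frozenset(background)
    return lnf[bg | {a, b}] - lnf[bg | {a}] - lnf[bg | {b}] + lnf[bg]


def third_order_epistasis(lnf, background, a, b, c):
    """Change in the a-b pairwise interaction when mutation c is added."""
    bg = frozenset(background)
    with_c = pairwise_epistasis(lnf, bg | {c}, a, b)
    without_c = pairwise_epistasis(lnf, bg, a, b)
    return with_c - without_c


# Hypothetical ln(fitness) values for a complete three-mutation landscape.
lnf = {
    frozenset(): 0.00,
    frozenset("A"): -0.10,
    frozenset("B"): -0.20,
    frozenset("C"): -0.05,
    frozenset("AB"): -0.10,
    frozenset("AC"): -0.15,
    frozenset("BC"): -0.25,
    frozenset("ABC"): -0.40,
}

e_ab = pairwise_epistasis(lnf, set(), "A", "B")          # A-B without C
e_ab_given_c = pairwise_epistasis(lnf, {"C"}, "A", "B")  # A-B with C present
e_abc = third_order_epistasis(lnf, set(), "A", "B", "C")
```

In this toy landscape the A-B interaction is positive without C and negative with C, so the third-order term is negative. Averaging such terms over every background in which the three mutations can be measured would give a background-averaged coefficient of the kind shown by the horizontal lines.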
a, Mean RMSE of the fitness prediction for tenfold cross-validation held-out genotypes (purple, test set) or genotypes included in the training set (yellow) for each of the eight-mutation sub-landscapes when progressively adding the 100 most significant epistatic coefficients out of the 256 possible coefficients. Highlighted in red is the average number of epistatic coefficients to obtain the lowest RMSE across all the sub-landscapes. b, Histogram of the minimum number of epistatic coefficients that give the minimum RMSE when predicting the fitness of the test genotypes by tenfold cross-validation in all complete eight-mutation sub-landscapes (top). Histogram of the median number of coefficients for each sub-landscape (bottom).
Extended Data Fig. 9 Comparison of the combinatorially-complete tRNA sub-landscapes to theoretical fitness landscapes.
a, Expected pattern of the average correlation of fitness effects γd at different mutational distances for theoretical di-allelic fitness landscapes with three to eight mutated positions. The average γd behaviour is highlighted in bold for each theoretical landscape (n = 250 simulated landscapes for each theoretical model). The NK landscape was modelled with K = L/2 (L, number of mutated positions) and the RMF as a mixture of 50% additive and 50% HoC. b, Decay of γd with mutational distance for all tRNA complete di-allelic sub-landscapes containing the S. cerevisiae parental genotype of three to eight loci (mean behaviour of γd in bold). c, Mean Euclidean distance between the γd for the tRNA sub-landscapes and the γd of theoretical landscapes (each tRNA landscape was compared to the 250 simulations of each theoretical landscape, n = 73,250, 142,000, 159,500, 100,750, 33,000 and 4,500 for tRNA landscapes from three to eight mutations respectively). d, e, Mean roughness-to-slope ratio (r/s) (d) and epistasis classes (e) for all combinatorially-complete tRNA di-allelic landscapes from three to eight mutations, as well as for all theoretical landscape models (n = 250 for each theoretical landscape model and 293, 568, 638, 403, 132 and 18 tRNA landscapes from three to eight mutations respectively). Error bars are s.d.
Shortest paths between some pairs of extant species (top) together with the proportion of them that are accessible (bottom; yellow, accessible; purple, inaccessible). Nodes are the ln(fitness) of the species genotypes and the intermediate genotypes between them. Edge colours indicate the frequency at which a one-step mutation belongs to an accessible path (completely accessible, yellow; completely inaccessible, purple). Error bars are ln(fitness) s.e.m. of each genotype (propagated error from the n = 6 replicates).
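A minimal sketch of how the proportion of accessible shortest paths between two genotypes could be computed. The caption does not define accessibility, so the criterion used here (ln(fitness) never decreases along the path) is an assumption, as are the function names and the two-nucleotide toy landscape; measurement error is ignored.

```python
from itertools import permutations


def shortest_paths(start: str, end: str):
    """Yield every shortest mutational path from start to end.

    Each path introduces the differing positions one at a time, so two
    genotypes that differ at d positions are joined by d! shortest paths.
    """
    diff = [i for i, (x, y) in enumerate(zip(start, end)) if x != y]
    for order in permutations(diff):
        current = list(start)
        path = [start]
        for pos in order:
            current[pos] = end[pos]
            path.append("".join(current))
        yield path


def is_accessible(path, lnf):
    """Assumed criterion: ln(fitness) never decreases between steps."""
    return all(lnf[b] >= lnf[a] for a, b in zip(path, path[1:]))


def accessible_fraction(start, end, lnf):
    """Fraction of shortest paths that satisfy the accessibility criterion."""
    paths = list(shortest_paths(start, end))
    return sum(is_accessible(p, lnf) for p in paths) / len(paths)


# Hypothetical ln(fitness) values for a complete two-position landscape.
lnf = {"GU": -0.30, "CU": -0.10, "GC": -0.40, "CC": 0.00}
fraction = accessible_fraction("GU", "CC", lnf)
```

Here one of the two paths passes through a less fit intermediate, so half of the shortest paths are accessible. The sketch requires every intermediate genotype to be present in `lnf`; a real analysis would also need a rule for unmeasured intermediates and for fitness differences that fall within measurement error.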