The structure of genotype-phenotype maps makes fitness landscapes navigable

Greenbury, Sam F.; Louis, Ard A.; Ahnert, Sebastian E.

doi:10.1038/s41559-022-01867-z

Article
Published: 29 September 2022

Nature Ecology & Evolution volume 6, pages 1742–1752 (2022)

Abstract

Fitness landscapes are often described in terms of ‘peaks’ and ‘valleys’, indicating an intuitive low-dimensional landscape of the kind encountered in everyday experience. The space of genotypes, however, is extremely high dimensional, which results in counter-intuitive structural properties of genotype-phenotype maps. Here we show that these properties, such as the presence of pervasive neutral networks, make fitness landscapes navigable. For three biologically realistic genotype-phenotype map models—RNA secondary structure, protein tertiary structure and protein complexes—we find that, even under random fitness assignment, fitness maxima can be reached from almost any other phenotype without passing through fitness valleys. This in turn indicates that true fitness valleys are very rare. By considering evolutionary simulations between pairs of real examples of functional RNA sequences, we show that accessible paths are also likely to be used under evolutionary dynamics. Our findings have broad implications for the prediction of natural evolutionary outcomes and for directed evolution.

Main

Ever since they were first introduced in Sewall Wright’s foundational paper¹, fitness landscapes have become an enduring and central concept in evolutionary biology^2,3,4,5,6. In particular, a low-dimensional picture of fitness ‘peaks’ and fitness ‘valleys’ has played an important role in shaping intuition around evolutionary dynamics. A key prediction is that a population must typically traverse an unfavourable valley of lower fitness to move from one fitness peak to another. But, as already pointed out by many since^{4,7,8,9,10,11}, the space of genotypes is typically extremely high dimensional. As illustrated in Fig. 1, what appears to be a fitness valley in a lower-dimensional landscape could be easily bypassed when dimensions are added^9,10,11.

**Fig. 1: High-dimensional bypasses facilitate landscape navigability.**

Three key open questions are: (1) does the low-dimensional picture of fitness valleys hold for realistic high-dimensional genotype spaces? And if we define accessible paths of point mutations between a low fitness phenotype and a high fitness phenotype as those with monotonically increasing fitness, (2) what properties of biological systems facilitate their presence and (3) are such paths sufficiently common that they can easily be found by an evolving population?

One way forward is to consider empirical fitness landscapes, where much recent progress has been made^5,12, particularly for molecular phenotypes^{5,13,14,15,16,17,18,19,20,21}. This body of work has yielded important insights, such as the role of local epistatic interactions in sculpting evolutionary paths^22,23,24. Nevertheless, ruling out high-dimensional bypasses is difficult in empirical studies because genotype spaces, which grow exponentially as K^L for alphabet size K and genotype length L, are almost always unimaginably vast²⁵. They are also highly connected since distances are linear in L: two genotypes are at most L point mutations away, but are connected by up to L! shortest possible paths because the L mutations may occur in any order. For example, even for a very short L = 20 strand of RNA, there are up to 20! ≅ 2 × 10¹⁸ paths between any two genotypes. Empirical landscapes can typically only ever sample a small fraction of the full genotype space, so what appears to be an isolated fitness peak may in fact be accessible via pathways not included in the experiment.
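The two counts quoted above can be reproduced with a few lines of Python. This is only an arithmetic sketch; the function names are illustrative and not taken from the paper.

```python
import math

def genotype_space_size(K: int, L: int) -> int:
    """Number of distinct genotypes of length L over an alphabet of size K."""
    return K ** L

def max_shortest_paths(L: int) -> int:
    """Upper bound on the number of shortest mutational paths between two
    genotypes that differ at all L sites: the L mutations can occur in any order."""
    return math.factorial(L)

K, L = 4, 20  # RNA alphabet {A, C, G, U} and a length-20 strand
print(f"genotypes:      {genotype_space_size(K, L):.3e}")
print(f"shortest paths: {max_shortest_paths(L):.3e}")
```

For K = 4 and L = 20 this gives roughly 1.1 × 10¹² genotypes and 2.4 × 10¹⁸ mutation orderings, consistent with the figure quoted in the text.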

A different strand of work, which can in principle address questions of global accessibility, has focused on model genotype-to-fitness landscapes^{3,6,10,11,26,27}. If fitness is assigned randomly to genotypes, as in Kingman’s ‘house of cards’ model²⁸, then the probability of finding accessible paths is small. If instead there are correlations between fitness and the genotypes, then, depending on details of the model, accessible paths can indeed be common^11,29. These correlations are often expressed in terms of ruggedness: a more rugged model has fewer correlations between genotypes and fitness, and so is less navigable. While much progress has been made in this literature, it is not clear how well these models capture the true correlations of biological fitness landscapes.

Here we take a different approach, and build on recent advances showing that many realistic genotype-phenotype (GP) maps share generic structural features that can enhance navigability^30,31,32. In contrast to the genotype-to-fitness models studied by others (above), we consider the genotype-to-phenotype-to-fitness map by inserting the GP map as an additional intermediate step that provides the non-random organization of the mapping from genotypes to fitness. This means correlations in fitness are naturally incorporated as a consequence of the GP map, rather than through an explicitly parameterized assumption.

One commonality, with important implications for evolutionary dynamics, is the existence of large neutral networks of genotypes that map to the same phenotype^21,33. Another is that the mutational robustness ρ_p of a phenotype p (defined as the mean probability that a point mutation leaves the phenotype unchanged) is much larger than what one would expect from a naïve uncorrelated model. Without correlations, ρ_p ≅ f_p, where f_p is the fraction of genotypes that map to phenotype p. However, as genotypes from the same neutral network are highly correlated³⁰, robustness is orders of magnitude larger than the naïve expectation, scaling as ${\rho }_{p}\propto -\log {f}_{p}$, a generic feature observed across GP maps^{31,32,34,35,36,37}. Such large robustness means that neutral networks are easily navigable, providing access to a large amount of potential variation^31,32,38,39.

We first explore features of several specific GP maps that affect the navigability and the ruggedness of the landscape: redundancy (large neutral sets), frequency of the unfolded or trivial phenotype, neutral correlations (robustness) and high dimensionality. We next investigate accessible paths for functional RNA (fRNA) phenotypes identified in vivo from the fRNA database (fRNAdb)⁴⁰. Finally, we explore whether accessible paths are used in evolutionary dynamics under a wide range of dynamical regimes. We consider both random and non-random fitness assignments exploring the additional role of non-neutral phenotypic correlations. Our findings demonstrate that generic structural properties shared across many maps from genotype to phenotype dramatically enhance the navigability of fitness landscapes with important implications for evolutionary dynamics.

Results

Well-studied GP maps induce navigable fitness landscapes

To concretely measure the effects of different properties of GP maps on the navigability of fitness landscapes, we consider several well-known systems in detail, including the RNA secondary structure GP map for lengths L = 12 and L = 15 (RNA12, RNA15)^{41,42,43,44,45,46,47} representing the RNA sequence’s minimum free energy folded secondary structure, the Polyomino lattice self-assembly maps (S_2,8, S_3,8)^30,48,49 modelling the topology of protein quaternary structure assembled from interacting constituent tiles, and several hydrophobic-polar (HP) lattice protein models for folding of a sequence into a tertiary structure (two compact models, HP5x5 and HP3x3x3, and two non-compact ones, HP20 and HP25)^50,51,52. See Self-assembly GP maps and Extended Data Fig. 1 for further descriptions of these maps.

We performed computational experiments in which fitness is assigned to phenotypes randomly, and two phenotypes are chosen randomly from the set of all phenotypes as the ‘source’ and ‘target’. This is a worst-case scenario that highlights the effect of the correlations between genotypes and phenotypes on fitness. The key property we study is the navigability $\left\langle \psi \right\rangle$, defined as:

$$\left\langle \psi \right\rangle =\frac{1}{N}\sum_{k=1}^{N}{\psi }_{{s}_{k}{t}_{k}}$$

over a set of N source–target pairs (s_k, t_k), where ψ_ij is the probability that single-point mutation steps with monotonically increasing fitness (an accessible path) exist from a genotype of phenotype i to a genotype of phenotype j. In other words, the navigability $\left\langle \psi \right\rangle$ is the average probability of an accessible path between phenotype pairs in the fitness landscape (Navigability estimation algorithm).
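To make the definition concrete, the following is a minimal sketch of how navigability could be estimated by exhaustive search on a very small map. It is not the authors' Navigability estimation algorithm: the toy phenotype function, the choice to set the target fitness to 1, and the sampling of a single source genotype per pair are all assumptions made for illustration.

```python
import random
from collections import deque
from itertools import product

def neighbours(g, K):
    """Yield every genotype one point mutation away from the tuple g."""
    for i, base in enumerate(g):
        for b in range(K):
            if b != base:
                yield g[:i] + (b,) + g[i + 1:]

def toy_phenotype(g):
    """Hypothetical toy map: phenotype = number of adjacent (1, 1) pairs."""
    return sum(a & b for a, b in zip(g, g[1:]))

def accessible_path_exists(start, target, phenotype_of, fitness, K):
    """Breadth-first search that only steps to neighbours of equal or higher
    fitness, so neutral moves are allowed and fitness never decreases."""
    seen, queue = {start}, deque([start])
    while queue:
        g = queue.popleft()
        if phenotype_of(g) == target:
            return True
        f_g = fitness[phenotype_of(g)]
        for n in neighbours(g, K):
            if n not in seen and fitness[phenotype_of(n)] >= f_g:
                seen.add(n)
                queue.append(n)
    return False

def estimate_navigability(phenotype_of, K, L, n_pairs, rng):
    """Fraction of random source-target pairs joined by an accessible path."""
    by_pheno = {}
    for g in product(range(K), repeat=L):
        by_pheno.setdefault(phenotype_of(g), []).append(g)
    phenos = sorted(by_pheno)
    hits = 0
    for _ in range(n_pairs):
        source, target = rng.sample(phenos, 2)
        fitness = {p: rng.random() for p in phenos}  # random fitness per phenotype
        fitness[target] = 1.0  # assumption: the target is the global maximum
        start = rng.choice(by_pheno[source])
        hits += accessible_path_exists(start, target, phenotype_of, fitness, K)
    return hits / n_pairs

psi = estimate_navigability(toy_phenotype, K=2, L=6, n_pairs=200, rng=random.Random(0))
print(f"estimated navigability: {psi:.2f}")
```

Exhaustive enumeration is only feasible for tiny K and L; realistic maps would need sampling and a cap on the number of genotypes explored.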

The value of $\left\langle \psi \right\rangle$ is greater than 0.6 for all the GP maps we consider, apart from the non-compact HP models HP20 and HP25. The non-compact HP models have a navigability $\left\langle \psi \right\rangle \le 0.013$ (we report $\left\langle \psi \right\rangle$ for the full set of GP maps in Extended Data Table 1). In light of these differences, we next investigate what generic structural properties of GP maps promote navigability.

Common GP map properties are associated with navigability

Redundancy, deleterious frequency and genotypic robustness

The first property we consider is the redundancy R of a GP map, defined as the average number of genotypes per non-deleterious phenotype (equation (1)), which is closely related to the average size of the neutral networks. Next, we consider the deleterious frequency f_del, defined as the fraction of genotype space that does not map to a well-defined phenotype. In the case of RNA secondary structure, the deleterious phenotype would correspond to the unfolded RNA strand (that is, the absence of any secondary structure). In the HP model it corresponds to the absence of a unique folded ground state. In the Polyomino model it corresponds to unbounded or non-deterministic (UND) assembly. Finally, we measure the mean genotypic robustness $\langle {\rho }_{{\mathrm{g}}}\rangle$, defined as the mean proportion of genotypic neighbours that have the same phenotype averaged over the non-deleterious genotypes. This provides a measure of local neutral connectivity.
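The three quantities defined above can be computed directly when a map is small enough to enumerate. The sketch below uses a hypothetical binary toy map (not one of the maps studied here) in which genotypes with no adjacent pair of 1s are treated as deleterious; all names are illustrative.

```python
from itertools import product

def toy_phenotype(g):
    """Hypothetical toy map: count adjacent (1, 1) pairs; zero pairs is
    treated as the deleterious (undefined) phenotype, returned as None."""
    pairs = sum(a & b for a, b in zip(g, g[1:]))
    return pairs if pairs > 0 else None

def gp_map_statistics(K, L, phenotype_of):
    """Return (redundancy R, deleterious frequency f_del, mean genotypic robustness)."""
    gp = {g: phenotype_of(g) for g in product(range(K), repeat=L)}
    viable = [g for g, p in gp.items() if p is not None]
    n_phenotypes = len({gp[g] for g in viable})
    redundancy = len(viable) / n_phenotypes   # genotypes per non-deleterious phenotype
    f_del = 1 - len(viable) / len(gp)         # fraction of genotype space that is deleterious
    rho_sum = 0.0
    for g in viable:                          # average over non-deleterious genotypes only
        same = 0
        for i, base in enumerate(g):
            for b in range(K):
                if b != base:
                    same += gp[g[:i] + (b,) + g[i + 1:]] == gp[g]
        rho_sum += same / (L * (K - 1))       # fraction of neighbours that are neutral
    return redundancy, f_del, rho_sum / len(viable)

R, f_del, mean_rho_g = gp_map_statistics(K=2, L=4, phenotype_of=toy_phenotype)
print(f"R = {R:.3f}, f_del = {f_del:.3f}, <rho_g> = {mean_rho_g:.3f}")
```

In this toy map, half of the 16 genotypes are deleterious and the remaining 8 are split across 3 phenotypes, so the redundancy is 8/3.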

In Fig. 2 we plot navigability against redundancy, deleterious frequency and mean genotypic robustness with the numerical values provided in Extended Data Table 1 and association measured with Spearman’s rank correlation coefficient ρ_s. Taking the GP maps together without system-specific considerations, we observe a general increase in navigability for greater redundancy (ρ_s = 0.643), smaller f_del (ρ_s = −0.619) and greater genotypic robustness (ρ_s = 0.548).

**Fig. 2: The relationship of navigability to the GP map properties of redundancy, deleterious frequency and mean genotypic robustness.**

The results across different GP maps provide some intuition for factors that determine navigability. With decreasing redundancy, it becomes more difficult to access all phenotypes as they begin to occupy smaller fractions of the overall space. As f_del increases, more neighbours of a given genotype will have a fitness of 0, therefore localizing phenotypes to smaller components in the GP map, increasing the likelihood of each genotype having no neighbouring genotypes with greater fitness.

Mean genotypic robustness provides an overall aggregate measure of the connectivity of the neutral networks. HP3x3x3 presents an example of particular interest by maintaining navigability ($\left\langle \psi \right\rangle =0.669$) with less redundancy (${\log }_{10}R=2.2$), large deleterious frequency (f_del = 0.939) and low genotypic robustness ($\langle {\rho }_{{\mathrm{g}}}\rangle =0.115$). The two non-compact HP models appear to be just below the thresholds that allow for navigability.

Positive neutral correlations increase navigability

As seen above, robustness plays a key role in enhancing navigability. For a null model, where genotypes map randomly to phenotypes, ρ_p ≅ f_p and average robustness is typically extremely low. High robustness therefore corresponds to strong neutral correlations: if a genotype maps to a specific phenotype, the probability that genotypes one mutation away also map to the same phenotype is highly enhanced³⁰. As mentioned before, it is widely observed that ${\rho }_{p} \sim -\log {f}_{p}$, a scaling first pointed out for the RNA map⁵³, but expected to be universal⁴⁵, because it naturally arises from a picture of constrained and unconstrained portions of genotype sequences^34,35,36. We can break naturally occurring correlations by taking two genotypes g₁ and g₂ at random and assigning the phenotype of g₁ to g₂ and vice versa. Such random swaps remove the intrinsic local correlations. Increasing the total number of swaps s reduces the correlations. We define in equation (9) a natural measure c(s) of the amount of decorrelation caused by the swaps in terms of the frequency f_p averaged across the phenotypes of the GP map for a given number of swaps s. When c(s) = 1, the correlations are equal to the original GP map, and when c(s) = 0, the correlations are that of the randomized null model.
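The swap procedure can be sketched as follows. The toy map (phenotype set by the first two sites only) is hypothetical and chosen because its robustness is easy to verify by hand; the measure c(s) of equation (9) is not reproduced here, and mean robustness is used only as a rough proxy for the remaining correlations.

```python
import random
from collections import Counter
from itertools import product

def swap_phenotypes(gp_map, n_swaps, rng):
    """Return a copy of gp_map after n_swaps random pairwise phenotype swaps."""
    shuffled = dict(gp_map)
    genotypes = list(shuffled)
    for _ in range(n_swaps):
        g1, g2 = rng.sample(genotypes, 2)
        shuffled[g1], shuffled[g2] = shuffled[g2], shuffled[g1]
    return shuffled

def mean_robustness(gp_map, K):
    """Mean fraction of point-mutation neighbours that share a genotype's phenotype."""
    total = 0.0
    for g, p in gp_map.items():
        same = count = 0
        for i, base in enumerate(g):
            for b in range(K):
                if b != base:
                    same += gp_map[g[:i] + (b,) + g[i + 1:]] == p
                    count += 1
        total += same / count
    return total / len(gp_map)

K, L = 2, 8
# Hypothetical toy map: only the first two sites are constrained, the rest are neutral.
original = {g: g[:2] for g in product(range(K), repeat=L)}
shuffled = swap_phenotypes(original, n_swaps=5000, rng=random.Random(0))

print("robustness before swaps:", mean_robustness(original, K))
print("robustness after swaps: ", round(mean_robustness(shuffled, K), 3))
print("frequencies preserved:  ", Counter(original.values()) == Counter(shuffled.values()))
```

Swapping leaves every phenotype frequency unchanged while robustness falls towards the frequency of the phenotype, which is the uncorrelated expectation described above.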

In Fig. 3a, we plot how navigability varies with c(s) in S_2,8, RNA12, HP5x5 and HP3x3x3 GP maps, a subset of the GP maps from the previous section that are both small enough to be tractable here, and have sufficiently large navigability such that the effect of reducing correlations and dimensionality may be sizeable. All four GP maps, on average, show greater navigability for greater c(s) with an approximately linear decay in navigability with decreasing c(s), saturating at a lower value specific to each GP map: 0.378 ± 0.005 for RNA12, 0.100 ± 0.003 for HP5x5, 0.000 ± 0.000 for HP3x3x3 and 0.949 ± 0.002 for S_2,8, substantial reductions apart from for S_2,8. In S_2,8, the navigability $\left\langle \psi \right\rangle$ takes a greater value for the decorrelated GP map (c < 1) than for the original one (c = 1). This is because not all phenotypes are directly accessible from each other in the original GP map. However, a slight randomization increases phenotype inter-connectivity due to the fact that the number of phenotypes for S_2,8 is smaller than the number of local mutations (N_P < (K − 1)L). We expect that in GP maps of longer sequence length L, the role of positive neutral correlations will become even more pronounced. We explore this in Navigability of fRNA fitness landscapes with respect to fRNA phenotypes.

**Fig. 3: The relationship of navigability to neutral correlations, dimensionality and ruggedness with the associations visualized for an RNA phenotype network.**

Large dimensionality increases navigability

We now examine the effect of dimensionality of the GP map. The dimensionality of the entire GP map is defined as L, the length of the sequence. During the search for an accessible path from the source to target phenotype, all bases can be mutated, making use of the full dimensionality of the GP map. We can, however, reduce the dimensionality of the search by allowing only a random set of D sites (where D < L) to be mutated during a given search for an accessible path from source to target. We then consider $\left\langle \psi \right\rangle$ as a function of the relative dimensionality d = D/L for all D ∈ {1,..., L}.
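Restricting the search to D mutable sites amounts to generating neighbours only at those positions. A minimal sketch, with illustrative function names and an arbitrary random genotype:

```python
import random

def choose_mutable_sites(L, D, rng):
    """Pick D of the L sequence positions at random; only these may mutate."""
    return sorted(rng.sample(range(L), D))

def restricted_neighbours(g, K, sites):
    """Yield point mutants of g that differ from it at one of the allowed sites."""
    for i in sites:
        for b in range(K):
            if b != g[i]:
                yield g[:i] + (b,) + g[i + 1:]

rng = random.Random(1)
K, L, D = 4, 12, 6
sites = choose_mutable_sites(L, D, rng)
g = tuple(rng.randrange(K) for _ in range(L))
nbrs = list(restricted_neighbours(g, K, sites))
print(f"d = {D / L:.2f}, mutable sites = {sites}, neighbours = {len(nbrs)}")
```

Each genotype then has D(K − 1) neighbours instead of L(K − 1), which is what makes bypasses rarer at small d.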

In Fig. 3b, we plot navigability $\left\langle \psi \right\rangle$ as a function of d. Decreasing dimensionality severely reduces the navigability of fitness landscapes, with a sigmoidal relationship between $\left\langle \psi \right\rangle$ and d. All the curves show an increase from low navigability to high navigability as d → 1 of the full GP map. The critical value of d, and general scale and shape, is different across the four GP maps indicating a complex dependence on other GP map properties.

In addition to identifying an accessible path during the search from source to target, we also count the genotypes that have neither a neutral neighbour nor a neighbour with greater fitness, that is, the genotypes that are local fitness peaks; their proportion provides a measure of landscape ruggedness. The average proportion of genotypes that are local fitness peaks, taken across source–target phenotype pairs and fitness assignments in a given GP map, is represented as $\left\langle \kappa \right\rangle$. In Fig. 3c, the ruggedness for each relative dimensionality d = D/L is plotted in the same four GP maps. We observe that increasing dimensionality reduces ruggedness and, as relative dimensionality drops below a certain level, ruggedness sharply increases. Of note is HP3x3x3, where ruggedness is greater at a given relative dimensionality than for the other GP maps. Where all bases may mutate at d = 1, around 7 in 100 genotypes are local peaks ($\left\langle \kappa \right\rangle =0.07$) but navigability remains high ($\left\langle \psi \right\rangle =0.66$), demonstrating that partially rugged landscapes can still be navigable.
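The local-peak criterion can be illustrated on a map small enough to enumerate. In this sketch the toy phenotype function and the deterministic fitness assignment (fitness equal to the phenotype label) are assumptions chosen so the result can be checked by hand; the analysis in the text instead uses random fitness and counts peaks among genotypes met during the search.

```python
from itertools import product

def toy_phenotype(g):
    """Hypothetical toy map: phenotype = number of adjacent (1, 1) pairs."""
    return sum(a & b for a, b in zip(g, g[1:]))

def local_peak_fraction(gp, fitness, K, sites):
    """Fraction of genotypes with no neutral or fitter neighbour when only
    the positions listed in `sites` are allowed to mutate."""
    peaks = 0
    for g, p in gp.items():
        escapes = False
        for i in sites:
            for b in range(K):
                if b != g[i]:
                    n = g[:i] + (b,) + g[i + 1:]
                    if fitness[gp[n]] >= fitness[p]:
                        escapes = True
        peaks += not escapes
    return peaks / len(gp)

K, L = 2, 4
gp = {g: toy_phenotype(g) for g in product(range(K), repeat=L)}
fitness = {p: float(p) for p in set(gp.values())}  # illustrative, not random

kappa_full = local_peak_fraction(gp, fitness, K, sites=range(L))  # d = 1
kappa_1d = local_peak_fraction(gp, fitness, K, sites=[0])         # d = 1/4
print(f"kappa at d = 1: {kappa_full}, kappa at d = 0.25: {kappa_1d}")
```

With all four sites mutable only the all-ones genotype is a peak (1 of 16), whereas with a single mutable site four genotypes are peaks, mirroring the trend of ruggedness rising as dimensionality falls.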

We illustrate an example of a source–target search in a schematic of the RNA12 GP map in Fig. 3d. We choose a random source and target pair and, during the search for an accessible path, keep track of all phenotypes encountered, their fitness and any transition between phenotypes. Each phenotype is represented as a node, edges as transitions between phenotypes and the value on the vertical axis as the fitness. The N_P = 58 phenotypes of this GP map are assigned coordinates in the horizontal plane using multidimensional scaling (MDS) on the basis of the pairwise Hamming distance between phenotypes⁵⁴. This allows phenotypes that are similar to each other to be located in similar parts of the MDS1–MDS2 plane. The source and target phenotypes are labelled ‘S’ and ‘T’ respectively, edges that may form accessible paths are coloured red and the remaining edges grey. This depiction of the fitness landscape immediately shows that it is highly connected with many accessible paths.

In Fig. 3e, with the same schematic source–target pair and fitness assignments as Fig. 3d, we illustrate the joint effect of neutral correlations and dimensionality on connectivity and navigability of the phenotype network for three different degrees of correlation (no correlations, some correlations, original correlations) and three different dimensionalities (D = 2, 6, 12). The top right of the nine plots is the original GP map that is also shown enlarged in Fig. 3d. In the case of D = 2, the dimensionality in which fitness valleys are often visualized in the literature, phenotypic connectivity is sparse, making the landscape unnavigable. The increase in navigability with increases in both dimensionality and correlations highlights that both the correlation structure of the underlying GP map, and the high-dimensional nature of the evolutionary search, are essential for navigability.

Navigability of fRNA fitness landscapes

Next we focus on the RNA secondary structure GP map by specifically choosing source and target phenotypes that have been observed in nature. This is important as only a small subset of all possible phenotypes are typically seen in real biological systems^49,55 and it is navigability among this subset that has most relevance for evolutionary processes.

Fitness valleys are not observed between short fRNAs

We sample RNA secondary structures from the fRNAdb⁴⁰. We consider pairs of fRNA phenotypes from the database with sequence length L, assigning a random fitness 0 ≤ F_source < 1 and F_target = 1, with random uniform assignment of fitness for all non-trivial phenotypes found during the search process. We consider the range L ∈ [20, 40], which is larger than the model GP maps we studied more exhaustively. We perform two distinct types of search by either permitting or preventing neutral mutations in exploring a given genotype’s mutational neighbourhood. This provides a means to directly measure the role of neutral correlations in facilitating navigability for larger L. Additionally, we test two different fitness assignment schemes: (1) random as previously and (2) using a given phenotype’s dot-bracket Hamming distance to the target phenotype (see Fitness landscapes for further details). The former ignores non-neutral phenotypic correlations in the GP map, while the latter introduces local phenotypic fitness correlations under the assumption that similar dot-bracket phenotypes have more similar genotypes. As the sequence length increases the number of phenotypes grows as N_P ≅ 1.76^L (ref. ⁵⁶), producing a large computational overhead to track all phenotypes and genotypes encountered during a search. The computational threshold T is the maximum number of genotypes whose neighbourhoods are searched before a search is aborted; the proportion of searches that are aborted is defined as α. In Navigability in the fRNAdb, we describe the other computational details needed to measure navigability at larger L.
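A Hamming-based fitness assignment on dot-bracket strings might look like the sketch below. The exact functional form is given in Fitness landscapes and is not reproduced here; the linear form 1 − d/L is an assumption for illustration, as are the function names and the example structures.

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length dot-bracket strings differ."""
    if len(a) != len(b):
        raise ValueError("structures must have equal length")
    return sum(x != y for x, y in zip(a, b))

def hamming_fitness(structure: str, target: str) -> float:
    """Assumed form: fitness 1 at the target, decreasing linearly with distance."""
    return 1.0 - hamming_distance(structure, target) / len(target)

target = "((((....))))"      # illustrative hairpin, L = 12
candidate = "(((......)))"   # one fewer base pair than the target
print(hamming_distance(candidate, target))
print(round(hamming_fitness(candidate, target), 3))
```

Under this assumed form, structures that share more of the target's base pairs receive higher fitness, which is what introduces correlations between neighbouring phenotypes.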

In Table 1, the navigability $\left\langle \psi \right\rangle$ for fitness landscapes with fRNA of sequence length L = 20–40 is reported along with the proportion of searches that were aborted and whether or not neutral mutations were permitted. With neutral mutations allowed, $\left\langle \psi \right\rangle \approx 1$, suggesting that fitness landscapes with fRNAdb sources and targets are highly navigable. For L > 30 the proportion of aborted searches increases, leading to a greater potential for this estimate to be biased. However, there is a strong indication that, with a greater computational threshold, similarly large navigability would be achieved in fRNA landscapes of even larger L, due to the observed scaling of $\left\langle \psi \right\rangle$ with the computational threshold (Supplementary Information). When Hamming fitness assignment is used, $\left\langle \psi \right\rangle =1$ and aborted runs are rare, demonstrating that phenotypic correlations (like genotypic correlations) enhance navigability.

Table 1: For length L = 20–40 fRNAs, the number of phenotypes in the fRNAdb, the proportion of runs that are aborted α and the estimated navigability $\left\langle \psi \right\rangle$ for both random and Hamming fitness, with and without neutral mutations

Where neutral mutations are disallowed, we find that navigability is markedly reduced below unity, although still substantially greater than zero ($\left\langle \psi \right\rangle \in [0.273,0.628]$). This finding is intriguing as it highlights that positive neutral correlations are important, but not essential, for the existence of accessible paths in this system. A possible explanation lies in the vast number of phenotypes N_P ≅ 1.76^L available in the GP map, coupled with its high dimensionality. As fitness is randomly assigned and new variation is only a few mutations away, there is a pool of non-neutral phenotypes with possibly larger fitness, potentially within a small mutational radius. Additionally, given the fRNAdb is occupied with highly frequent phenotypes⁵⁶, the source and target themselves will have greater robustness and therefore larger neutral spaces that may be found. With Hamming fitness assignment, the reduction in navigability is only marginal, suggesting that phenotypic correlations can overcome dramatically diminished neutral correlations.

In Fig. 4, we use the representation introduced in Fig. 3d to illustrate an accessible path in fRNA. For the successful traversal between a specific source and target fRNA, we see a vast array of background, ‘greyed out’ phenotypes discovered during the search for an accessible path, as well as a shortest accessible path connecting ten different phenotypes with the node colour and their vertical axis coordinate showing their fitness. This illustration further highlights the hyper-connectedness and high-dimensional bypasses present in fRNA GP maps that are afforded through exponentially increasing redundancy, positive neutral correlations and high dimensionality. The phenotype network also serves again as an alternative depiction of the fitness landscape in which the effect of GP map structure on the course of potential evolutionary explorations may be grasped more intuitively.

**Fig. 4: Example of an accessible path for a specific L = 30 fRNA source–target pair.**

Summarizing our results, we have demonstrated that fRNA GP maps have navigable fitness landscapes up to L = 30 fRNA, and probably up to L = 40 given observed scaling with increased computational time. They are highly likely to be navigable for even larger in vivo fRNAs due to the observed scaling of both the GP map properties and navigability with respect to the computational threshold. Neutral mutations drastically increase navigability and non-neutral phenotypic correlations enhance it, but neither alone determines the presence of accessible paths.

Evolutionary dynamics between fRNAs use accessible paths

Having considered whether accessible paths exist in a variety of GP maps, we next consider whether these accessible paths are found under evolutionary dynamics. It is conceivable that, while accessible paths to the true fitness maximum exist in a fitness landscape, there are so many alternative paths leading to local fitness maxima that a population will become trapped, necessitating passage across a fitness valley to reach the fittest phenotype.

Under evolutionary dynamics the adaptive path taken may be dependent on population mutation rate (NμL, with N population size, μ point mutation rate and L sequence length). We therefore explored both monomorphic (NμL ≪ 1) and polymorphic (NμL ≫ 1) regimes in the main text, with the Supplementary Information further investigating the role of population size and mutation rates.

For monomorphic evolutionary dynamics, we simulated evolution with a sequential fixation model⁵⁷ combined with Kimura’s fixation probability for a haploid population⁵⁸. For polymorphic evolutionary dynamics, we simulated a Wright–Fisher model, implemented via a genetic algorithm. Further details are provided in Navigability of fRNA fitness landscapes. As in that section, we again consider both random and Hamming fitness assignments.
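For reference, the standard diffusion-approximation form of Kimura's fixation probability for a new mutant in a haploid population of size N is u(s) = (1 − e^{−2s})/(1 − e^{−2Ns}). The sketch below implements that textbook form; whether the simulations use exactly this parameterization, and the definition of the selection coefficient as a fitness ratio, are assumptions made here.

```python
import math

def selection_coefficient(f_resident: float, f_mutant: float) -> float:
    """Assumed definition: relative fitness advantage of the mutant."""
    return f_mutant / f_resident - 1.0

def kimura_fixation_probability(s: float, N: int) -> float:
    """Fixation probability of a single new mutant with selection coefficient s
    in a haploid population of size N (diffusion approximation)."""
    if s == 0.0:
        return 1.0 / N           # neutral limit
    exponent = -2.0 * N * s
    if exponent > 700.0:         # exp would overflow; the probability is effectively zero
        return 0.0
    # expm1(x) = exp(x) - 1 keeps precision for small |s|
    return math.expm1(-2.0 * s) / math.expm1(exponent)

N = 1000
for s in (-0.01, 0.0, 0.01, 0.1):
    print(f"s = {s:+.2f}: u = {kimura_fixation_probability(s, N):.3e}")
```

In a sequential fixation scheme, a proposed point mutant would replace the resident genotype with this probability, so deleterious steps are strongly suppressed in large populations while neutral steps fix at rate 1/N.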

We chose N_s = 20 source phenotypes for each of N_t = 50 target phenotypes, with the population initialized to a clonal population of genotypes that map to the source phenotype. The fitness of the target was set to 1. The adaptive path was measured during evolutionary search. In the monomorphic case, this was the sequence of genotypes (and their phenotypes) that fixed, while for polymorphic dynamics, the change in fitness of the population’s majority phenotype (more than 50% of genotypes) was measured. Analogously to landscape navigability, we define evolutionary navigability $\langle {\psi }^{{{{\rm{evo}}}}}\rangle$ as the average probability that the adaptive path reaches a target phenotype from a source phenotype via an accessible path, with the phenotypic evolutionary navigability $\langle {\psi }_{p}^{{{{\rm{evo}}}}}\rangle$ as the probability that an adaptive path to a specific target phenotype p is an accessible one. So far we have required that an accessible path has no decrease in fitness, which can be expressed as a tolerance of ΔF = 0 on the maximum decrease in fitness between successive phenotypes along the path. Here, we also measure the effect of relaxing this constraint.
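The accessibility criterion with a tolerance can be written as a check on the sequence of fitness values along an adaptive path. This is a minimal sketch with an illustrative function name and made-up fitness values.

```python
def is_accessible(fitnesses, tolerance=0.0):
    """True if no step along the path lowers fitness by more than `tolerance`.
    tolerance = 0 recovers the strict definition (fitness never decreases)."""
    return all(prev - nxt <= tolerance
               for prev, nxt in zip(fitnesses, fitnesses[1:]))

path = [0.20, 0.50, 0.47, 1.00]  # illustrative fitness values along an adaptive path
print(is_accessible(path))                  # strict criterion, Delta F = 0
print(is_accessible(path, tolerance=0.05))  # relaxed criterion, Delta F = 0.05
```

The example path contains one small dip of 0.03, so it fails the strict criterion but passes once a tolerance of 0.05 is allowed.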

In Fig. 5a–c (left) we plot histograms of evolutionary navigability in monomorphic (NμL ≪ 1 at N = 100,000) and polymorphic (NμL = 100 at N = 100) dynamical regimes for fRNA source and target phenotypes at sequence lengths L = 20, 30, 40. For monomorphic dynamics, we find that the Hamming fitness assignment has $\langle {\psi }_{p}^{{{{\rm{evo}}}}}\rangle =1$ across all phenotypes with negligible aborted runs. Random fitness assignment yields high navigability at L = 20, but navigability decreases with increasing L. Across all L the aborted fraction is sizeable (α ∈ [0.318, 0.550]). Figure 5d (left) relaxes the requirement for an accessible path, allowing a maximum fitness decrease of ΔF = 0.05. We find random fitness assignment can be navigable for some phenotypes ($\langle {\psi }_{p}^{{{{\rm{evo}}}}}\rangle =1$) but with an increase in aborted runs (α = 0.887), for which navigability remains uncertain. In Fig. 5a–c (right), a similar pattern is observed for polymorphic evolutionary dynamics, with Hamming fitness assignment having $\langle {\psi }_{p}^{{{{\rm{evo}}}}}\rangle =1$ but with an increasing proportion of aborted runs for increasing L. Random assignment has reasonable $\langle {\psi }_{p}^{{{{\rm{evo}}}}}\rangle =0.302$ at L = 20 but becomes negligible at L = 40. In Fig. 5d (right), again, allowing a tolerance of ΔF = 0.05 regains navigability for some phenotypes with random fitness assignment (overall $\langle {\psi }^{{{{\rm{evo}}}}}\rangle =0.487$) but with a large proportion aborted (α = 0.842). We explore tolerance further in the Supplementary Information by identifying a ΔF sufficient to generate navigability.

**Fig. 5: Navigability under evolutionary dynamics with source and target phenotypes sampled from the fRNAdb with random and Hamming fitness assignments under monomorphic (left) and polymorphic (right) evolutionary dynamics.**

Navigability under monomorphic dynamics is sensitive to population size N with typically lower navigability in smaller populations (Supplementary Information). Navigability under polymorphic evolutionary dynamics is sensitive to population mutation rate NμL, with lower evolutionary navigability observed for smaller population mutation rates in the Wright–Fisher model (Supplementary Information). Echoing our investigation into the role of neutral mutations in Navigability of fRNA fitness landscapes, we also considered monomorphic evolutionary dynamics with and without neutral mutations where we also found that neutral mutations enhance evolutionary navigability (Supplementary Information). To gain insight beyond the computational limits incurred with increasing L, in the Supplementary Information, we explore navigability of coarse-grained fRNA ‘shape’ phenotypes⁵⁹. These have recently been shown to possess similar GP map properties of redundancy, bias⁵⁵ and neutral correlations⁶⁰ that we have shown earlier to be associated with and facilitate navigability. With this model we find evolutionary navigability can be attained in the monomorphic setting at lengths L = 60, 100, 140, and also enhanced by the Levenshtein fitness assignment, an equivalent of the Hamming fitness assignment related to phenotypic correlations.

Our evolutionary simulations have shown that neutral genotypic correlations and phenotypic correlations are sufficient to allow evolution to find accessible paths in the fitness landscape for certain conditions in both monomorphic and polymorphic regimes. Additionally, when these properties are jointly available (that is, Hamming fitness assignment with neutral mutations), they facilitate navigability under a very broad range of dynamical regimes.

Discussion

Our main contribution is to explicitly include the phenotype as an intermediate step between genotype and fitness, and therefore implicitly include generic properties such as redundancy and correlations that dramatically increase the navigability of fitness landscapes. We demonstrated for a wide range of evolutionary dynamical regimes that biological systems can be navigable, even when fitness is assumed to be distributed randomly. When fitness correlations based on phenotypic similarity are incorporated, navigability is enhanced even further. Our conclusion, that true fitness valleys are probably rare, should be relevant for a broad scope of issues in biological evolution.

Open questions remain: first, our computational explorations only allow for relatively small systems to be studied. However, there is evidence in our findings to suggest that navigability will hold at larger L too: (1) we found navigability to be monotonic for increasing L in the RNA we studied; (2) the deleterious fraction decreases monotonically with L for RNA; (3) while the number of phenotypes grows exponentially (N_P ≅ 1.76^L), the number of sequences grows exponentially as 4^L, so the average redundancy R will grow exponentially too at R ≅ 2.27^L. Given that robustness scales with frequency, the average genotype’s robustness will also grow, meaning that genotypes encountered along paths will have more neutral dimensions available; (4) phenotypes found in vivo are taken from a tiny fraction of phenotypes with the largest neutral sets and largest robustness^46,55,56, a phenomenon that may hold much more widely⁴⁹ and should greatly enhance navigability; (5) we have mainly studied a worst-case scenario with random assignment of fitness to phenotypes. For the fRNA strands, we also studied a fitness landscape based on Hamming distance between structures, showing that correlations between phenotypes improved navigability drastically. While much less is known about such phenotypic fitness correlations, they are likely to exist more generally and so enhance navigability. Taken together, these arguments suggest that landscapes at larger L should also have accessible paths and be navigable.
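
The scaling in point (3) follows from a one-line calculation. As a rough check, ignoring the deleterious fraction and assuming that the fitted growth rate N_P ≅ 1.76^L continues to hold at larger L:

$$R\approx \frac{4^{L}}{N_{\mathrm{P}}}\approx \frac{4^{L}}{1.76^{L}}=\left(\frac{4}{1.76}\right)^{L}\approx 2.27^{L}$$

Under these assumptions the average neutral set grows by a factor of a little over two for every nucleotide added to the sequence.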

Another issue to consider is that the model systems we study all relate to some form of self-assembly, where we assign fitness to the physical structure alone. This will not always hold for all biological systems. For example, where a specific sequence is necessary to facilitate binding of a protein, an additional sequence constraint is imposed on top of that required to specify the structure. This additional specificity potentially reduces both the redundancy of the phenotype and the dimensionality available for accessing alternative genotypes.

Our findings support work on the role of high dimensionality in promoting accessibility^{4,7,8,9,10,11}, as well as attempts to create an up-to-date metaphor for evolutionary adaptation⁶¹, but move well beyond the current literature by demonstrating both the generality across multiple systems and the presence of navigability with either random fitness assignments to phenotypes, or ones grounded in phenotypic similarity. A fuller understanding of the role of the GP map in structuring the high-dimensional fitness landscape could provide vital insights into areas such as the arrival of drug resistance^62,63 or the mutational progressions of cancer⁶⁴. In particular, understanding the fitness landscapes in cancer is notoriously challenging due to the difficulty of inferring the fitness of mutants⁶⁵. Introducing the notion of a mapping from genotypes to phenotypes and studying generic properties such as genetic correlations and redundancy may provide new insights into cancer evolution. Another example of particular current interest is found in viruses such as influenza or SARS-CoV-2, where mutations across a multitude of sites (high dimensionality) lead to variants (phenotypes) that evade host immune responses. Understanding whether accessible paths are afforded to such pathogenic viruses could provide important insights into their progression and population dynamics.

Methods

Self-assembly GP maps

We consider three GP maps for different systems of biological self-assembly: the RNA secondary structure GP map⁴² for secondary structure of RNA sequences, the HP lattice model for protein tertiary structure^50,66 and the Polyomino model for protein quaternary structure⁴⁸. The phenotype in each is solely related to the assembled structure. The GP maps have been extensively studied and compared in ref. ³⁰ and are shown in Extended Data Fig. 1. We summarize their details:

RNA secondary structure: genotypes are sequences where each position is one of the four RNA nucleotide bases (an alphabet ${{{\mathcal{A}}}}=\{A,C,G,U\}$). Phenotypes are the secondary structure bonding pattern of the minimum free energy fold of the genotype, represented with the dot-bracket notation⁴², apart from in the Supplementary Information where ‘RNA shapes’ are used instead⁵⁹. We use the Vienna package⁴² (v.1.8.5) with default parameters to convert RNA sequences to dot-bracket secondary structures. GP maps with sequences of length L are denoted RNAL (for example, RNA12). Extended Data Fig. 1 illustrates three example GP maps at L = 12, 15, 30.
HP lattice model: genotypes are sequences where each position is an amino acid classified as either hydrophobic or polar (an alphabet ${{{\mathcal{A}}}}=\{H,P\}$)^50,66. Phenotypes are the minimum energy fold of the genotype, restricting the fold to occur on either a square or cubic lattice, with the energetics determined by interactions between neighbours on the lattice that are non-adjacent in the sequence. We represent folds with a string describing the moves that are required to construct the fold on the lattice with the basis: ‘Up’, ‘Down’, ‘Right’, ‘Left’ for two-dimensional (2D) lattices, and additionally ‘Forward’ and ‘Back’ for three-dimensional (3D) lattices. We follow refs. ^51,52 and consider energetic interactions between non-adjacent pairs to have values E_HH = −1, with E_HP = E_PP = 0, where H are hydrophobic and P are polar amino acids. If a sequence has a unique minimum energy structure, its phenotype is that structure; otherwise it is considered degenerate and not defined. We consider both the non-compact GP map and compact GP maps. The former identifies the minimum energy fold among all folds of a given length and is referred to as HPL. The latter only considers the set of compact structures as possible folds and is referred to as HPlxw for 2D lattices (for example, HP5x5) and HPlxwxh for 3D lattices (for example, HP3x3x3). The compact HP model only allows folds that fit within the prescribed grid (for example, either 5 × 5 or 3 × 3 × 3 here). These maximally compact subsets aim to capture the globular nature of in vivo proteins⁶⁷, vastly reducing the number of folds at a given length while being more faithful to observed protein structure topology. Extended Data Fig. 1 depicts examples from the two compact (HP3x3x3 and HP5x5) and two non-compact (HP20 and HP25) GP maps studied here.
Polyomino model: the Polyomino GP map represents protein quaternary structure on a 2D square lattice, with constituent tiles from an assembly kit placed where interactions are present. Genotypes represent an assembly kit of N_t tiles, where each edge of a tile may have one of N_c colours (interface types) denoted by integers. Here we follow refs. ^30,48 and consider the GP maps ${S}_{{N}_{\mathrm{t}},{N}_{c}}$, specifically S_2,8 and S_3,8. We use N_c = 8 with bases from an alphabet ${{{\mathcal{A}}}}=\{0,1,2,3,4,5,6,7\}$ for each tile edge. Interactions are only allowed between 1 ↔ 2, 3 ↔ 4, 5 ↔ 6, with 0 and 7 being neutral. The genotype is a sequence of bases encoded in blocks of four, read clockwise around each assembly kit tile. To construct the phenotype from the assembly kit, the first encoded tile is used to ‘seed’ the assembly, with subsequent tile placements made at random among the available points of interaction with assembly kit tiles that may be placed on the lattice. The assembly process terminates when no available placements remain or if the structure becomes unbounded. The assembly process is repeated k = 200 times with the final Polyomino compared across the ensemble of assemblies. The phenotype is the unique bounded shape across the ensemble of assemblies, allowing for rotations, with a classification of UND otherwise.
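
To make the HP energetics described above concrete, the following minimal sketch scores a single fold on a 2D square lattice. It assumes the move-string representation with the moves abbreviated to the letters U, D, R and L, and uses E_HH = −1 with E_HP = E_PP = 0. The function name `hp_fold_energy` and the single-letter abbreviations are illustrative choices and are not taken from the authors' code.

```python
from typing import Optional

# Single-letter abbreviations for the 2D moves (an assumption of this sketch).
MOVES_2D = {"U": (0, 1), "D": (0, -1), "R": (1, 0), "L": (-1, 0)}


def hp_fold_energy(sequence: str, moves: str) -> Optional[int]:
    """Energy of an HP sequence laid out on a 2D square lattice.

    Returns None when the move string revisits a site (not a valid fold).
    """
    if len(moves) != len(sequence) - 1:
        raise ValueError("a chain of n residues needs n - 1 moves")
    coords = [(0, 0)]
    for move in moves:
        dx, dy = MOVES_2D[move]
        x, y = coords[-1]
        coords.append((x + dx, y + dy))
    if len(set(coords)) != len(coords):
        return None
    index_at = {site: i for i, site in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        # Look only right and up so that each lattice contact is counted once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index_at.get((x + dx, y + dy))
            if j is None or abs(i - j) == 1:
                continue  # empty site, or neighbours along the chain
            if sequence[i] == "H" and sequence[j] == "H":
                energy -= 1  # E_HH = -1; E_HP = E_PP = 0
    return energy


# The two H residues of this four-residue chain end up in contact.
print(hp_fold_energy("HPPH", "RUL"))
```

Finding the phenotype of a genotype would additionally require minimizing this energy over all permitted folds and checking that the minimum is unique, which the sketch does not attempt.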

The GP maps may be further characterized by their genotype sequence length L, base K, number of genotypes N_G = K^L and number of phenotypes N_P. The redundancy n_p of a given phenotype p is the number of genotypes that map to p and this is normalized by the size of the genotype space to give the frequency f_p = n_p/K^L. The overall redundancy R of a GP map is defined as the average number of genotypes per non-deleterious phenotype:

$$R={K}^{L}(1-{f}_{{\mathrm{del}}})/({N}_{{\mathrm{P}}}-1)$$

(1)

We provide Extended Data Table 2 to summarize the characteristic properties used to differentiate the GP maps.
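
As an illustration of these quantities, the sketch below computes n_p, f_p and the overall redundancy R of equation (1) from a complete listing of the phenotype of every genotype. It assumes that N_P counts the deleterious phenotype (hence the N_P − 1 in the denominator) and that this phenotype carries the hypothetical label "del"; the function name is illustrative only.

```python
from collections import Counter


def redundancy_summary(phenotype_of_each_genotype, del_label="del"):
    """Return (n_p, f_p, R) for a fully enumerated GP map.

    phenotype_of_each_genotype: one phenotype label per genotype, so its
    length is K**L. del_label is an assumed name for the deleterious phenotype.
    """
    counts = Counter(phenotype_of_each_genotype)        # redundancy n_p
    n_genotypes = sum(counts.values())                  # N_G = K**L
    frequencies = {p: n / n_genotypes for p, n in counts.items()}  # f_p
    f_del = frequencies.get(del_label, 0.0)
    n_phenotypes = len(counts)                          # N_P, including del
    if n_phenotypes < 2:
        raise ValueError("need at least one non-deleterious phenotype")
    overall = n_genotypes * (1.0 - f_del) / (n_phenotypes - 1)  # equation (1)
    return counts, frequencies, overall


# Toy map with 16 genotypes: 8 deleterious, 6 mapping to 'a', 2 mapping to 'b'.
toy_map = ["del"] * 8 + ["a"] * 6 + ["b"] * 2
counts, freqs, R = redundancy_summary(toy_map)
print(counts["a"], freqs["a"], R)
```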

A particular feature of all three GP maps is a single phenotype that is of a different nature to the others: for RNA secondary structure this is the unfolded ‘trivial’ structure, for the HP lattice model it is sequences that have a degenerate minimum energy state and for the Polyomino model it is when there is UND growth. We refer to this phenotype here as the deleterious or del phenotype as, in each GP map, we consider it low fitness due to the non-specificity of the structural phenotype. We assign a fitness of zero for del throughout this work. While this is a strong assumption, given the large-scale dominance of the del phenotype in Polyomino and HP GP maps, we expect this assumption to exacerbate the presence of valleys rather than introducing a bias towards navigability.

Measuring landscape navigability

Definitions and formulation

To establish the presence of fitness valleys in a fitness landscape, we consider whether it is possible to reach the fittest phenotype from any given point in the genotype space via a path along which the fitness increases monotonically, defined as an accessible path^11,68. Landscape navigability has previously been defined as the proportion of accessible paths to a given genotype from all other genotypes¹⁷. Here we specifically define the navigability as the average probability that a randomly chosen phenotype pair have at least one accessible path between them, given a fitness assignment process to phenotypes. We denote accessibility with ψ, where ψ = 1 indicates the presence of at least one accessible path between two phenotypes for a specific set of fitness assignments and ψ = 0 indicates no accessible paths. When ψ = 0, a fitness valley must be traversed between the phenotypes. With this notation, we use $\left\langle \psi \right\rangle$ to represent navigability of fitness landscapes for a given GP map.

Fitness landscapes

In conjunction with the GP map M, a fitness landscape instance is defined by the set of phenotype fitnesses ${{{\mathcal{F}}}}:= {\{{F}_{{p}_{i}}\}}_{i = 1}^{{N}_{{\mathrm{P}}}}$, with i denoting the ith indexed phenotype p_i. We refer to the source phenotype p and target phenotype q in the search for an accessible path from p → q. We consider two fitness assignments in this paper:

Random fitness: random samples ${F}_{{p}_{i}} \sim {{{\rm{Uniform}}}}(0,1)$ with target phenotype q having F_q = 1
Hamming distance: the similarity of phenotype p to a target phenotype q is measured by the fraction of matching positions in the aligned phenotype string representations, given by $F(p,q)=1-\mathop{\sum }\nolimits_{j = 1}^{L}\delta ({p}^{(j)},{q}^{(j)})/L$, where p^(j) is the string character representing phenotype p at the jth base position, δ equals 1 when the two characters differ and 0 otherwise (so that F(q, q) = 1), and F(p, q) is the fitness of phenotype p compared to the target phenotype q

F_del = 0 for all fitness assignments.
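
The two fitness assignments can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names and the "del" label are hypothetical, and the Hamming variant assumes that the sum counts mismatched positions, so that the target itself scores F = 1.

```python
import random


def random_fitness(phenotypes, target, del_label="del", rng=None):
    """Uniform(0, 1) fitness per phenotype, with F_target = 1 and F_del = 0."""
    rng = rng or random.Random()
    fitness = {p: rng.random() for p in phenotypes}
    fitness[target] = 1.0
    if del_label in fitness:
        fitness[del_label] = 0.0
    return fitness


def hamming_fitness(p, q):
    """1 minus the fraction of mismatched positions between aligned strings p and q."""
    if len(p) != len(q):
        raise ValueError("phenotype strings must have equal length")
    mismatches = sum(a != b for a, b in zip(p, q))
    return 1.0 - mismatches / len(p)


landscape = random_fitness(["((..))", "(....)", "del"], target="((..))",
                           rng=random.Random(0))
print(landscape["((..))"], landscape["del"])
print(hamming_fitness("(....)", "((..))"))
```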

Navigability estimation

The probability of an accessible path (ψ = 1) between a source phenotype p and target phenotype q, given a random fitness landscape instance ${{{\mathcal{F}}}}$, is deterministic with a binary outcome. We can define the probability of ψ more explicitly as a function of p, q and ${{{\mathcal{F}}}}$ as follows:

$$\psi (p,q,{{{\mathcal{F}}}}):= P(\psi =1| p\to q,{{{\mathcal{F}}}})$$

(2)

where

$$\psi (p,q,{{{\mathcal{F}}}})=\left\{\begin{array}{ll}1&{{{\rm{if}}}}\,{{{\rm{at}}}}\,{{{\rm{least}}}}\,{{{\rm{one}}}}\,{{{\rm{accessible}}}}\,{{{\rm{path}}}}\,{{{\rm{exists}}}}\\ 0&{{{\rm{otherwise}}}}\end{array}\right.$$

(3)

We can take the expectation over ${{{\mathcal{F}}}}$ yielding the mean probability of an accessible path from p to q as:

$${\psi }_{pq}={E}_{{{{\mathcal{F}}}}}[\psi (p,q,{{{\mathcal{F}}}})]$$

(4)

With this notation, we can define the navigability for the GP map as the expectation over equation (4) for phenotypes p and q sampled uniformly at random:

$$\left\langle \psi \right\rangle ={E}_{p,q}[{\psi }_{pq}]$$

(5)

We can estimate this probability of reaching a given target phenotype q from a uniform randomly chosen source phenotype p by computationally measuring $\psi (p,q,{{{\mathcal{F}}}})$ for N_s randomly chosen sources for each of N_t randomly chosen targets, with a new random fitness landscape instance ${{{\mathcal{F}}}}$ for each pair. During the practical estimation, it is convenient to understand the outcome of the search as:

$$\psi ({p}_{{\mathrm{st}}},{q}_{{\mathrm{t}}},{{{{\mathcal{F}}}}}_{{\mathrm{st}}})=\left\{\begin{array}{ll}1&{{{\rm{at}}}}\,{{{\rm{least}}}}\,{{{\rm{one}}}}\,{{{\rm{accessible}}}}\,{{{\rm{path}}}}\\ 0&{{{\rm{no}}}}\,{{{\rm{accessible}}}}\,{{{\rm{path}}}},\,{{{\rm{not}}}}\,{{{\rm{aborted}}}}\\ {{{\rm{NA}}}}&{{{\rm{no}}}}\,{{{\rm{accessible}}}}\,{{{\rm{path}}}},\,{{{\rm{aborted}}}}\end{array}\right.$$

where searches are aborted if they extend beyond a computational threshold of genotypes encountered T. An estimate of the navigability $\left\langle \psi \right\rangle$ can be written as:

$$\left\langle \psi \right\rangle =\frac{1}{{N}_{c}}\mathop{\sum }\limits_{t=1}^{{N}_{{\mathrm{t}}}}\mathop{\sum }\limits_{s=1}^{{N}_{{\mathrm{s}}}}{I}_{T}(s,t)\psi ({p}_{{\mathrm{st}}},{q}_{{\mathrm{t}}},{{{{\mathcal{F}}}}}_{{\mathrm{st}}})$$

(6)

where p_st and q_t are the source and target phenotypes of sth source for the tth target, with ${I}_{T}(s,t):= I\left(\psi ({p}_{{\mathrm{st}}},{q}_{{\mathrm{t}}},{{{{\mathcal{F}}}}}_{{\mathrm{st}}})\ne {{{\rm{NA}}}}\right)$ an indicator for whether the run was not aborted, and therefore the number of completed runs is ${N}_{c}={\sum }_{t,s}{I}_{T}\left(s,t\right)$ with the aborted proportion α:

$$\alpha =1-\frac{{N}_{c}}{{N}_{{\mathrm{t}}}{N}_{{\mathrm{s}}}}$$

(7)

The estimate of the navigability of a fitness landscape for a given GP map has an associated Bernoulli standard error (derived from an estimate of the corrected sample standard deviation):

$${\mathrm{s.e.}}(\left\langle \psi \right\rangle )=\sqrt{\frac{\left\langle \psi \right\rangle \left(1-\left\langle \psi \right\rangle \right)}{{N}_{c}-1}}$$

(8)
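
Equations (6) to (8) amount to a mean over the completed searches. A minimal sketch, assuming that each search outcome is stored as 1, 0 or `None` (for NA, an aborted run); the function name is illustrative:

```python
import math


def estimate_navigability(outcomes):
    """Return (navigability, aborted proportion, standard error).

    outcomes: one entry per source-target search, equal to 1 (accessible path
    found), 0 (no accessible path) or None (search aborted at threshold T).
    """
    completed = [o for o in outcomes if o is not None]
    n_c = len(completed)                                # completed runs N_c
    if n_c < 2:
        raise ValueError("need at least two completed searches")
    psi = sum(completed) / n_c                          # equation (6)
    alpha = 1.0 - n_c / len(outcomes)                   # equation (7)
    se = math.sqrt(psi * (1.0 - psi) / (n_c - 1))       # equation (8)
    return psi, alpha, se


# 12 searches: 8 successes, 2 failures and 2 aborted runs.
psi, alpha, se = estimate_navigability([1] * 8 + [0] * 2 + [None] * 2)
print(psi, alpha, se)
```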

We next describe in more detail the computational algorithm for estimating $\left\langle \psi \right\rangle$.

Navigability estimation algorithm

For a given source and target phenotype, in each random landscape instance, we perform the following computational algorithm to measure ψ. We first provide some definitions:

GP map M is a function $M:{{{\mathcal{G}}}}\to {{{\mathcal{P}}}}$ where ${{{\mathcal{G}}}}$ is the space of genotypes and ${{{\mathcal{P}}}}$ is the space of phenotypes, such that we can write the phenotype p of genotype g as p = M(g)
Dimensionality: we define the set of sequence positions that may be mutated as ${{{\mathcal{D}}}}$, with the size of $| {{{\mathcal{D}}}}|$ being the dimensionality D. When $| {{{\mathcal{D}}}}| =L$ all base positions are mutable. Relative dimensionality is defined as the dimensionality relative to sequence length d = D/L
Alphabet: sequences have a set of ${{{\mathcal{A}}}}$ possible letters at a given site and the size of $| {{{\mathcal{A}}}}| =K$ is the base
u₀ contains genotypes whose 1-mutant neighbours are yet to be considered in a given search for an accessible path
u₁ contains genotypes that have already had their 1-mutant neighbours considered in a given search for an accessible path

The algorithm proceeds with a breadth-first search:

(1) A random genotype g that maps to the source phenotype is chosen and added to u₀
(2) Set the first element of u₀ as g, with p = M(g)
(3) For each position $j\in {{{\mathcal{D}}}}$ and each alternative base $a\in {{{\mathcal{A}}}}$ at position j, measure genotype neighbour $g^{\prime}$ and phenotype $p^{\prime} =M(g^{\prime} )$
(4) If ${F}_{p^{\prime} }\ge {F}_{p}$ and $g^{\prime} \notin {u}_{1}$, add $g^{\prime}$ to u₀
(5) Move g from u₀ to u₁
(6) If the target phenotype is found, return ψ = 1; if ∣u₀∣ = 0, return ψ = 0; if ∣u₀∣ + ∣u₁∣ > T (computational threshold), return ‘aborted’; otherwise return to step 2

The algorithm finishes when the target phenotype is found, when u₀ becomes empty, or when the combined size of u₀ and u₁ becomes larger than a predefined threshold T, beyond which computational progress may become unfeasible. Runs that exceed T are aborted, and we discard them from the measurement of navigability $\left\langle \psi \right\rangle$ using the indicator function I_T introduced in Navigability estimation.
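
The breadth-first search can be sketched as below for genotypes stored as strings. This is an illustration under stated assumptions rather than the authors' code: the GP map is passed as a hypothetical callable `gp_map`, fitness as a dictionary keyed by phenotype, and the sketch additionally skips neighbours already waiting in u₀ to avoid duplicates. The target is checked as soon as a neighbour is generated, which gives the same outcome as checking at the end of each iteration.

```python
from collections import deque


def accessible_path_exists(source_genotype, target, gp_map, fitness, alphabet,
                           positions=None, threshold=10**6):
    """Return 1 (accessible path found), 0 (none exists) or None (aborted)."""
    if positions is None:
        positions = range(len(source_genotype))  # all sites mutable (D = L)
    if gp_map(source_genotype) == target:
        return 1
    u0 = deque([source_genotype])   # genotypes whose neighbours are unexplored
    queued = {source_genotype}      # fast membership test for u0
    u1 = set()                      # genotypes whose neighbours were explored
    while u0:
        g = u0[0]
        f_g = fitness[gp_map(g)]
        for j in positions:
            for a in alphabet:
                if a == g[j]:
                    continue
                neighbour = g[:j] + a + g[j + 1:]
                p_new = gp_map(neighbour)
                if fitness[p_new] < f_g:
                    continue        # a step down in fitness is not allowed
                if p_new == target:
                    return 1
                if neighbour not in u1 and neighbour not in queued:
                    u0.append(neighbour)
                    queued.add(neighbour)
        u0.popleft()
        queued.discard(g)
        u1.add(g)
        if len(u0) + len(u1) > threshold:
            return None             # aborted: too many genotypes encountered
    return 0


# Toy two-letter map with L = 2; 't' is the target phenotype.
toy = {"AA": "s", "AB": "x", "BA": "y", "BB": "t"}
fit = {"s": 0.5, "x": 0.2, "y": 0.7, "t": 1.0}
print(accessible_path_exists("AA", "t", toy.__getitem__, fit, "AB"))
```

In this toy landscape the path AA → BA → BB is accessible, whereas restricting mutations to the second site (`positions=[1]`) leaves only the downhill step AA → AB and the search returns 0.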

As described in equation (6) we pick N_s source phenotypes uniformly at random for each of the N_t target phenotypes also chosen at random. We set N_t = 20 and N_s = 50. The uncertainty in the estimate of the navigability $\left\langle \psi \right\rangle$ is reported as the standard error ${\mathrm{s.e.}}(\left\langle \psi \right\rangle )$ across the ensemble of measurements.

Removing correlations

To measure the effect of positive neutral correlations³⁰, we perform genotype swaps and then repeat the measurement of $\left\langle \psi \right\rangle$. This process involves constructing a new GP map M_s from the original GP map M_s=0 ≔ M where s is the number of pairs of genotypes whose phenotypes have been swapped. More precisely, a swap involves selecting two genotypes g₁ and g₂ with uniform random probability and setting M_s(g₁) = M_s−1(g₂) and M_s(g₂) = M_s−1(g₁). It follows that M_s→∞ is the uncorrelated random null model GP map with no positive neutral correlations as used in ref. ³⁰. As shown in ref. ³⁰, the random null model has ρ_p ≅ f_p when there are no positive neutral correlations. Therefore, we additionally define the correlations c present in a given GP map M_s by comparing the logarithm of the average robustness-to-frequency ratio in a given GP map against the original GP map, generating a scale for measuring correlations in M_s:

$$c(s)=\frac{{\log }_{10}{\left\langle \frac{{\rho }_{p}(s)}{{f}_{p}(s)}\right\rangle }_{p}}{{\log }_{10}{\left\langle \frac{{\rho }_{p}(0)}{{f}_{p}(0)}\right\rangle }_{p}}$$

(9)

where c(0) = 1 for s = 0 and $\mathop{\lim }\limits_{s\to \infty }c(s)\approx 0$, the expectation for the random null model. Therefore, the scale yields positive values for c where there is, on average, greater robustness than frequency. The process of removing correlations gradually from the original GP map (s = 0) to the random null model (s → ∞) provides a range over which the relationship between positive neutral correlations and navigability may be considered in GP maps. We measure the navigability of S_2,8, RNA12, HP3x3x3 and HP5x5 by taking 100 evenly spaced values for s on the range s ∈ [0, K^L] and measuring $\left\langle \psi \right\rangle$ and c(s) for each.
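
A minimal sketch of the swapping procedure and of the scale in equation (9), assuming the GP map is held as a dictionary from genotype to phenotype. The function names are illustrative, and the average robustness-to-frequency ratios are taken as precomputed inputs rather than derived here.

```python
import math
import random


def swap_phenotypes(gp_map, n_swaps, rng=None):
    """Return a copy of gp_map after n_swaps random pairwise phenotype swaps.

    Swapping preserves every phenotype frequency f_p while progressively
    destroying the neutral correlations between neighbouring genotypes.
    """
    rng = rng or random.Random()
    swapped = dict(gp_map)
    genotypes = list(swapped)
    for _ in range(n_swaps):
        g1, g2 = rng.choice(genotypes), rng.choice(genotypes)
        swapped[g1], swapped[g2] = swapped[g2], swapped[g1]
    return swapped


def correlation_scale(mean_ratio_s, mean_ratio_0):
    """Equation (9): c(s) from the average rho_p / f_p at s swaps and at s = 0."""
    return math.log10(mean_ratio_s) / math.log10(mean_ratio_0)


toy = {f"g{i}": ("p" if i < 6 else "q") for i in range(10)}
shuffled = swap_phenotypes(toy, 100, random.Random(0))
print(sorted(shuffled.values()) == sorted(toy.values()))  # frequencies preserved
```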

Restricting dimensionality

To measure the role of dimensionality we restrict the dimensionality of a search for an accessible path from source to target by only allowing a set of ${{{\mathcal{D}}}}$ randomly chosen positions along the sequence to be mutated in the 1-mutant neighbour measurement in step 3 of the navigability algorithm above (Navigability estimation algorithm). The dimensionality D is the number of positions that may be mutated $| {{{\mathcal{D}}}}|$, and the relative dimensionality d ≔ D/L. When D = L we have the original dimensionality, while for D = 1 only a single sequence position may be mutated. The GP map M itself is not changed under this dimensional restriction but rather the connectivity of genotypes and therefore the connectivity of the fitness landscape.

We measure the navigability of S_2,8, RNA12, HP3x3x3 and HP5x5 by taking evenly spaced values for D on the range D ∈ [1, L].
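
The restriction only changes which 1-mutant neighbours are generated. A short sketch, with illustrative function names, of choosing the mutable positions at random and enumerating the neighbours that remain:

```python
import random


def choose_mutable_positions(length, dimensionality, rng=None):
    """Pick D distinct sequence positions uniformly at random."""
    rng = rng or random.Random()
    return sorted(rng.sample(range(length), dimensionality))


def restricted_neighbours(genotype, alphabet, mutable_positions):
    """1-mutant neighbours that differ only at one of the mutable positions."""
    return [genotype[:j] + a + genotype[j + 1:]
            for j in mutable_positions
            for a in alphabet if a != genotype[j]]


genotype = "ACGU" * 3                       # L = 12, K = 4
positions = choose_mutable_positions(len(genotype), 3, random.Random(1))
# With D = 3 there are D(K - 1) = 9 neighbours instead of L(K - 1) = 36.
print(len(restricted_neighbours(genotype, "ACGU", positions)))
```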

Measuring ruggedness

Related to navigability is the concept of landscape ruggedness. We measure κ(g), an indicator of whether a genotype is a local fitness maximum, during the search from source to target. The average proportion of genotypes that are local fitness maxima provides a measure of ruggedness²⁶. Whether a genotype g is a local fitness peak is determined by the fitness of all accessible 1-mutant neighbours $g^{\prime}$, such that:

$$\kappa (g)=\left\{\begin{array}{ll}1&{{{\rm{if}}}}\,{F}_{M(g^{\prime} )} < {F}_{M(g)}\,\forall g^{\prime} \in \sigma (g)\\ 0&{{{\rm{otherwise}}}}\end{array}\right.$$

(10)

where we have the function σ(g), which returns the set of 1-mutants of genotype g. We calculate the ruggedness for a landscape by taking the average of κ(g) over all genotypes and all source–target pairs once the search has completed. We denote the ruggedness as $\left\langle \kappa \right\rangle$.
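
Equation (10) can be sketched as follows. The sketch assumes string genotypes, a hypothetical callable `gp_map` and a fitness dictionary keyed by phenotype, and it averages κ over whatever list of genotypes it is given rather than over the genotypes visited in an actual search.

```python
def kappa(genotype, gp_map, fitness, alphabet):
    """Equation (10): 1 if every 1-mutant neighbour is strictly less fit, else 0."""
    f_g = fitness[gp_map(genotype)]
    for j, current in enumerate(genotype):
        for a in alphabet:
            if a == current:
                continue
            neighbour = genotype[:j] + a + genotype[j + 1:]
            if fitness[gp_map(neighbour)] >= f_g:
                return 0  # a neutral or fitter neighbour exists
    return 1


def ruggedness(genotypes, gp_map, fitness, alphabet):
    """Average of kappa over the supplied genotypes."""
    genotypes = list(genotypes)
    return sum(kappa(g, gp_map, fitness, alphabet) for g in genotypes) / len(genotypes)


toy = {"AA": "s", "AB": "x", "BA": "y", "BB": "t"}
fit = {"s": 0.5, "x": 0.2, "y": 0.7, "t": 1.0}
print(ruggedness(toy, toy.__getitem__, fit, "AB"))
```

Because the inequality is strict, a genotype with a neutral neighbour is not counted as a local maximum.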

Navigability in the fRNAdb

In Navigability of fRNA fitness landscapes, we examine navigability in a specific subset of RNA phenotypes, namely those that are found in the fRNAdb⁴⁰. For a given length, we use all phenotypes in proportion to their occurrence in the fRNAdb apart from the trivial structure, which we exclude as it is assigned zero fitness here. We randomly choose N_t = 50 targets with N_s = 20 randomly chosen sources from this set.

To examine navigability between fRNAs, we must consider sequences longer than L = 15. In doing so, we introduce additional computational overhead given the increasing neutral set size resulting in the condition ∣u₀∣ + ∣u₁∣ > T being more likely to be met. Therefore, to maximize the number of non-aborted runs, we perform a modified depth-first search (DFS) where we attempt to greedily follow paths of increasing gradient until we reach the maximum fitness phenotype. If the path fails, instead of moving back one step as in a standard DFS, we go all the way back to the start of the walk and pick an unexplored neighbour with the lowest fitness to begin a new uphill walk. In this way, we maximize the exploration of new phenotypes by always starting our deep walks from the lowest point while still maintaining the ability to perform long walks during the search.

We write the modified DFS algorithm explicitly as:

(1) A random genotype g that maps to the source phenotype is chosen and added to u₀
(2) Set the first element of u₀ as g, and p = M(g)
(3) For each position j in ${{{\mathcal{D}}}}$ and each alternative base $a\in {{{\mathcal{A}}}}$ at position j, measure genotype neighbour $g^{\prime}$ and phenotype $p^{\prime} =M(g^{\prime} )$
(4) If any $g^{\prime}$ has ${F}_{p^{\prime} } > {F}_{p}$ and $g^{\prime} \notin {u}_{1}$ and $g^{\prime} \notin {u}_{0}$, add $g^{\prime}$ to the front of u₀ and return to step 2
(5) If any $g^{\prime}$ have $p=p^{\prime}$ and ∣u₀∣ = 1, add one such neutral case to the back of u₀ if $g^{\prime} \notin {u}_{0}$ and $g^{\prime} \notin {u}_{1}$
(6) Move g from u₀ to u₁
(7) If the target phenotype is found, return ψ = 1; if ∣u₀∣ = 0, return ψ = 0; if ∣u₀∣ + ∣u₁∣ > T (computational threshold), return ‘aborted’; otherwise return to step 2

We note that for searches where neutral mutations are not permitted as part of the search, step 5 of the above is omitted.

In terms of computational time, on a single Intel Xeon core at 2.8 GHz a single search for a target with T = 2 × 10⁶ took on average 0.9 minutes for L = 20, 1.3 hours for L = 30 and 19.1 hours for L = 40. With T = 2 × 10⁴, the times were on average 0.1 minutes for L = 20, 3.0 minutes for L = 30 and 19.5 minutes for L = 40.

Navigability estimation under evolutionary dynamics

We measured fitness landscape navigability as the average probability that a given source–target pair could be connected by way of an accessible path. We extend this definition to the stricter requirement of evolutionary navigability where the evolutionary dynamics of a population is considered instead of just the existence of an accessible path in crossing the fitness landscape.

Monomorphic evolutionary dynamics

We model monomorphic evolutionary dynamics with a sequential fixation model⁵⁷, assuming that the waiting time between new mutations is much longer than the time it takes for mutants to reach fixation once they have arisen. Under this model, the sequence of fixations can be treated as a Markov chain, with the adaptive path of the population essentially following a biased random walk.

Following the formalism of ref. ⁵⁷, and assuming that the neighbouring genotypes σ(g) of genotype g will be produced at equal rates, the probability that mutant genotype h will be the next to fix after genotype g, is given by:

$$P(g,h)=\frac{{P}_{{{{\rm{fix}}}}}(s(h,g),N)}{{\sum }_{g^{\prime} \in \sigma (g)}{P}_{{{{\rm{fix}}}}}(s(g^{\prime},g),N)}$$

(11)

where the probability P_fix that a given mutant arising in a haploid population of size N reaches fixation is given by Kimura’s equation⁵⁸:

$${P}_{{{{\rm{fix}}}}}(s,N)=\frac{1-\exp (-2s)}{1-\exp (-2Ns)}$$

(12)

with $s(g^{\prime},g)={F}_{g^{\prime} }/{F}_{g}-1$ as the relative fitness of genotype $g^{\prime}$ to genotype g.
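
Equations (11) and (12) translate directly into code. The sketch below makes two additions of its own, labelled as assumptions: the neutral limit P_fix = 1/N is used at s = 0, where equation (12) is formally 0/0, and an overflow for strongly deleterious mutants in large populations is treated as a fixation probability of zero. The function names are illustrative.

```python
import math


def p_fix(s, n):
    """Equation (12), with the neutral limit 1/N at s = 0 (an added convention)."""
    if s == 0:
        return 1.0 / n
    try:
        # (1 - exp(-2s)) / (1 - exp(-2Ns)); expm1 keeps accuracy for small s.
        return math.expm1(-2.0 * s) / math.expm1(-2.0 * n * s)
    except OverflowError:
        return 0.0  # strongly deleterious mutant in a large population


def next_fixation_probabilities(f_current, neighbour_fitnesses, n):
    """Equation (11): probability that each neighbour is the next to fix."""
    if f_current <= 0:
        raise ValueError("the current genotype must have positive fitness")
    weights = [p_fix(f / f_current - 1.0, n) for f in neighbour_fitnesses]
    total = sum(weights)
    return [w / total for w in weights]


# Current fitness 0.5 with a beneficial, a neutral and a del neighbour, N = 100.
probs = next_fixation_probabilities(0.5, [0.6, 0.5, 0.0], 100)
print(probs)
```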

To computationally implement these dynamics for a given source–target pair of phenotypes p and q, respectively, with fitness assignment function F (either random or Hamming, Fitness landscapes), we perform the following algorithm up to a limit of T iterations:

(1) Set genotype g as the source genotype, with its phenotype p corresponding to a randomly chosen entry from the fRNAdb, calculate its fitness F_p using the fitness assignment function and set t = 0
(2) For each neighbouring genotype $g^{\prime}$ in the set σ(g) of neighbours of g, calculate their phenotype $p^{\prime} =M(g^{\prime} )$
(3) Calculate the fitness of each neighbour ${F}_{p^{\prime} }$ and the fixation probability ${P}_{{{{\rm{fix}}}}}(s(g^{\prime},g),N)$
(4) Randomly choose a neighbour genotype $g^{\prime}$ in proportion to ${P}_{{{{\rm{fix}}}}}(s(g^{\prime},g),N)$
(5) Set $g\leftarrow g^{\prime}$, t ← t + 1
(6) Return to step 2 if $M(g^{\prime} )\ne q$ and t < T: otherwise terminate

We performed the evolutionary search for N_s = 20 sources for each of N_t = 50 targets randomly chosen from the fRNAdb at lengths L = 20, 30, 40, with both random and Hamming distance fitness assignment (Fitness landscapes). A computational limit of T = 50,000 sequential fixations was used. On a single Intel Xeon core at 2.8 GHz a single search from source to target took on average 0.4 minutes for L = 20, 5.1 minutes for L = 30 and 30.7 minutes for L = 40.
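
The sequential fixation walk listed above can be sketched as follows on a toy landscape. This is a minimal illustration, not the authors' implementation: the GP map is a hypothetical callable `gp_map`, fitness is a dictionary keyed by phenotype, the neutral limit P_fix = 1/N is assumed at s = 0, and the current genotype is assumed to have positive fitness.

```python
import math
import random


def p_fix(s, n):
    """Kimura fixation probability with the neutral limit 1/N at s = 0."""
    if s == 0:
        return 1.0 / n
    try:
        return math.expm1(-2.0 * s) / math.expm1(-2.0 * n * s)
    except OverflowError:
        return 0.0


def one_mutant_neighbours(genotype, alphabet):
    return [genotype[:j] + a + genotype[j + 1:]
            for j in range(len(genotype))
            for a in alphabet if a != genotype[j]]


def sequential_fixation_walk(source, target, gp_map, fitness, alphabet,
                             n, max_steps, rng=None):
    """Return (list of fixed genotypes, whether the target phenotype was reached)."""
    rng = rng or random.Random()
    g, path = source, [source]
    for _ in range(max_steps):          # at most T sequential fixations
        if gp_map(g) == target:
            break
        f_g = fitness[gp_map(g)]        # assumed to be positive
        neighbours = one_mutant_neighbours(g, alphabet)
        weights = [p_fix(fitness[gp_map(h)] / f_g - 1.0, n) for h in neighbours]
        g = rng.choices(neighbours, weights=weights, k=1)[0]
        path.append(g)
    return path, gp_map(g) == target


toy = {"AA": "s", "AB": "x", "BA": "y", "BB": "t"}
fit = {"s": 0.5, "x": 0.2, "y": 0.7, "t": 1.0}
path, reached = sequential_fixation_walk("AA", "t", toy.__getitem__, fit, "AB",
                                         n=100, max_steps=50, rng=random.Random(0))
print(path, reached)
```

With N = 100 the deleterious steps in this toy landscape have fixation probabilities many orders of magnitude below the beneficial ones, so the walk follows the uphill route AA → BA → BB.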

Non-monomorphic evolutionary dynamics

For non-monomorphic evolutionary dynamics, we modelled the evolutionary process using Wright–Fisher dynamics^69,70. This directly involved simulating a population of N genotypes and updating this population every generation with genotypes chosen for reproduction in proportion to their fitness, with point mutations applied.

For a given source–target pair of phenotypes p and q, respectively, we use the following algorithm:

(1) Set genotype g as the source genotype, with its phenotype p corresponding to a randomly chosen entry from the fRNAdb
(2) Make N copies of g, constructing the population Γ_t=0 at time t = 0
(3) For subsequent times 0 < t ≤ T (with T as the computational limit/maximum number of generations), we repeat the following:
  (a) For the ith genotype g_i of the population Γ_t, calculate the phenotype p_i and its fitness ${F}_{{p}_{i}}$
  (b) Sample N genotypes at random with probability ${F}_{{p}_{i}}/{\sum }_{k}{F}_{{p}_{k}}$ with replacement from Γ_t, constructing a temporary population of genotypes ${{{\Gamma }}}_{t}^{\prime}$
  (c) For each base position j for each genotype i of ${{{\Gamma }}}_{t}^{\prime}$, apply a random mutation with Bernoulli probability μ (point mutation rate). Where a mutation is applied to g_ij, a random alternative base to the current is chosen from {A, C, G, U}⧹g_ij with uniform probability.
  (d) Set the population at time t + 1 from the mutated temporary population: ${{{\Gamma }}}_{t+1}\leftarrow {{{\Gamma }}}_{t}^{\prime}$

We performed the evolutionary search for N_s = 20 sources for each of N_t = 50 targets randomly chosen from the fRNAdb at lengths L = 20, 30, 40, with both random and Hamming distance fitness assignment (Fitness landscapes). A computational limit of T = 20,000 generations and a population size of N = 100 was used. Population mutation rates of NμL = 1 (intermediate), NμL = 10 and NμL = 100 (polymorphic as NμL ≫ 1) were investigated.
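
One generation of the Wright–Fisher update (steps a to d) can be sketched as below. The function name, the callable `gp_map` and the fitness dictionary are illustrative assumptions, and the sketch raises an error if every individual has zero fitness, a case the description above does not address.

```python
import random


def wright_fisher_step(population, gp_map, fitness, mu, alphabet="ACGU", rng=None):
    """Return the next generation: fitness-proportional sampling, then mutation."""
    rng = rng or random.Random()
    weights = [fitness[gp_map(g)] for g in population]           # step (a)
    if sum(weights) <= 0:
        raise ValueError("population has zero total fitness")
    parents = rng.choices(population, weights=weights,
                          k=len(population))                      # step (b)
    offspring = []
    for g in parents:                                             # step (c)
        bases = list(g)
        for j, base in enumerate(bases):
            if rng.random() < mu:
                bases[j] = rng.choice([a for a in alphabet if a != base])
        offspring.append("".join(bases))
    return offspring                                              # step (d)


# Point mutation rate implied by a population mutation rate N * mu * L = 1.
N, L = 100, 20
mu = 1.0 / (N * L)
population = ["A" * L] * N
next_generation = wright_fisher_step(population, lambda g: "p", {"p": 1.0}, mu,
                                     rng=random.Random(0))
print(len(next_generation), mu)
```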

On a single Intel Xeon core at 2.8 GHz, a single simulation of a population of size N = 100 for T = 10,000 generations took on average 1.3 minutes for L = 20, 5.6 minutes for L = 30 and 16.7 minutes for L = 40.

Estimating evolutionary navigability

To quantify navigability under evolutionary dynamics we need to define the adaptive path from source to target. For monomorphic evolutionary dynamics, this is the sequence of genotypes (and their corresponding phenotypes and fitnesses) along the Markov chain of sequential fixations. For non-monomorphic evolutionary dynamics, we measure whether the population Γ_t has a majority phenotype with proportion greater than 50%, otherwise recording a null value, leading to a sequence of majority phenotypes and their corresponding fitnesses during the search. An accessible path is an adaptive path that reaches the target with monotonic fitness changes along the adaptive path. We defined evolutionary navigability for a given GP map as the average probability that an adaptive path was an accessible path given the evolutionary dynamics, GP map and fitness assignment.

To estimate this computationally, we record two binary properties of the search: (1) whether the target was discovered (Successful) and (2) whether the adaptive path only increased in fitness (Monotonic). We record whether the population took an accessible path by enumerating the cases:

$${\psi }^{{{{\rm{evo}}}}}=\left\{\begin{array}{ll}1&{{{\rm{Successful}}}}\,{{{\rm{AND}}}}\,{{{\rm{Monotonic}}}}\\ 0&{{{\rm{Successful}}}}\,{{{\rm{AND}}}}\,{{{\rm{not}}}}\,{{{\rm{Monotonic}}}}\\ 0&{{{\rm{not}}}}\,{{{\rm{Successful}}}}\,{{{\rm{AND}}}}\,{{{\rm{not}}}}\,{{{\rm{Monotonic}}}}\\ {{{\rm{NA}}}}&{{{\rm{not}}}}\,{{{\rm{Successful}}}}\,{{{\rm{AND}}}}\,{{{\rm{Monotonic}}}}\\ \end{array}\right.$$

with the evolutionary navigability then estimated over a k-indexed ensemble of searches as:

$$\left\langle {\psi }^{{{{\rm{evo}}}}}\right\rangle =\frac{1}{{N}_{c}}\mathop{\sum}\limits_{k}{\psi }_{{s}_{k},{t}_{k}}^{{{{\rm{evo}}}}}I\left({\psi }_{{s}_{k},{t}_{k}}^{{{{\rm{evo}}}}}\ne {{{\rm{NA}}}}\right)$$

(13)

with ${N}_{c}={\sum }_{k}I\left({\psi }_{{s}_{k},{t}_{k}}^{{{{\rm{evo}}}}}\ne {{{\rm{NA}}}}\right)$ counting the searches for which it can be determined with certainty whether or not an accessible path was taken. As in equation (7), the proportion of searches aborted is given by $\alpha =1-\frac{{N}_{c}}{{N}_{{\mathrm{t}}}{N}_{{\mathrm{s}}}}$. We additionally define the phenotypic evolutionary navigability $\langle {\psi }_{p}^{{{{\rm{evo}}}}}\rangle$ for an individual phenotype p as an ensemble where t_k = p for all k, such that:

$$\left\langle {\psi }_{p}^{{{{\rm{evo}}}}}\right\rangle =\frac{1}{{N}_{c}}\mathop{\sum}\limits_{k}{\psi }_{{s}_{k},p}^{{{{\rm{evo}}}}}I\left({\psi }_{{s}_{k},p}^{{{{\rm{evo}}}}}\ne {{{\rm{NA}}}}\right)$$

(14)

providing a means to investigate the distribution conditional on the target phenotype p as well as overall navigability.
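
The bookkeeping for evolutionary navigability can be sketched as follows. The function names are illustrative, and two conventions are assumptions of this sketch: a path counts as monotonic if fitness never decreases (so neutral steps are allowed), and null entries recorded when there is no majority phenotype are skipped.

```python
from collections import Counter


def majority_phenotype(phenotypes):
    """Phenotype held by more than 50% of the population, or None."""
    phenotype, count = Counter(phenotypes).most_common(1)[0]
    return phenotype if count > len(phenotypes) / 2 else None


def is_monotonic(fitness_path):
    """True if fitness never decreases; None entries (no majority) are skipped."""
    values = [f for f in fitness_path if f is not None]
    return all(b >= a for a, b in zip(values, values[1:]))


def psi_evo(successful, monotonic):
    """The four enumerated cases: 1, 0, 0 or None (NA)."""
    if successful:
        return 1 if monotonic else 0
    return None if monotonic else 0


def evolutionary_navigability(outcomes):
    """Equation (13) and the aborted proportion, from a list of psi_evo values."""
    completed = [o for o in outcomes if o is not None]
    if not completed:
        raise ValueError("no search has a definite outcome")
    alpha = 1.0 - len(completed) / len(outcomes)
    return sum(completed) / len(completed), alpha


outcomes = [psi_evo(True, True), psi_evo(True, False),
            psi_evo(False, True), psi_evo(True, True)]
print(evolutionary_navigability(outcomes))
```

Restricting `outcomes` to the searches that share one target phenotype p gives the phenotypic evolutionary navigability of equation (14).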

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

The dataset containing fRNA (fRNAdb) used in this paper is available at: https://doi.org/10.18908/lsdba.nbdc00452-001. The GP maps analysed are available in the Code availability section.

Code availability

The ViennaRNA package (v.1.8.5), the RNAshapes package (https://anaconda.org/bioconda/rnashapes) and custom C++ and Python source code were used to construct GP maps and perform computational simulations. The source code is available at: https://github.com/sgreenbury/gp-maps-nav.

References

Wright, S. The roles of mutation, inbreeding, crossbreeding, and selection in evolution. In Proc. 6th International Congress on Genetics Vol. 1, 356–366 (1932).
Kauffman, S. A. The Origins of Order: Self-Organization and Selection in Evolution (Oxford Univ. Press, 1993).
Svensson, E. & Calsbeek, R. The Adaptive Landscape in Evolutionary Biology (Oxford Univ. Press, 2012).
Pigliucci, M. in The Adaptive Landscape in Evolutionary Biology (eds Svensson, E. & Calsbeek, R.) 26–38 (Oxford Univ. Press, 2012).
de Visser, J. A. G. M. & Krug, J. Empirical fitness landscapes and the predictability of evolution. Nat. Rev. Genet. 15, 480–490 (2014).
Fragata, I., Blanckaert, A., Dias Louro, M. A., Liberles, D. A. & Bank, C. Evolution in the light of fitness landscape theory. Trends Ecol. Evol. 34, 69–82 (2019).
Fisher, R. A. The Genetical Theory of Natural Selection (Clarendon Press, 1958).
May, R. M. Stability and Complexity in Model Ecosystems (Princeton Univ. Press, 1973).
Conrad, M. & Ebeling, W. M.V. Volkenstein, evolutionary thinking and the structure of fitness landscapes. BioSystems 27, 125–128 (1992).
Gavrilets, S. Fitness Landscapes and the Origin of Species (MPB-41) (Princeton Univ. Press, 2004).
Franke, J., Klözer, A., de Visser, J. A. G. M. & Krug, J. Evolutionary accessibility of mutational pathways. PLoS Comput. Biol. 7, e1002134 (2011).
Das, S. G., Direito, S. O. L., Waclaw, B., Allen, R. J. & Krug, J. Predictable properties of fitness landscapes induced by adaptational tradeoffs. eLife 9, 1–24 (2020).
Weinreich, D. M., Watson, R. A. & Chao, L. Perspective: sign epistasis and genetic constraint on evolutionary trajectories. Evolution 59, 1165–1174 (2005).
Carneiro, M. & Hartl, D. L. Adaptive landscapes and protein evolution. Proc. Natl Acad. Sci. USA 107, 1747–1751 (2010).
Wu, N. C., Dai, L., Olson, C. A., Lloyd-Smith, J. O. & Sun, R. Adaptation in protein fitness landscapes is facilitated by indirect paths. eLife 5, 1–21 (2016).
Bank, C., Matuszewski, S., Hietpas, R. T. & Jensen, J. D. On the (un)predictability of a large intragenic fitness landscape. Proc. Natl Acad. Sci. USA 113, 14085–14090 (2016).
Aguilar-Rodríguez, J., Payne, J. L. & Wagner, A. A thousand empirical adaptive landscapes and their navigability. Nat. Ecol. Evol. 1, 0045 (2017).
Domingo, J., Diss, G. & Lehner, B. Pairwise and higher-order genetic interactions during the evolution of a tRNA. Nature 558, 117–121 (2018).
Zheng, J., Payne, J. L. & Wagner, A. Cryptic genetic variation accelerates evolution by opening access to diverse adaptive peaks. Science 365, 347–353 (2019).
Pokusaeva, V. O. et al. An experimental assay of the interactions of amino acids from orthologous sequences shaping a complex fitness landscape. PLOS Genet. 15, e1008079 (2019).
Wagner, A. Life Finds a Way: What Evolution Teaches Us about Creativity (Oneworld Publications, 2019).
Poelwijk, F. J., Kiviet, D. J., Weinreich, D. M. & Tans, S. J. Empirical fitness landscapes reveal accessible evolutionary paths. Nature 445, 383–386 (2007).
Lobkovsky, A. E. & Koonin, E. V. Replaying the tape of life: quantification of the predictability of evolution. Front. Genet. 3, 246 (2012).
Hartl, D. L. What can we learn from fitness landscapes? Curr. Opin. Microbiol. 21, 51–57 (2014).
Louis, A. A. Contingency, convergence and hyper-astronomical numbers in biological evolution. Stud. Hist. Philos. Sci. C. Stud. Hist. Philos. Biol. Biomed. Sci. 58, 107–116 (2016).
Kauffman, S. & Levin, S. Towards a general theory of adaptive walks on rugged landscapes. J. Theor. Biol. 128, 11–45 (1987).
Zagorski, M., Burda, Z. & Waclaw, B. Beyond the hypercube: evolutionary accessibility of fitness landscapes with realistic mutational networks. PLoS Comput. Biol. 12, e1005218 (2016).
Kingman, J. F. C. A simple model for the balance between selection and mutation. J. Appl. Probab. https://doi.org/10.2307/3213231 (1978).
Østman, B. & Adami, C. in Recent Advances in the Theory and Application of Fitness Landscapes (eds Richter, H. & Engelbrecht, A.) 509–526 (Springer, 2014).
Greenbury, S. F., Schaper, S., Ahnert, S. E. & Louis, A. A. Genetic correlations greatly increase mutational robustness and can both reduce and enhance evolvability. PLoS Comput. Biol. 12, 1–27 (2016).
Ahnert, S. E. Structural properties of genotype–phenotype maps. J. R. Soc. Interface 14, 20170275 (2017).
Manrubia, S. et al. From genotypes to organisms: state-of-the-art and perspectives of a cornerstone in evolutionary dynamics. Phys. Life Rev. 38, 55–106 (2021).
van Nimwegen, E., Crutchfield, J. P. & Huynen, M. Neutral evolution of mutational robustness. Proc. Natl Acad. Sci. USA 96, 9716–9720 (1999).
Greenbury, S. F. & Ahnert, S. E. The organization of biological sequences into constrained and unconstrained parts determines fundamental properties of genotype-phenotype maps. J. R. Soc. Interface 12, 20150724 (2015).
Manrubia, S. & Cuesta, J. A. Distribution of genotype network sizes in sequence-to-structure genotype-phenotype maps. J. R. Soc. Interface 14, 20160976 (2017).
Weiß, M. & Ahnert, S. E. Phenotypes can be robust and evolvable if mutations have non-local effects on sequence constraints. J. R. Soc. Interface 15, 20170618 (2018).
Camargo, C. Q. & Louis, A. A. in Complex Networks XI (eds Barbosa, H. et al.) 143–155 (Springer, 2020).
Wagner, A. Robustness and evolvability: a paradox resolved. Proc. R. Soc. B: Bio. Sci. 275, 91–100 (2008).
Schaper, S., Johnston, I. G. & Louis, A. A. Epistasis can lead to fragmented neutral spaces and contingency in evolution. Proc. R. Soc. B: Bio. Sci. 279, 1777–1783 (2012).
Kin, T. et al. fRNAdb: a platform for mining/annotating functional RNA candidates from non-coding RNA sequences. Nucleic Acids Res. 35, 145–148 (2007).
Schuster, P., Fontana, W., Stadler, P. F. & Hofacker, I. L. From sequences to shapes and back: a case study in RNA secondary structures. Proc. R. Soc. B. Bio. Sci. 255, 279–284 (1994).
Hofacker, I. L. et al. Fast folding and comparison of RNA secondary structures. Monatshefte Chem. 125, 167–188 (1994).
Fontana, W. Modelling ‘evo-devo’ with RNA. BioEssays 24, 1164–1177 (2002).
Cowperthwaite, M. C., Economo, E. P., Harcombe, W. R., Miller, E. L. & Meyers, L. A. The ascent of the abundant: how mutational networks constrain evolution. PLoS Comput. Biol. 4, e1000110 (2008).
Aguirre, J., Buldú, J. M., Stich, M. & Manrubia, S. C. Topological structure of the space of phenotypes: the case of RNA neutral networks. PLoS ONE 6, e26324 (2011).
Schaper, S. & Louis, A. A. The arrival of the frequent: how bias in genotype-phenotype maps can steer populations to local optima. PLoS ONE 9, e86635 (2014).
Wagner, A. The Origins of Evolutionary Innovations: A Theory of Transformative Change in Living Systems (Oxford Univ. Press, 2011).
Greenbury, S. F., Johnston, I. G., Louis, A. A. & Ahnert, S. E. A tractable genotype–phenotype map modelling the self-assembly of protein quaternary structure. J. R. Soc. Interface 11, 20140249 (2014).
Johnston, I. G. et al. Symmetry and simplicity spontaneously emerge from the algorithmic nature of evolution. Proc. Natl Acad. Sci. USA 119, e2113883119 (2022).
Dill, K. A. Theory for the folding and stability of globular proteins. Biochemistry 24, 1501–1509 (1985).
Irbäck, A. & Troein, C. Enumerating designing sequences in the HP model. J. Biol. Phys. 28, 1–15 (2002).
Ferrada, E. & Wagner, A. A comparison of genotype-phenotype maps for RNA and proteins. Biophysical J. 102, 1916–1925 (2012).
Jörg, T., Martin, O. & Wagner, A. Neutral network sizes of biological RNA molecules can be computed and are not atypically small. BMC Bioinformatics 9, 464 (2008).
Borg, I. & Groenen, P. J. F. Modern Multidimensional Scaling: Theory and Applications (Springer Science & Business Media, 2005).
Dingle, K., Ghaddar, F., Šulc, P. & Louis, A. A. Phenotype bias determines how natural RNA structures occupy the morphospace of all possible shapes. Mol. Biol. Evol. 39, msab280 (2021).
Dingle, K., Schaper, S. & Louis, A. A. The structure of the genotype-phenotype map strongly constrains the evolution of non-coding RNA. Interface Focus 5, 20150053 (2015).
McCandlish, D. M. & Stoltzfus, A. Modeling evolution using the probability of fixation: history and implications. Quart. Rev. Biol. 89, 225–252 (2014).
Kimura, M. On the probability of fixation of mutant genes in a population. Genetics 47, 713–719 (1962).
Giegerich, R., Voß, B. & Rehmsmeier, M. Abstract shapes of RNA. Nucleic Acids Res. 32, 4843–4851 (2004).
Martin, N. S. & Ahnert, S. E. Insertions and deletions in the RNA sequence-structure map. J. R. Soc. Interface 18, 20210380 (2021).
Catalán, P., Arias, C. F., Cuesta, J. A. & Manrubia, S. Adaptive multiscapes: an up-to-date metaphor to visualize molecular adaptation. Biol. Direct 12, 7 (2017).
Ogbunugafor, C. B., Wylie, C. S., Diakite, I., Weinreich, D. M. & Hartl, D. L. Adaptive landscape by environment interactions dictate evolutionary dynamics in models of drug resistance. PLoS Comput. Biol. https://doi.org/10.1371/journal.pcbi.1004710 (2016).
Nichol, D., Robertson-Tessi, M., Anderson, A. R. A. & Jeavons, P. Model genotype-phenotype mappings and the algorithmic structure of evolution. J. R. Soc. Interface 16, 20190332 (2019).
Diaz-Uriarte, R. Cancer progression models and fitness landscapes: a many-to-many relationship. Bioinformatics 34, 836–844 (2018).
Gabbutt, C. & Graham, T. A. Evolution’s cartographer: mapping the fitness landscape in cancer. Cancer Cell 39, 1311–1313 (2021).
Lau, K. F. & Dill, K. A. A lattice statistical mechanics model of the conformational and sequence spaces of proteins. Macromolecules 22, 3986–3997 (1989).
Li, H., Helling, R., Tang, C. & Wingreen, N. Emergence of preferred structures in a simple model of protein folding. Science 273, 666–669 (1996).
Weinreich, D. M. Darwinian evolution can follow only very few mutational paths to fitter proteins. Science 312, 111–114 (2006).
Ewens, W. J. Mathematical Population Genetics: Theoretical Introduction Vol. 1 (Springer, 2004).
Imhof, L. A. & Nowak, M. A. Evolutionary game dynamics in a Wright-Fisher process. J. Math. Biol. 52, 667–681 (2006).


Acknowledgements

S.E.A. was supported by the Royal Society and the Gatsby Foundation. S.F.G. was supported by the Engineering and Physical Sciences Research Council. We thank M. Weiß for helpful discussions and insights.

Author information

Authors and Affiliations

Theory of Condensed Matter Group, Cavendish Laboratory, University of Cambridge, Cambridge, UK
Sam F. Greenbury
The Alan Turing Institute, British Library, London, UK
Sam F. Greenbury & Sebastian E. Ahnert
Rudolf Peierls Centre for Theoretical Physics, University of Oxford, Oxford, UK
Ard A. Louis
Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge, UK
Sebastian E. Ahnert

Authors

Sam F. Greenbury
Ard A. Louis
Sebastian E. Ahnert

Contributions

S.F.G., A.A.L. and S.E.A. conceived and designed the experiments. S.F.G. performed the experiments. S.F.G., A.A.L. and S.E.A. analysed the data. S.E.A. supervised the work. S.F.G., A.A.L. and S.E.A. wrote the paper.

Corresponding authors

Correspondence to Sam F. Greenbury, Ard A. Louis or Sebastian E. Ahnert.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Ecology & Evolution thanks Jacobo Aguirre and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Depiction of the different biological systems, specific GP maps considered and example genotype, phenotype and encoding of phenotype.

Each row is a specific GP map included in this work and is situated within one of the four categories of system: RNA, Polyomino, HP (compact), and HP (non-compact). RNA and HP genotypes are depicted with distinct colours for their constituent bases. Polyomino genotypes are shown as numerical sequences that map to the edges of distinctly coloured tiles, with arrows used to indicate the tile orientation. The corresponding phenotype (the structure that is formed following the self-assembly process on the example genotype) is shown with the colours and arrows used in the genotype depiction, highlighting the mechanism by which bonds are formed. The encoding of each example phenotype is shown in the final column: dot-bracket and shape notation for RNA, grid coordinates for tile placements of polyominoes, and the lattice directions for the HP lattice fold.

Extended Data Table 1 GP map properties and navigability estimates. All GP maps studied with their properties (base K, sequence length L, number of phenotypes N_P, proportion of genotypes with the deleterious phenotype f_del, average redundancy ${\log }_{10}R$, mean genotypic robustness $\left\langle {\rho }_{g}\right\rangle$) and estimate with standard error of navigability $\left\langle \psi \right\rangle \pm SE(\left\langle \psi \right\rangle )$. RNA, Polyomino and compact HP GP maps all have navigable fitness landscapes ($\left\langle \psi \right\rangle > 0.6$) under random fitness assignment. By contrast, non-compact HP models have very low navigability ($\left\langle \psi \right\rangle \le 0.013$)


Extended Data Table 2 Terminology. A summary of terms and their representations used in the paper. The first column (left) provides the term used and its description, while the second column (right) has the corresponding mathematical symbol and equation where relevant


Supplementary information

Supplementary Information.

Reporting Summary.


About this article

Cite this article

Greenbury, S.F., Louis, A.A. & Ahnert, S.E. The structure of genotype-phenotype maps makes fitness landscapes navigable. Nat Ecol Evol 6, 1742–1752 (2022). https://doi.org/10.1038/s41559-022-01867-z


Received: 21 September 2021
Accepted: 01 August 2022
Published: 29 September 2022
Issue Date: November 2022
DOI: https://doi.org/10.1038/s41559-022-01867-z

