## Introduction

There is a contradiction between major branches of modern evolutionary biology. On the one hand, fields such as behavioural and evolutionary ecology are based on the assumption that organisms will behave as if they are trying to maximise their fitness1,2,3,4. Models based on fitness maximisation are used to make predictions about the selective forces (reasons) for adaptation, and these are then tested empirically5,6. This approach has been phenomenally successful, explaining many aspects of behaviour, life history and morphology. For example, fitness maximisation underpins our evolutionary explanations of: foraging behaviour, resource competition, sexual selection, parental care, sex allocation, signalling and cooperation7,8,9,10,11,12.

On the other hand, there is considerable evidence for selfish genetic elements, which increase their own contribution to future generations at the expense of other genes in the same organism13,14,15,16,17. These selfish genetic elements may distort traits away from the values that would maximise individual fitness, to increase their own transmission14,18,19,20,21,22. Evidence for such genetic conflict has been found across the tree of life, from simple prokaryotes to complex animals. The contradiction is that selfish genetic elements disrupt individual fitness maximisation, and appear to be common, but individual fitness maximisation still appears to occur17,23,24. This contradiction is especially apparent in the study of sex allocation: theoretical models based on individual fitness maximisation have explained a wide range of natural variation in sex ratio, and yet there have been many reported cases of selfish sex ratio distorters9,14,25,26,27.

Leigh28 provided a potential solution to this contradiction by suggesting that selfish genetic elements would be suppressed by the ‘parliament of genes’. Leigh’s argument was that, because selfish genetic elements reduce the fitness of most of the other genes in the organism, these other genes will have a united interest in suppressing selfish genetic elements. Furthermore, because these other genes are far more numerous, they will be likely to win the conflict. Consequently, even when there is considerable potential for conflict within individuals, we would still expect fitness maximisation at the individual level29,30,31,32,33,34. Leigh28 demonstrated the plausibility of his argument by showing theoretically how a suppressor of a sex ratio distorter could be favoured. Since then, numerous suppressors have been studied from a theoretical and an empirical perspective14,35,36.

However, several issues may affect the validity of the parliament of genes hypothesis. First, whether a suppressor spreads can depend upon biological details such as the extent to which a selfish genetic element is distorting a trait, the population frequency of that element and the cost of suppression14,37,38,39,40,41,42,43. Are certain types of selfish genetic elements, which cause substantial distortion, less likely to be suppressed? Second, if the spread of suppressors through populations is slow, and if selfish genetic elements arise continuously over evolutionary time, non-equilibrium trait distortion may be possible35. Third, selfish genetic elements are themselves also under evolutionary pressure to cause a level of trait distortion that would maximise their transmission to the next generation15. Could the evolution of selfish genetic elements lead to trait distortion that is less likely to be suppressed?32 Fourth, if a suppressor does not reach fixation in a population, or a selfish genetic element is not purged from a population, subsequent mating may decouple selfish genetic elements and suppressors to expose previously suppressed trait distortion38. How important is this problem of polymorphism likely to be?

We address these issues, by investigating the parliament of genes hypothesis theoretically. Our aim is to investigate the extent to which genetic conflict distorts traits away from the value that would maximise individual fitness. We find that: (i) the greater the level of trait distortion caused by a selfish genetic element, the more likely and the quicker it is suppressed; (ii) selection on selfish genetic elements leads towards greater trait distortion, making them more likely to be suppressed; (iii) in genome-wide arms races to gain control of organism traits, the majority interest within the genome generally prevails over ‘cabals of a few’, regardless of genome size, mutation rate, and the strength and sophistication of trait distorters. We find the same patterns with an illustrative model, and when examining three specific scenarios: selfish trait distortion of the sex ratio by an X chromosome driver; an altruistic helping behaviour encoded by an imprinted gene; and production of a cooperative public good encoded on a horizontally transmitted bacterial plasmid. Furthermore, we find close agreement when analysing scenarios with population genetic analyses and individual-based simulations. Our results suggest that even when there is potential for considerable genetic conflict, it has relatively little impact on traits at the individual level.

## Results

### Modelling approach

We examine conflict between two groups of genes within the genome. We assume a selfish genetic element that can gain a propagation advantage through distorting some trait of the organism (‘trait distorter’). This trait distortion only benefits alleles at a subset of loci within the genome—Leigh termed this subset of loci a ‘cabal’30. The rest of the genome, which does not gain the propagation advantage from the trait distortion, will be selected to suppress the trait distorter. Leigh termed this collection of genes, which will comprise most of the genome, and so will constitute the majority within the parliament of genes, the ‘commonwealth’30.

We used two complementary theoretical approaches. First, we developed ‘Equilibrium models’, where we assume that the trait distorter and its cabal are only a very small fraction of the genome. We allow for this by assuming that it is highly likely that a potential suppressor of a trait distorter can arise by mutation. Consequently, in these models, we focus our analyses on when a trait distorter and its suppressor can spread. We use this approach to examine, given the potential for suppression, what direction we would expect natural selection to take on average.

We then developed ‘Dynamics models’, where we relaxed the assumption that the trait distorter and its cabal are a negligible fraction of the genome. In this case, rather than focus on the equilibrium state, we allowed trait distorters and their suppressors to arise continuously, at different loci across the genome. This approach allows us to investigate the influence of factors such as genome size, mutation rate and cabal size. We use this approach to determine the outcome of an evolutionary conflict that embroils the whole genome, to elucidate how far an organism trait is likely to be distorted at any given point in evolutionary time.

### Equilibrium models

We assessed, given the potential for suppression, the extent to which a trait distorter will distort an organism trait away from the optimum for individuals. In order to elucidate the selective forces, we ask four questions in a step-wise manner, with increasing complexity:

1. In the absence of a suppressor, when can a trait distorter invade?
2. When can a costly suppressor of the trait distorter invade?
3. What are the overall consequences of trait distorter-suppressor dynamics for trait values, at the individual and population level, at evolutionary equilibrium and before equilibrium has been reached?
4. If the extent to which the trait distorter manipulates the organism trait can evolve, how will this influence the likelihood that it is suppressed, and hence the individual and population trait values?

We assume an arbitrary trait that influences organism fitness. In the absence of trait distorters, all individuals have the trait value that maximises their individual fitness. The trait distorter manipulates the trait away from the individual optimum, to increase its own transmission to offspring. We assume a large population of diploid, randomly mating individuals. The aim of this model is to establish key aspects of the population genetics governing trait distorters and their suppressors, in an abstract setting. In Supplementary Notes 3, 4 and 5, we address the same issues in three specific biological scenarios.

(1) Spread of a trait distorter: We consider a trait distorter, which we denote by D1, that is dominant and distorts an organism trait value by some positive amount k (k > 0). This trait distortion increases the transmission of the trait distorter to offspring. Specifically, the trait distorter (D1) drives at meiosis, in heterozygotes, against a trait non-distorter (D0), being passed into the proportion (1 + t(k))/2 of offspring. t(k) denotes the transmission bias (0 ≤ t(k) ≤ 1) and is a monotonically increasing function of trait distortion $$\left( {\frac{{{\mathrm{d}}t}}{{{\mathrm{d}}k}} \ge 0} \right)$$.

We emphasise that, in nature, trait distorters need not be meiotic drivers; the key point here is that we are considering when trait distortion increases the propagation of that trait distorter. We chose meiotic drive in this model for simplicity, and model different mechanisms in the biologically specific models (Supplementary Notes 3, 4 and 5). Indeed, in many natural cases, meiotic drivers would not gain their advantage by distorting a trait, in which case they would not enter any conflict with the rest of the genome over organism trait values, and therefore would not have any lasting influence on whether trait values are those that maximise individual fitness. For example, the segregation distorter (SD) meiotic driver in Drosophila melanogaster gains its advantage in heterozygous males by disrupting the proper development of rival sperm, and not by trait distortion44. Any organism-level fitness costs associated with SD would be opposed by SD itself as well as by the rest of the genome45. Our focus in this paper is on selfish genetic elements that gain an advantage by trait distortion, and therefore disagree with the majority of genes over trait values.

Trait distortion leads to a fitness (viability) cost (ctrait(k)) at the individual level, reducing an individual’s number of offspring from 1 to 1 − ctrait(k) (0 ≤ ctrait(k) ≤ 1). Owing to trait distorter dominance, the fitness cost of trait distortion is borne by heterozygous as well as trait distorter-homozygous individuals. The fitness cost is a monotonically increasing function of trait distortion $$\left( {\frac{{{\mathrm{d}}c_{\mathrm{trait}}}}{{{\mathrm{d}}k}} \ge 0} \right)$$. We assume that t(k) and ctrait(k) do not change with population allele frequencies, but relax this assumption in our specific models.

We first ask what frequency the trait distorter will reach in the population in the absence of suppression. If we take p and p′ as the population frequency of the trait distorter in two consecutive generations, then the population frequency of the trait distorter in the latter generation is:

$$\bar w\,p^\prime = (1 - c_{\mathrm{trait}}(k))\,(p^2 + (1 - p)p(t(k) + 1)),$$
(1)

where $$\bar w$$ is the average fitness of individuals in the population in the current generation, and can be written in full as: $$\bar w = (1 - c_{\mathrm{trait}}(k))(p^2 + 2p(1 - p)) + (1 - p)^2$$. In ‘Trait distorter population frequency’ in the Methods, we show, with a population genetic analysis of Eq. 1, that the trait distorter will spread from rarity and reach fixation when ctrait(k) < t(k)(1 − ctrait(k)). This shows that trait distortion will evolve when the number of offspring that the trait distorter gains as a result of trait distortion (t(k)(1 − ctrait(k))) is greater than the number of offspring bearing the trait distorter that are lost as a result of reduced individual fitness (ctrait(k)).
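The recursion in Eq. 1 can be iterated numerically to check the invasion condition. The following is a minimal sketch, not the Methods analysis: the function names (`next_frequency`, `iterate`, `invades`) are hypothetical, and t(k) and ctrait(k) are passed in as plain numbers `t` and `c_trait` for a single fixed k.

```python
def next_frequency(p, t, c_trait):
    """One generation of Eq. 1: return p' given trait distorter frequency p."""
    # Mean fitness: all D1 carriers (D1 is dominant) pay c_trait; D0 homozygotes do not.
    w_bar = (1 - c_trait) * (p**2 + 2 * p * (1 - p)) + (1 - p) ** 2
    # D1 gametes come from D1 homozygotes and, with bias (1 + t)/2, from heterozygotes.
    return (1 - c_trait) * (p**2 + (1 - p) * p * (t + 1)) / w_bar


def iterate(p0, t, c_trait, generations):
    """Apply Eq. 1 repeatedly, starting from frequency p0."""
    p = p0
    for _ in range(generations):
        p = next_frequency(p, t, c_trait)
    return p


def invades(t, c_trait):
    """Invasion condition stated in the text: c_trait < t * (1 - c_trait)."""
    return c_trait < t * (1 - c_trait)


if __name__ == "__main__":
    # Illustrative parameter values only (assumptions, not taken from the paper).
    for t, c_trait in [(0.5, 0.2), (0.2, 0.3)]:
        print(t, c_trait, invades(t, c_trait), iterate(0.01, t, c_trait, 2000))
```

With these illustrative values, the first parameter pair satisfies the condition and the second does not, so the two runs can be compared against the analytical prediction.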

(2) Spread of an autosomal suppressor: We assume that the trait distorter (D1) can be suppressed by an unlinked autosomal allele (suppressor), denoted by S1. We assume that this suppressor (S1) is dominant and only expressed in the presence of the trait distorter (facultative), but found similar results when the suppressor is constitutively expressed (obligate; Supplementary Note 6). Expression of the suppressor incurs a fitness cost to the individual, csup (0 ≤ csup ≤ 1), which could arise for multiple reasons, including energy expenditure, or errors relating to the use of gene silencing machinery46,47. Gene silencing generally precedes the translation of the targeted gene, and so we assume that the cost of suppression (csup) is independent of the amount of trait distortion caused by the trait distorter (k).

We can write recursions detailing the generational change in the frequencies of the four possible gametes, D0/S0, D0/S1, D1/S0 and D1/S1, with the respective frequencies in the current generation denoted by x00, x01, x10 and x11, and the frequencies in the subsequent generation denoted by an appended dash (′):

$$\begin{array}{l}\bar w\,x_{00}^{\prime} = x_{00}^2 + x_{00}x_{01} + \left( {1 - t} \right)\left( {1 - c_{\mathrm{trait}}} \right)x_{00}x_{10}\\ + \, \left( {\left( {1 - c_{\mathrm{sup}}} \right)/2} \right)x_{00}x_{11} + \left( {\left( {1 - c_{\mathrm{sup}}} \right)/2} \right)x_{01}x_{10},\end{array}$$
(2)
$$\begin{array}{l}\bar w\,x_{01}^{\prime} = x_{00}x_{01} + \left( {\left( {1 - c_{\mathrm{sup}}} \right)/2} \right)x_{00}x_{11} + x_{01}^2\\ + \, \left( {\left( {1 - c_{\mathrm{sup}}} \right)/2} \right)x_{01}x_{10} + \left( {1 - c_{\mathrm{sup}}} \right)x_{01}x_{11},\end{array}$$
(3)
$$\begin{array}{l}\bar w\,x_{10}^{\prime} = \left( {1 + t} \right)\left( {1 - c_{\mathrm{trait}}} \right)x_{00}x_{10} + \left( {\left( {1 - c_{\mathrm{sup}}} \right)/2} \right)x_{00}x_{11}\\ + \, \left( {\left( {1 - c_{\mathrm{sup}}} \right)/2} \right)x_{01}x_{10} + \left( {1 - c_{\mathrm{trait}}} \right)x_{10}^2 + \left( {1 - c_{\mathrm{sup}}} \right)x_{10}x_{11},\end{array}$$
(4)
$$\begin{array}{l}\bar w\,x_{11}^{\prime} = \left( {\left( {1 - c_{\mathrm{sup}}} \right)/2} \right)x_{00}x_{11} + \left( {\left( {1 - c_{\mathrm{sup}}} \right)/2} \right)x_{01}x_{10}\\ + \, \left( {1 - c_{\mathrm{sup}}} \right)x_{01}x_{11} + \left( {1 - c_{\mathrm{sup}}} \right)x_{10}x_{11} + \left( {1 - c_{\mathrm{sup}}} \right)x_{11}^2,\end{array}$$
(5)

where $$\bar w$$ is the average fitness of individuals in the current generation, and equals the sum of the equations’ right-hand sides. In ‘Suppressor invasion condition’ in the Methods, we show, with a population genetic analysis of these equations, that a suppressor will spread from rarity if trait distortion (k) is greater than some threshold value, at which the cost of suppression (csup) is less than the cost of being subjected to trait distortion, csup < ctrait(k). A threshold with respect to the level of trait distortion (k) arises because the cost of trait distortion (ctrait(k)) increases with greater trait distortion, but the cost of suppression (csup) is constant. Given that the individual cost of pre-translational suppression at a single locus is likely to be low46,47, trait distortion conferred by unsuppressed trait distorters is likely to be negligible.
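Equations 2 to 5 can be iterated directly to see the suppressor threshold at work. The sketch below is an illustration under stated assumptions: the names `step`, `run` and `suppressor_frequency` are hypothetical, t and ctrait are fixed numbers for a single k, and the example starts from a population already fixed for the trait distorter with the suppressor rare.

```python
def step(x, t, c_trait, c_sup):
    """One generation of Eqs. 2-5; x = (x00, x01, x10, x11) gamete frequencies."""
    x00, x01, x10, x11 = x
    d = 1 - c_trait        # fitness of unsuppressed trait distorter carriers
    s = (1 - c_sup) / 2    # coefficient of the double-heterozygote terms
    y00 = x00**2 + x00 * x01 + (1 - t) * d * x00 * x10 + s * x00 * x11 + s * x01 * x10
    y01 = x00 * x01 + s * x00 * x11 + x01**2 + s * x01 * x10 + (1 - c_sup) * x01 * x11
    y10 = ((1 + t) * d * x00 * x10 + s * x00 * x11 + s * x01 * x10
           + d * x10**2 + (1 - c_sup) * x10 * x11)
    y11 = (s * x00 * x11 + s * x01 * x10
           + (1 - c_sup) * (x01 * x11 + x10 * x11 + x11**2))
    w_bar = y00 + y01 + y10 + y11  # mean fitness = sum of the right-hand sides
    return (y00 / w_bar, y01 / w_bar, y10 / w_bar, y11 / w_bar)


def run(x, generations, t, c_trait, c_sup):
    """Iterate the recursions for a fixed number of generations."""
    for _ in range(generations):
        x = step(x, t, c_trait, c_sup)
    return x


def suppressor_frequency(x):
    """Frequency of the S1 allele: gametes D0/S1 plus D1/S1."""
    return x[1] + x[3]


if __name__ == "__main__":
    start = (0.0, 0.0, 0.99, 0.01)  # D1 fixed, S1 rare (illustrative values)
    cheap = run(start, 200, t=0.5, c_trait=0.2, c_sup=0.05)   # c_sup < c_trait
    costly = run(start, 200, t=0.5, c_trait=0.2, c_sup=0.3)   # c_sup > c_trait
    print(suppressor_frequency(cheap), suppressor_frequency(costly))
```

In this setting the suppressor should rise when csup < ctrait and decline when csup > ctrait, matching the threshold described in the text; the parameter values themselves are arbitrary.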

(3) Consequences for organism trait values: The extent of trait distortion at the individual level shows a discontinuous relationship with the strength of the trait distorter (Fig. 1a). When trait distortion is low, a suppressor will not spread (csup > ctrait(k)) and so the level of trait distortion at the individual level will increase with the level of trait distortion induced by the trait distorter (k). However, once a threshold is reached (csup < ctrait(k)), the suppressor spreads. We show in ‘Equilibrium trait distorter and suppressor frequencies’ in the Methods that the spread of the suppressor (S1) causes the trait distorter (D1) to lose its selective advantage and be eliminated from the population, leading to an absence of trait distortion at the individual level. In contrast, we show in Supplementary Note 6 that if the suppressor is constitutively expressed (obligate), the spread of the suppressor (S1) to fixation in the population causes the trait distorter (D1) to become neutral, meaning the trait distorter (D1) can be maintained in the population without being expressed.

Overall, these results suggest that, given a relatively low cost of suppression (csup), the level of trait distortion observed at the individual level will either be low or absent. When a trait distorter is weak (low k), it will not be suppressed, but it will only have a small influence at the level of the individual. When a trait distorter is strong (high k), it will be suppressed and so there will be no influence at the level of the individual (Fig. 1a).
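The discontinuous relationship between trait distorter strength and individual-level trait distortion can be written as a small decision rule. This is a sketch that combines the invasion conditions from questions (1) and (2); the function name `equilibrium_distortion` and the functional forms t(k) = k and ctrait(k) = k² are assumptions chosen only for illustration, not forms used in the paper.

```python
def equilibrium_distortion(k, t, c_trait, c_sup):
    """Individual-level trait distortion at equilibrium for a distorter of strength k.

    t and c_trait are callables giving transmission bias and individual cost at k.
    """
    cost = c_trait(k)
    if not cost < t(k) * (1 - cost):
        return 0.0  # the trait distorter cannot invade in the first place
    if c_sup < cost:
        return 0.0  # the suppressor invades and the trait distorter is purged
    return k        # weak distorter: unsuppressed, distortion equals k


if __name__ == "__main__":
    # Hypothetical functional forms on 0 <= k <= 1 (assumptions for illustration).
    t = lambda k: k
    c_trait = lambda k: k**2
    for k in (0.05, 0.5, 0.9):
        print(k, equilibrium_distortion(k, t, c_trait, c_sup=0.01))
```

Under these assumed forms the suppression threshold sits where k² = csup, so only distorters weaker than that threshold leave any distortion at equilibrium.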

In addition, we found that stronger trait distorters are suppressed more quickly (Fig. 1b). In ‘Non-equilibrium trait distortion’ in the Methods, we numerically iterated our recursions to determine how many generations it takes for suppressors to reach equilibrium. As long as trait distortion continues to reduce individual fitness non-negligibly after suppression is favoured (such that $$\frac{{{\mathrm{d}}t}}{{{\mathrm{d}}k}}/\frac{{{\mathrm{d}}c_{\mathrm{trait}}}}{{{\mathrm{d}}k}}$$ is not excessively high after csup < ctrait(k)), stronger trait distorters (higher k) are suppressed and purged more rapidly than weaker trait distorters, limiting the potential for non-equilibrium trait distortion (Fig. 1b).

(4) Evolution of trait distortion: We then considered the consequence of allowing the level of trait distortion (k) to evolve. We assume a trait distorter (D1) that distorts by k, and then introduce a rare mutant (D2) that distorts by a different amount $$\hat k$$ ($$\hat k$$ ≠ k). This mutant (D2) is propagated into the proportion (1 + t($$\hat k$$) − t(k))/2 of the offspring of D2D1 heterozygotes, and into the proportion (1 + t($$\hat k$$))/2 of the offspring of D2D0 heterozygotes. We assume that the stronger of the two trait distorters is dominant, but found similar results when assuming additivity (‘Invasion of a mutant trait distorter’ in the Methods). We assume that the similarity in coding sequence and regulatory control means that the original trait distorter and the mutant are both suppressed by the same suppressor allele, at the same cost (csup)46,47. In ‘Invasion of a mutant trait distorter’ in the Methods, we write the recursions that detail the generational frequency changes in the different possible gametes (D0/S0, D0/S1, D1/S0, D1/S1, D2/S0 and D2/S1).

We found that stronger mutant trait distorters ($$\hat k$$ > k) will invade from rarity when the marginal increase in offspring they are propagated into exceeds the marginal increase in offspring they are lost from as a result of reduced fitness (Δt(1 − ctrait($$\hat k$$)) > Δctrait, where Δ denotes marginal change (Δt = t($$\hat k$$) − t(k); Δctrait = ctrait($$\hat k$$) − ctrait(k))). Consequently, if trait distortion is initially low, and successive mutant trait distorters are introduced, each deviating only slightly from the trait distorters from which they are derived (‘δ-weak selection’48), invading trait distorters will approach a ‘target’ strength, denoted by ktarget. This target strength corresponds to the level of trait distortion that would maximise the fitness of the gene15, and is when the marginal benefit of transmission is exactly counterbalanced by the marginal individual cost of reduced offspring, $$\frac{{{\mathrm{d}}t}}{{{\mathrm{d}}k}}\left( {1 - c_{\mathrm{trait}}} \right) = \frac{{{\mathrm{d}}c_{\mathrm{trait}}}}{{{\mathrm{d}}k}}$$. The target strength of trait distortion (ktarget) will therefore be greater if increased trait distortion (k) leads to a low rate of decrease in marginal transmission benefit $$\left( { - \frac{{{\mathrm{d}}^2t}}{{{\mathrm{d}}k^2}}} \right)$$ relative to the rate of increase in marginal individual cost $$\left( {\frac{{{\mathrm{d}}^2c_{\mathrm{trait}}}}{{{\mathrm{d}}k^2}}} \right)$$ (Fig. 2b). If mutations are larger (strong selection), invading trait distorters may overshoot the target strength of trait distortion ($$\hat k$$ > ktarget). Weaker mutant trait distorters ($$\hat k$$ < k) are recessive so cannot invade from rarity.
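The approach to the target strength under small mutational steps can be illustrated by applying the mutant invasion condition repeatedly. The sketch below assumes hypothetical functional forms t(k) = k and ctrait(k) = k², for which the balance condition (dt/dk)(1 − ctrait) = dctrait/dk reduces to 1 − k² = 2k, giving ktarget = √2 − 1. The function names and the step size are illustrative assumptions.

```python
import math


def mutant_invades(k, k_hat, t, c_trait):
    """Invasion condition for a stronger mutant: dt * (1 - c_trait(k_hat)) > dc."""
    delta_t = t(k_hat) - t(k)
    delta_c = c_trait(k_hat) - c_trait(k)
    return delta_t * (1 - c_trait(k_hat)) > delta_c


def evolve_strength(k_start, step, t, c_trait, max_steps=100_000):
    """Introduce successively stronger mutants until one fails to invade."""
    k = k_start
    for _ in range(max_steps):
        k_hat = k + step
        if not mutant_invades(k, k_hat, t, c_trait):
            break
        k = k_hat
    return k


# Hypothetical functional forms (assumptions for illustration only).
t = lambda k: k
c_trait = lambda k: k**2

K_TARGET_CLOSED_FORM = math.sqrt(2) - 1  # root of 1 - k**2 = 2*k on (0, 1)
k_reached = evolve_strength(0.01, 0.001, t, c_trait)

if __name__ == "__main__":
    print(k_reached, K_TARGET_CLOSED_FORM)
```

Only upward steps are tried, because the text states that weaker mutants are recessive and cannot invade from rarity. Larger step sizes would correspond to the strong-selection case, in which the final strength can overshoot the target.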

As selection on the trait distorter increases the level of trait distortion, the trait distorter becomes more likely to exceed the critical level of trait distortion above which suppression is favoured. When this is the case (csup < ctrait(ktarget)), the trait distorter spreads to high frequency, which then causes the suppressor to increase in frequency, reversing the direction of selection on the trait distorter, towards non-trait distortion (D0), resulting in zero trait distortion at equilibrium (k* = 0) (Fig. 2a; ‘Equilibrium allele frequencies after mutant invasion’ in the Methods). Suppression only fails to spread if the individual fitness cost associated with suppression is greater than the individual fitness cost associated with the target trait distortion (csup > ctrait(ktarget); Fig. 2a). Given that the individual fitness cost of pre-translational suppression at a single locus is likely to be low, any non-negligible trait distorter is likely to be suppressed.

Overall, our results suggest that selection on trait distorters will tend to lead to the eventual suppression of those trait distorters. In ‘Agent-based simulation (single trait distorter locus)’ in the Methods, we developed an agent-based simulation, which allowed us to continuously vary the level of both trait distortion and suppression, and obtained results in close agreement (Fig. 2a; Supplementary Note 2, Supplementary Fig. 2).

### Specific biological scenarios

In Supplementary Notes 3, 4 and 5, we tested the robustness of our above conclusions by developing models for three different biological scenarios: a sex ratio distorter on an X chromosome (X driver); an imprinted gene that is only expressed when maternally inherited; and a gene for the production of a public good by bacteria, which is encoded on a mobile genetic element14,26,36,49,50,51,52. We examined these cases because they are different types of trait distortion, involving different selection pressures, in very different organisms. In all three specific models, we obtained the same qualitative results as with our above illustrative model for an arbitrary trait (Fig. 3).

### Dynamics models

Our Equilibrium models assumed that the suppressor of any given trait distorter will arise quickly by mutation. This assumption becomes less likely if suppressors are complex and hard to evolve, or favoured across a reduced portion of the genome (smaller commonwealth). Also, multiple trait distorters and their suppressors may arise continually in populations, through evolutionary time, at different loci within the cabal and commonwealth respectively. Organisms may therefore never rest at equilibria where all trait distorters are suppressed or of negligible strength.

We address these issues by relaxing our assumption that the commonwealth is very large relative to the cabal, assuming instead that the commonwealth encompasses some majority of loci within the genome, with the remaining loci comprising the cabal. We examined the average and extremes of trait distortion produced by trait distorters and suppressors, by asking three further questions, of increasing complexity, in a step-wise manner:

5. To what extent are organism traits distorted when populations of individuals are only ever subjected to one segregating trait distorter at a time (no trait distorter co-segregation)?
6. To what extent are organism traits distorted when populations of individuals may be exposed to multiple, co-segregating, interacting trait distorters?
7. To what extent are organism traits distorted when the strength of each trait distorter may evolve?

(5) Trait distortion when no trait distorter co-segregation: We model a population of individuals, each with a genome size of γ loci. Within this genome, the cabal constitutes a fraction θ of all loci, and the commonwealth constitutes the remaining fraction 1 − θ of all loci. If a fraction of the genome is inherited in the same way, such that it favours the same trait values (same maximand), it is termed a ‘coreplicon’20,22. The cabal comprises all coreplicons that favour the distortion of a particular trait, along a particular axis, in a particular direction, away from individual fitness maximisation. The commonwealth comprises the remaining coreplicons. Cabals and commonwealths are therefore trait-specific. It is useful, when analysing a specific trait, to partition the genome along these lines, because it is this conflict, between the cabal and commonwealth, that drives the evolution of the trait value.

Cabals and commonwealths are defined a priori, by partitioning and summing up the coreplicons that, respectively, favour and disfavour the trait distortion under study. The ‘individual’ is the majority interest within the genome, and so the cabal can never exceed half of the genome, because then it would be the majority (θ ≤ 0.5)53. In Supplementary Note 8, we calculate some real-world proportional cabal sizes (θ) by dividing the number of genes in a cabal by the total number of genes in a genome. In Drosophila melanogaster, a Y chromosome cabal, which favours male-biased sex ratio distortion, has a proportional size of θ ≈ 0.001 (refs. 54,55). In human females, a cabal comprising cytoplasmic elements as well as the X chromosomes, which favours female-biased sex ratio distortion, has a proportional size of θ ≈ 0.04 (refs. 56,57,58). In Escherichia coli, a cabal made up of horizontally transferable plasmids, which could favour upregulated public goods production49, varies in size across strains, but has an average of θ ≈ 0.036.

For analytical tractability, we start by assuming that new trait distorters and suppressors are introduced at a fixed rate (deterministic). Biologically, new trait distorters and suppressors are likely to arise via some combination of de novo mutation and the acquisition, via gene conversion or transposition, of pre-existing sequences contributing to trait distortion or suppression35,59,60. We assume that a trait distorter arises at a new locus within the cabal every $$1/(\theta\gamma\rho_{D_1})$$ generations, and its dedicated suppressor arises at a locus inside the commonwealth $$1/((1 - \theta)\gamma\rho_{S_1})$$ generations afterwards. $$\rho_{D_1}$$ and $$\rho_{S_1}$$, respectively, give the generational per-locus probabilities of generating new trait distorters and suppressors. These probabilities ($$\rho_{D_1}$$;$$\rho_{S_1}$$) increase linearly, according to the same gradient, as the baseline mutation rate in the genome, denoted by ρ, is increased.

As in our equilibrium models, we assume that unsuppressed trait distorters distort organism traits by the fixed amount k, at an individual cost ctrait(k), gaining a meiotic transmission advantage in heterozygotes of (1 + t(k))/2. Similarly, we again assume that suppressors are dominant and completely suppress their target trait distorters at the cost csup, and are facultatively expressed in the presence of their target trait distorter5,6,7,8. We assume that the trait distortion experienced by an organism is given by the strength of its strongest unsuppressed trait distorter (inter-locus dominance).

We emphasise again that the mechanism by which the trait distorter gains its advantage (meiotic drive) is chosen here purely for illustrative purposes (see Supplementary Notes 3, 4 and 5 for different mechanisms). We are interested in the subset of selfish genetic elements that gain their selfish benefit by distorting a trait away from the value that maximises individual fitness. The same trait distortion would be favoured across the coreplicon/cabal of which these selfish genetic elements are a part. This contrasts with selfish genetic elements that gain a selfish benefit through their ability to be meiotic drivers, without distorting a trait—such drivers could conceivably arise at any locus in a genome. The key difference here is between meiotic drive (could be favoured at any locus; selfish benefit does not arise via distorting a trait) and selfish genetic elements that gain a benefit by distorting a trait (the specific examples that we consider and model in this paper)14,15.

We calculate the average and extremes of trait distortion faced by organisms in the population across evolutionary time, for different trait distorter strengths (k), and different proportional cabal sizes (θ). Considering trait distorters that do not trigger suppressor invasion (csup > ctrait(k)), the average trait distortion is trivially given by the strength of the trait distorters available to the cabal (k). Considering trait distorters that are suppressed and purged at equilibrium (csup < ctrait(k)), for analytical tractability, we first consider parameter regimes in which trait distorters are introduced at new loci more slowly than they are purged at old loci, meaning they do not co-segregate.

In ‘Long-term trait distortion (exact numerical solution)’ in the Methods, we develop a population genetic model based on these assumptions, and solve it numerically to show that individual trait distortion increases and decreases cyclically over evolutionary time, ranging between peaks of k and troughs of 0, as new trait distorters and suppressors advance and retreat through the population (Fig. 4a). In ‘Long-term trait distortion (analytical approximation)’ in the Methods, we show that the average trait distortion over these cycles is given by

$$\frac{k\theta\rho_{D_1}}{\left( 1 - \theta \right)\rho_{S_1}},$$
(6)

by making the assumption that the rate of gene frequency equilibration after trait distorter/suppressor introduction is very fast relative to the rate of trait distorter/suppressor introduction (separation of timescales). For our three specific biological scenarios (Supplementary Notes 3, 4 and 5), the rate of gene frequency equilibration after trait distorter/suppressor introduction varies in each scenario, but these details are inconsequential when the separation of timescales assumption is made, meaning average trait distortion is given by Eq. 6 in each of the three specific biological scenarios. Furthermore, we also found with numerical analysis that Eq. 6 is a good approximation, even when the separation of timescales is relaxed (Fig. 4b).

Smaller proportional cabal sizes (θ) lead to a slower rate of trait distorter introduction relative to suppressor introduction, and so both: (i) an absolute reduction in average trait distortion; and (ii) a reduced effect of distorter strength (k) on average trait distortion (k − θ interaction) (Fig. 4b). In the limit of negligible proportional cabal size (θ → 0), we recover the result from our Equilibrium models that the proportion of evolutionary time in which a trait distorter is present approaches 0, leading to an average trait distortion of 0 for trait distorters above the threshold of suppression (csup < ctrait(k)).

Neither genome size (γ) nor baseline mutation rate (ρ) has any influence on the average trait distortion. An increase in either factor leads to a proportional increase in the trait distorter introduction rate and the same proportional increase in the suppressor introduction rate, and these exactly cancel (Supplementary Note 7, Supplementary Fig. 11).
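Equation 6 and the cancellation of genome size can be checked with a few lines of arithmetic. The sketch below is an illustration under the separation-of-timescales assumption: each trait distorter is taken to impose distortion k from its introduction until its suppressor arrives, so the time-averaged distortion is k multiplied by the ratio of the suppressor waiting time to the trait distorter introduction interval. Function and argument names are hypothetical, and the parameter values are arbitrary.

```python
import math


def average_distortion(k, theta, rho_d, rho_s):
    """Average trait distortion as given by Eq. 6."""
    return k * theta * rho_d / ((1 - theta) * rho_s)


def average_distortion_from_intervals(k, theta, gamma, rho_d, rho_s):
    """Same quantity, built from the introduction intervals given in the text."""
    distorter_interval = 1 / (theta * gamma * rho_d)       # generations between new distorters
    suppressor_delay = 1 / ((1 - theta) * gamma * rho_s)   # generations until the suppressor arises
    # Fraction of each cycle spent distorted, times the distortion level k.
    return k * suppressor_delay / distorter_interval


if __name__ == "__main__":
    k, theta, rho = 0.5, 0.04, 1e-6  # illustrative values (assumptions)
    for gamma in (1e3, 1e4, 1e5):
        # Genome size gamma cancels, so each row should agree with Eq. 6.
        print(gamma,
              average_distortion_from_intervals(k, theta, gamma, rho, rho),
              average_distortion(k, theta, rho, rho))
```

Because γ appears in both intervals, it drops out of the ratio; the same holds for a baseline mutation rate that scales both per-locus probabilities by the same factor.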

(6) Trait distortion when trait distorters may co-segregate: We then considered the possibility that different trait distorters may co-segregate for some periods of evolutionary time59,60. In ‘Agent-based simulation (multiple loci; discrete)’ in the Methods, we developed an agent-based simulation that allowed us to investigate the scenario where mutations appear stochastically rather than deterministically. When an individual contains multiple trait distorters, we assume that the extent of trait distortion is determined by the strongest trait distorter (inter-locus dominance).

The consequence of allowing trait distorters to co-segregate will depend on mechanistic assumptions about how trait distorters and suppressors act and interact. To capture different ends of the continuum of possibilities, we model two different types of trait distorter, which we term low-sophistication (D1L) and high-sophistication (D1H) (Supplementary Note 7, Supplementary Fig. 12). High-sophistication trait distorters are only suppressed by dedicated suppressors that evolved to suppress that specific trait distorter, and incur a low cost when inter-locus recessive. In contrast, low-sophistication trait distorters can be suppressed to some extent by any suppressor (background or generalist suppression)35,59,60, and incur a high cost when inter-locus recessive. High-sophistication trait distorters are more functionally complex, and so are likely to be less mutationally accessible than low-sophistication trait distorters.

We found that, for a sufficiently small proportional cabal size (θ → 0), trait distorters scarcely co-segregate, and Eq. 6 is recovered. Consequently, for sufficiently small proportional cabal sizes, the average level of trait distortion is again not influenced by genome size (γ), mutation rate (ρ), or the mechanics of trait distorter interaction (D1L/D1H).

In contrast, with larger cabals (θ → 0.5), trait distorters often co-segregate. In this case, the details of genome size (γ), mutation rate (ρ), and trait distorter sophistication (D1L/D1H) matter. Specifically, trait distortion may be: (i) greater than Eq. 6 if trait distorters are high sophistication (D1H); (ii) lower than Eq. 6 if trait distorters are low sophistication (D1L). The deviation from Eq. 6 is exaggerated for increased trait distorter co-segregation, which is promoted by: (i) high genome size (γ)/mutation rate (ρ) (Fig. 5); (ii) low trait distorter strength (k), which causes trait distorters to be purged more slowly (Supplementary Note 7, Supplementary Fig. 14); (iii) low trait distorter sophistication (D1L), which increases the mutational accessibility of trait distorters. The proportional cabal sizes that make these different factors matter are, however, much larger than we generally find in nature.

(7) Evolution of trait distortion and suppression: We then examined the consequences of allowing the level of trait distortion and suppression to evolve freely at each locus15. In ‘Agent-based simulation (multiple loci; continuous)’ in the Methods, we generalised our agent-based simulation to allow for this, and found that trait distorters evolve increased trait distortion (approaching ktarget) while unsuppressed (Supplementary Note 7, Supplementary Fig. 15). Stronger trait distorters are suppressed and purged more quickly than weaker ones, and are less likely to co-segregate as a result. Consequently, when evolution is permitted at trait distorter loci, average trait distortion again approaches that predicted by Eq. 6, so is less influenced by genome size (γ), mutation rate (ρ), and the mechanics of trait distorter interaction (D1L/D1H).

## Discussion

We obtained three main results: First, larger trait distortions are more likely to be suppressed. Consequently, trait distorters will either lead to small trait distortions, with minor fitness consequences, or be suppressed (Figs. 1a and 3a–c). Second, selection on trait distorters favours the evolution of higher levels of trait distortion, which will favour their suppression. Consequently, trait distorters will evolve to bring about their own demise (Figs. 2, 3d–f and 6). Third, if trait distortion is favoured at only a small proportion of the genome (proportionally small cabals), the extent of trait deviation away from the individual level optima is low and unaffected by factors, such as genome size, mutation rate and mechanism of trait distortion (Figs. 4 and 5). The reason for this result is that the influence of all of these factors is determined by proportional cabal size. Overall, these results suggest that even if there is substantial potential for genetic conflict, trait distorters will have relatively little influence at the individual level, in support of Leigh’s28 parliament of genes hypothesis.

Suppressing trait distorters: We have shown that suppressors spread when the cost of suppression is lower than the fitness cost imposed by trait distortion (ctrait(k) > csup). The individual fitness cost of pre-translational suppression at a single locus is likely to be low. For example, a molecularly characterised suppressor (Nmy) destroys the messenger RNA transcripts of a sex ratio distorter (Dox) via RNA interference (RNAi), the costs of which are likely to be negligible at the individual level46,47,60,61. Consequently, in order to not be suppressed, a trait distorter would have to have relatively negligible influence on a trait, or influence a trait that has a negligible influence on fitness. Furthermore, we showed that selection on trait distorters will often favour higher levels of trait distortion, bringing trait distorters into the region where ctrait(k) > csup, and hence where suppression is favoured (Figs. 2, 3 and 6).

Our analyses have focused on selfish genetic elements that increase their own transmission by manipulating some organism trait in a specific direction15,17. Examples include the sex ratio distorters and public goods genes considered in our specific models. We focused on such ‘trait distorters’ because they can have substantial influences on the traits of organisms, even when at fixation. In contrast, we have not considered selfish genetic elements, such as transposons and meiotic drivers, that do not need to manipulate organism traits in order to give themselves a selfish propagation advantage43. We have not considered such selfish genetic elements because: (i) they do not distort traits away from individual maxima; and (ii) the cost of such drivers makes them disfavoured across the entire genome, leading to selection to attenuate that cost.

Our Dynamics models have validated various verbal arguments that have previously been made for the parliament of genes hypothesis. We found that, if trait distortion is only favoured across a small proportion of the genome (proportionally small cabal), the trait distortion experienced by individuals is likely to be low, and unaffected by details such as genome size, mutation rate and mechanism of trait distortion. Empirically, cabals typically comprise small proportions of genomes54,56. Furthermore, more sophisticated trait distorters, with the potential to interact synergistically with each other, are likely to have a lower mutational accessibility, and so are more likely to be suppressed and purged before they have a chance to co-segregate. Real-world examples of trait distortion are typically caused by lone genes, or genes that do not interact synergistically14,60. In contrast, complex adaptations are typically underpinned by multitudes of synergistically interacting genes residing in the parliamentary majority (commonwealth)23.

We are not claiming that appreciable trait distortion will never evolve, or that biological details will never matter14,32,59,60. Instead, our results suggest that the modal outcome will be a relative lack of trait distortion. This conclusion is supported empirically by cases where appreciable distortion is only revealed in hybrid crosses, implying that trait distorters are generally suppressed62. Furthermore, we find that, after suppression has evolved, trait distorters are generally purged from the population at equilibrium. If suppressors are constitutively expressed (obligate), trait distorters are not purged from the population, but in these cases, suppressors spread to fixation (Supplementary Note 6). Regardless of the extent to which suppressors are constitutive, there is negligible polymorphism in at least one locus, meaning trait distortion is unlikely to be revealed by mating within a population38. When trait distorters are not purged from the population, trait distortion will be revealed by matings between populations/species62.

Sex ratio distorters as a case study: The relatively large literature on sex ratio distorters offers a chance for us to assess the validity of our models, and their predictions. In Supplementary Note 3, we detail how our assumptions are consistent with the biology of sex ratio distorters and their suppressors. For example, X drivers increase their own transmission by killing Y bearing sperm, and hence producing a female-biased offspring sex ratio. This comes at a cost to the rest of the genome through both a reduction in sperm number, and through Fisherian selection disfavouring the more common sex (females). The scope of the parliament of genes to act against such drivers is shown by the fact that, in most species in which an X driver is present, suppressors have been found on both the autosomes and the Y chromosome36. Our assumptions about how suppressors act, and the cost of suppression, are analogous to those in a molecularly characterised suppressor (Nmy) of a sex ratio distorter (Dox)46,60,61; and more generally to suppressors that act pre-translationally63,64.

Our model predictions are consistent with the available data on X drivers in Drosophila. As predicted by our model: (1) Across natural populations of Drosophila simulans, there is a positive correlation between the extent of sex ratio distortion and the extent of suppression65. (2) In both Drosophila mediopunctata and D. simulans the presence of an X-linked driver led to the experimental evolution of suppression66,67. In addition, consistent with our model: (3) In natural populations of D. simulans, the prevalence of an X driver has been shown to sometimes decrease under complete suppression68. (4) Crossing different species of Drosophila has been shown to lead to appreciable sex ratio deviation, by unlinking trait distorters from their suppressors, and hence revealing previously hidden trait distorters62. Work on other sex ratio distorters has also shown that suppressors can spread extremely quickly from rarity, reaching fixation in as little as ~5 generations69.

Individual fitness maximisation: We emphasise that when the assumption of individual fitness maximisation is made in behavioural and evolutionary ecology, it is not being assumed that natural selection produces perfect fitness maximisers5. Many factors could constrain adaptation, such as genetic architecture, mutation and phylogenetic constraints70,71. Instead, the assumption of fitness maximisation is used as a basis to investigate the selective forces that have favoured particular traits (adaptations). The aim is not to test if organisms maximise fitness, or behave ‘optimally’, but rather to try to understand the selective forces favouring particular traits or behaviours2. We have examined how the parliament of genes prevents selfish genetic elements from constraining adaptation, focusing on the maintenance, rather than the emergence, of traits (Supplementary Discussion).

To conclude, debate over the validity of assuming individual level fitness maximisation has usually revolved around whether selfish genetic elements are common or rare4,20,21,24,72. We have shown that even if selfish genetic elements are common, they will tend to be either weak and negligible, or suppressed. This suggests that even if there is the potential for appreciable genetic conflict, individual level fitness maximisation will still often be a reasonable assumption. This allows us to explain why certain traits, especially the sex ratio, have been able to provide such clear support for both individual level fitness maximisation and genetic conflict9.

## Methods

### Trait distorter population frequency

We ask when a rare trait distorter (D1) can invade a population fixed for the trait non-distorter (D0). We take Eq. 1, set p′ = p = p*, and solve to find two possible equilibria: p* = 0 (trait non-distorter fixation) and p* = 1 (trait distorter fixation). The trait distorter (D1) can invade from rarity when the p* = 0 equilibrium is unstable, which occurs when the differential of p′ with respect to p, at p* = 0, is >1. The trait distorter invasion criterion is therefore ctrait(k) < t(k)(1 − ctrait(k)).

We now ask what frequency the trait distorter (D1) will reach after invasion. The trait distorter (D1) can spread to fixation if the p* = 1 equilibrium is stable, which requires that the differential of p′ with respect to p, at p* = 1, is <1. This requirement always holds true, demonstrating that there is no negative frequency dependence on the trait distorter, and that it will always spread to fixation after its initial invasion.
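The two stability checks above can be reproduced numerically. The following is a minimal sketch, assuming that Eq. 1 takes the form implied by the suppressor-free terms of Eqs. 8 and 10 (only D0/S0 and D1/S0 gametes present); the function names and the step size `h` are illustrative choices rather than part of the original analysis.

```python
def next_p(p, t, c_trait):
    """One generation of the suppressor-free recursion for distorter frequency p.

    Assumed form (suppressor-free terms of Eqs. 8 and 10): heterozygotes
    transmit D1 with probability (1 + t) / 2, and every carrier of D1 has
    fitness 1 - c_trait.
    """
    q = 1.0 - p
    mean_w = q * q + 2.0 * (1.0 - c_trait) * p * q + (1.0 - c_trait) * p * p
    return (1.0 - c_trait) * p * ((1.0 + t) * q + p) / mean_w


def slope(p_star, t, c_trait, h=1e-7):
    """Finite-difference derivative of p' with respect to p at p_star.

    The difference is one-sided at the boundaries p_star = 0 and p_star = 1.
    """
    lo = max(p_star - h, 0.0)
    hi = min(p_star + h, 1.0)
    return (next_p(hi, t, c_trait) - next_p(lo, t, c_trait)) / (hi - lo)


def can_invade(t, c_trait):
    """Invasion criterion stated in the text: c_trait < t * (1 - c_trait)."""
    return c_trait < t * (1.0 - c_trait)
```

Under this assumed recursion, the slope at p* = 0 is (1 + t)(1 − ctrait), which exceeds 1 exactly when the invasion criterion holds, and the slope at p* = 1 is 1 − t, which is below 1 for any positive transmission bias.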

### Suppressor invasion condition

We ask when the suppressor (S1) can spread from rarity in a population in which the trait distorter (D1) and non-suppressor (S0) are fixed at equilibrium. We derive the Jacobian stability matrix for this equilibrium, which is a matrix of each genotype frequency (x00′, x01′, x10′, x11′) differentiated by each genotype frequency in the prior generation (x00, x01, x10, x11), at the equilibrium position given by x00* = 0, x01* = 0, x10* = 1, x11* = 0:

$$J = \left( {\begin{array}{*{20}{c}} {1 - t} & {\frac{{1 - c_{\mathrm{sup}}}}{{2(1 - c_{\mathrm{trait}})}}} & 0 & 0 \\ 0 & {\frac{{1 - c_{\mathrm{sup}}}}{{2(1 - c_{\mathrm{trait}})}}} & 0 & 0 \\ {t - 1} & {\frac{{ - 3(1 - c_{\mathrm{sup}})}}{{2(1 - c_{\mathrm{trait}})}}} & 0 & {\frac{{ - (1 - c_{\mathrm{sup}})}}{{1 - c_{\mathrm{trait}}}}} \\ 0 & {\frac{{1 - c_{\mathrm{sup}}}}{{2(1 - c_{\mathrm{trait}})}}} & 0 & {\frac{{1 - c_{\mathrm{sup}}}}{{1 - c_{\mathrm{trait}}}}} \end{array}} \right),$$
(7)

The suppressor can invade when the equilibrium is unstable, which occurs when the leading eigenvalue is greater than one. The leading eigenvalue is (1 − csup)/(1 − ctrait), meaning the suppressor invasion criterion is ctrait > csup.
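As a numerical check on the leading eigenvalue, the matrix in Eq. 7 can be built directly and its dominant eigenvalue estimated. The sketch below uses plain power iteration, which is our choice of method for illustration and assumes a single dominant real eigenvalue; the parameter values used to exercise it are arbitrary.

```python
def jacobian(t, c_sup, c_trait):
    """Stability matrix of Eq. 7 (rows and columns ordered x00, x01, x10, x11)."""
    a = (1.0 - c_sup) / (2.0 * (1.0 - c_trait))
    return [
        [1.0 - t, a, 0.0, 0.0],
        [0.0, a, 0.0, 0.0],
        [t - 1.0, -3.0 * a, 0.0, -2.0 * a],
        [0.0, a, 0.0, 2.0 * a],
    ]


def leading_eigenvalue(matrix, iterations=500):
    """Estimate the dominant eigenvalue by power iteration (pure Python)."""
    v = [1.0] * len(matrix)
    lam = 0.0
    for _ in range(iterations):
        w = [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in matrix]
        # Rayleigh quotient: converges to the eigenvalue as v converges
        # to the dominant eigenvector.
        lam = sum(w_i * v_i for w_i, v_i in zip(w, v)) / sum(v_i * v_i for v_i in v)
        scale = max(abs(w_i) for w_i in w)
        v = [w_i / scale for w_i in w]
    return lam


def suppressor_invades(c_sup, c_trait):
    """Invasion criterion stated in the text: c_trait > c_sup."""
    return c_trait > c_sup
```

For example, with t = 0.5, csup = 0.1 and ctrait = 0.4, the estimate approaches (1 − 0.1)/(1 − 0.4) = 1.5, so the equilibrium is unstable and the suppressor invades.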

### Equilibrium trait distorter and suppressor frequencies

We ask what frequency the trait distorter (D1) and suppressor (S1) will reach after initial suppressor (S1) invasion. We assume that the suppressor is introduced from rarity when the trait distorter has reached the population frequency given by f (x00 → f, x10 → 1 − f, {x01, x11} → 0). We numerically iterate Eqs. 2–5, over successive generations, until equilibrium has been reached. At equilibrium, for all parameter combinations (f, t, csup, ctrait), the suppressor reaches an internal equilibrium and the trait distorter is lost from the population (x00* + x01* = 1, x10* = 0, x11* = 0). This equilibrium arises because trait distorter presence gives the suppressor (S1) a selective advantage, leading to high suppressor frequency, which in turn reverses the selective advantage of the trait distorter (D1), leading to trait distorter loss and suppressor equilibration.

### Non-equilibrium trait distortion

We consider a trait distorter that is suppressed and therefore purged at equilibrium (ctrait > csup), and ask to what extent it can contribute to individual trait distortion in the period after its initial invasion, but before its eventual loss (non-equilibrium). We introduce the trait distorter (D1) and suppressor (S1) from rarity and numerically iterate our recursions until the trait distorter has been purged from the population (or a cap of 20,000,000 generations has been reached). We vary parameters between 0 ≤ t ≤ 1, csup < ctrait ≤ 1, 0 ≤ csup ≤ 1.

We find that a higher cost of trait distortion (ctrait) relative to suppression (csup) leads to shorter non-equilibrium maintenance of the trait distorter in the population. This is because the cost of trait distortion relative to suppression mediates selection on the suppressor (Methods: ‘Suppressor invasion condition’). We find that a higher transmission bias (t) leads to longer non-equilibrium maintenance of the trait distorter in the population, but this effect is diluted as the cost of trait distortion (ctrait) is increased relative to suppression (csup) (Supplementary Note 2, Supplementary Fig. 1). Stronger trait distorters (with higher k, leading to higher ctrait and t) are therefore generally suppressed and purged more rapidly than weaker trait distorters (Fig. 1b). Exceptions are trait distorters that reduce individual fitness relatively negligibly after the point (k) at which suppression is favoured, such that $$\frac{{{\mathrm{d}}t}}{{{\mathrm{d}}k}}/\frac{{{\mathrm{d}}c_{\mathrm{trait}}}}{{{\mathrm{d}}k}}$$ is very high for values of k satisfying csup < ctrait(k).

### Invasion of a mutant trait distorter

We ask when a mutant trait distorter (D2) will invade against a resident trait distorter (D1) that is unsuppressed and at fixation (k ≠ $$\hat k$$). We write recursions detailing the generational frequency changes in the six possible gametes, D0/S0, D0/S1, D1/S0, D1/S1, D2/S0, D2/S1, with current generation frequencies denoted, respectively by x00, x01, x10, x11, x20, x21, and next-generation frequencies denoted with an appended dash (′):

$$\begin{array}{l}\bar w\,x_{00}^{\prime} = x_{00}x_{00} + x_{00}x_{01} + (1 - t(k))(1 - c_{\mathrm{trait}}(k))x_{00}x_{10}\\ + \left( {\left( {1 - c_{\mathrm{sup}}} \right)/2} \right)x_{00}x_{11} + (1 - t(\hat k))(1 - c_{\mathrm{trait}}(\hat k))\\ x_{00}x_{20} + \left( {\left( {1 - c_{\mathrm{sup}}} \right)/2} \right)x_{00}x_{21} + \left( {\left( {1 - c_{\mathrm{sup}}} \right)/2} \right)\\ x_{01}x_{10} + \left( {\left( {1 - c_{\mathrm{sup}}} \right)/2} \right)x_{01}x_{20},\end{array}$$
(8)
$$\begin{array}{l}\bar w\,x_{01}^{\prime} = x_{00}x_{01} + \left( {\left( {1 - c_{\mathrm{sup}}} \right)/2} \right)x_{00}x_{11} + \left( {\left( {1 - c_{\mathrm{sup}}} \right)/2} \right)\\ x_{00}x_{21} + x_{01}x_{01} + \left( {\left( {1 - c_{\mathrm{sup}}} \right)/2} \right)x_{01}x_{10} + \left( {1 - c_{\mathrm{sup}}} \right)x_{01}x_{11}\\ + \left( {\left( {1 - c_{\mathrm{sup}}} \right)/2} \right)x_{01}x_{20} + \left( {1 - c_{\mathrm{sup}}} \right)x_{01}x_{21},\end{array}$$
(9)
$$\begin{array}{l}\bar w\,x_{10}\prime = (1 + t(k))(1 - c_{\mathrm{trait}}(k))x_{00}x_{10} + \left( {\left( {1 - c_{\mathrm{sup}}} \right)/2} \right)\\ x_{00}x_{11} + \left( {\left( {1 - c_{\mathrm{sup}}} \right)/2} \right)x_{01}x_{10} + (1 - c_{\mathrm{trait}}(k))\\ x_{10}x_{10} + \left( {1 - c_{\mathrm{sup}}} \right)x_{10}x_{11} + (1 + t(k) - t(\hat k))\\ (1 - c_{\mathrm{trait}}(\mathrm{max}(k,\hat k)))x_{10}x_{20} + \left( {\left( {1 - c_{\mathrm{sup}}} \right)/2} \right)x_{10}x_{21}\\ + \, \left( {\left( {1 - c_{\mathrm{sup}}} \right)/2} \right)x_{11}x_{20},\end{array}$$
(10)
$$\begin{array}{l}\bar w\,x_{11}^{\prime} = \left( {\left( {1 - c_{\mathrm{sup}}} \right)/2} \right)x_{00}x_{11} + \left( {\left( {1 - c_{\mathrm{sup}}} \right)/2} \right)x_{01}x_{10}\\ + \left( {1 - c_{\mathrm{sup}}} \right)x_{01}x_{11} + \left( {1 - c_{\mathrm{sup}}} \right)x_{10}x_{11} + \left( {\left( {1 - c_{\mathrm{sup}}} \right)/2} \right)x_{10}x_{21}\\ + \left( {1 - c_{\mathrm{sup}}} \right)x_{11}x_{11} + \left( {\left( {1 - c_{\mathrm{sup}}} \right)/2} \right)x_{11}x_{20} + \left( {1 - c_{\mathrm{sup}}} \right)x_{11}x_{21},\end{array}$$
(11)
$$\begin{array}{l}\bar w\,x_{20}\prime = (1 + t(\hat k))(1 - c_{\mathrm{trait}}(\hat k))x_{00}x_{20} + \left( {\left( {1 - c_{\mathrm{sup}}} \right)/2} \right)\\ x_{00}x_{21} + \left( {\left( {1 - c_{\mathrm{sup}}} \right)/2} \right)x_{01}x_{20} + (1 - t(k) + t(\hat k))\\ (1 - c_{\mathrm{trait}}({\mathrm{max}}(k,\hat k)))x_{10}x_{20} + \left( {\left( {1 - c_{\mathrm{sup}}} \right)/2} \right)\\ x_{10}x_{21} + \left( {\left( {1 - c_{\mathrm{sup}}} \right)/2} \right)x_{11}x_{20} + (1 - c_{\mathrm{trait}}(\hat k))\\ x_{20}x_{20} + \left( {1 - c_{\mathrm{sup}}} \right)x_{20}x_{21},\end{array}$$
(12)
$$\begin{array}{l}\bar w\,x_{21}\prime = \left( {\left( {1 - c_{\mathrm{sup}}} \right)/2} \right)x_{00}x_{21} + \left( {\left( {1 - c_{\mathrm{sup}}} \right)/2} \right)\\ x_{01}x_{20} + \left( {1 - c_{\mathrm{sup}}} \right)x_{01}x_{21} + \left( {\left( {1 - c_{\mathrm{sup}}} \right)/2} \right)x_{10}x_{21} + \left( {\left( {1 - c_{\mathrm{sup}}} \right)/2} \right)\\ x_{11}x_{20} + \left( {1 - c_{\mathrm{sup}}} \right)x_{11}x_{21} + \left( {1 - c_{\mathrm{sup}}} \right)x_{20}x_{21} + \left( {1 - c_{\mathrm{sup}}} \right)x_{21}x_{21},\end{array}$$
(13)

where $$\bar w$$ is the average fitness of individuals in the current generation, and equals the sum of the right-hand side of the system of equations. The mutant trait distorter can invade when the equilibrium given by x00* = 0, x01* = 0, x10* = 1, x11* = 0, x20* = 0, x21* = 0 is unstable, which occurs when the leading eigenvalue of the Jacobian stability matrix for this equilibrium is >1. Testing for stability in this way, we find that, if the mutant trait distorter is weaker than the resident, it can never invade. If the mutant trait distorter is stronger than the resident, it invades from rarity when Δt(1 − ctrait($$\hat k$$)) > Δctrait, where Δt = t($$\hat k$$) − t(k), Δctrait = ctrait($$\hat k$$) − ctrait(k).
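Eqs. 8–13 can be generated from a small set of rules rather than written out term by term: unordered gamete pairs unite at random, the two loci are unlinked, a suppressor is expressed (and costs csup) only when a distorter is present, and an unsuppressed distorter drives against its partner allele. The sketch below is one compact restatement under those rules; the gamete indexing, the function name `step` and the callables `t` and `c_trait` are illustrative assumptions, and both callables are assumed to return 0 at strength 0.

```python
from itertools import product


def step(x, k, k_hat, t, c_trait, c_sup):
    """Advance gamete frequencies x = [x00, x01, x10, x11, x20, x21] one generation.

    t and c_trait are callables of distorter strength with t(0) = c_trait(0) = 0.
    """
    strength = (0.0, k, k_hat)  # D0, D1 (resident), D2 (mutant)
    new = [0.0] * 6
    for a, b in product(range(6), repeat=2):  # ordered pairs of uniting gametes
        (da, sa), (db, sb) = divmod(a, 2), divmod(b, 2)
        ka, kb = strength[da], strength[db]
        if max(da, db) == 0:
            # No distorter present: the suppressor is not expressed.
            w, p_a = 1.0, 0.5
        elif sa or sb:
            # Distorter suppressed: no drive, cost of suppression paid.
            w, p_a = 1.0 - c_sup, 0.5
        else:
            # Unsuppressed: the stronger allele sets the cost, alleles drive.
            w = 1.0 - c_trait(max(ka, kb))
            p_a = (1.0 + t(ka) - t(kb)) / 2.0
        weight = x[a] * x[b] * w
        for d, p_d in ((da, p_a), (db, 1.0 - p_a)):
            for s in (sa, sb):  # unlinked loci: suppressor allele inherited independently
                new[2 * d + s] += weight * p_d * 0.5
    mean_w = sum(new)  # average fitness: the sum of the right-hand sides
    return [value / mean_w for value in new]
```

With the hypothetical forms t(k) = k and ctrait(k) = k², a resident at k = 0.2 and a rare mutant at k̂ = 0.3 give Δt(1 − ctrait(k̂)) = 0.091 > Δctrait = 0.05, so the mutant frequency rises in the first generation; a mutant at k̂ = 0.9 overshoots and declines, as does a weaker mutant at k̂ = 0.1.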

The implication is that, if trait distortion is initially low, and mutant trait distorters are successively introduced, each deviating only very slightly from the resident trait distorter from which they are derived, such that $$\hat k$$ = k ± δ, where δ is very small (‘δ-weak selection’48), then trait distorters will approach a ‘target’ strength at which $$\frac{{{\mathrm{d}}t}}{{{\mathrm{d}}k}}\left( {1 - c_{\mathrm{trait}}} \right) = \frac{{{\mathrm{d}}c_{\mathrm{trait}}}}{{{\mathrm{d}}k}}$$. In the absence of suppression, this target (ktarget) is the equilibrium level of trait distortion (k* = ktarget). However, if mutant trait distorters (D2) are allowed to deviate appreciably from residents (D1) (strong selection), then trait distorters may invade even if they overshoot the target ($$\hat k$$ > ktarget). In the absence of suppression, ktarget is then not the equilibrium level of trait distortion, but rather, the minimum equilibrium level of trait distortion (k* ≥ ktarget) (Supplementary Note 2, Supplementary Fig. 2b).

We could alternatively have assumed that an individual’s trait is distorted according to the average strength of its alleles (additive gene interactions), rather than according to the stronger (higher k) allele (dominance). Such an assumption leads to a single invasion criterion for a mutant trait distorter, regardless of whether the mutant trait distorter is stronger or weaker than the resident trait distorter, given by: Δt(2 − ctrait(k) − ctrait($$\hat k$$)) > Δctrait. In the absence of suppression, this leads to an equilibrium level of trait distortion (k*), which holds even under strong selection, and satisfies $$2\frac{{{\mathrm{d}}t}}{{{\mathrm{d}}k}}\left( {1 - c_{\mathrm{trait}}} \right) = \frac{{{\mathrm{d}}c_{\mathrm{trait}}}}{{{\mathrm{d}}k}}$$.
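Both the dominance target (ktarget) and the additive equilibrium (k*) are roots of a condition of the form w · dt/dk · (1 − ctrait) = dctrait/dk, with w = 1 and w = 2, respectively. The following sketch solves this condition by bisection for user-supplied functions; the solver, its name and the example functions t(k) = k and ctrait(k) = k² are illustrative assumptions and are not the forms used in our simulations.

```python
def solve_target(t, c_trait, weight=1.0, lo=0.0, hi=1.0, h=1e-6, tol=1e-10):
    """Bisection root of weight * t'(k) * (1 - c_trait(k)) - c_trait'(k) on [lo, hi].

    weight = 1 gives the dominance target; weight = 2 gives the equilibrium
    under additive gene interactions. Derivatives are central differences,
    and a single sign change (positive to negative) on [lo, hi] is assumed.
    """
    def derivative(f, k):
        return (f(k + h) - f(k - h)) / (2.0 * h)

    def g(k):
        return weight * derivative(t, k) * (1.0 - c_trait(k)) - derivative(c_trait, k)

    if g(lo) <= 0.0 or g(hi) >= 0.0:
        raise ValueError("no positive-to-negative sign change on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

For the example functions, the dominance condition reduces to 1 − k² = 2k, giving ktarget = √2 − 1 ≈ 0.414, and the additive condition reduces to 2(1 − k²) = 2k, giving k* = (√5 − 1)/2 ≈ 0.618.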

### Equilibrium allele frequencies after mutant invasion

We ask what equilibrium state will arise after the invasion of a mutant trait distorter. We assume that the mutant trait distorter (D2) is introduced from rarity when the resident trait distorter (D1) has reached the population frequency given by q. We numerically iterate Eqs. 8–13, over successive generations, until equilibrium has been reached. At equilibrium, for all parameter combinations (q, t(k), t($$\hat k$$), csup, ctrait(k), ctrait($$\hat k$$)), the resident trait distorter (D1) is lost from the population (x10* = 0, x11* = 0), with either the mutant trait distorter (D2) and non-suppressor (S0) at fixation (x20* = 1), or the trait non-distorter (D0) at fixation alongside the suppressor (S1) at an internal equilibrium (x00* + x01* = 1). The latter scenario arises if the mutant trait distorter triggers suppressor invasion (csup < ctrait($$\hat k$$)). This equilibrium arises because mutant trait distorter presence gives the suppressor (S1) a selective advantage, leading to high suppressor frequency, which in turn reverses the selective advantage of trait distortion, leading to trait distorter (D1, D2) loss and suppressor equilibration.

### Agent-based simulation (single trait distorter locus)

We construct an agent-based simulation to ask what level of trait distortion evolves when continuous variation is permitted at trait distorter and suppressor loci. We model a population of N = 2000 individuals and track evolution at two autosomal loci: a trait distorter locus and a suppressor locus. Each individual has two alleles at the trait distorter locus, with strengths denoted by ka and kb, and two alleles at the suppressor locus, with strengths denoted by ma and mb (diploid). Strengths can take any continuous value between 0 and 1. We assume that, for both loci, the strongest (highest value) allele within an individual is dominant. The absolute fitness of an individual with at least one active trait distorter (max(ka,kb) > 0) is: 1 − ctrait(max(ka,kb))(1 − max(ma,mb)) − csup max(ma,mb), and the absolute fitness of an individual lacking an active trait distorter (max(ka,kb) = 0) is 1. The function ctrait(max(ka,kb)) is given an explicit form in simulations (Supplementary Note 2, Supplementary Fig. 2).

In each generation, there are N breeding pairs. To fill each position in each breeding pair, individuals are drawn from the population, with replacement, with probabilities proportional to their fitness (hermaphrodites). Breeding pairs then reproduce to produce one offspring, before dying (non-overlapping generations). Alleles at the suppressor locus are inherited in Mendelian fashion. Alleles at the trait distorter locus may drive, meaning the parental allele of strength ka is inherited, rather than the allele of strength kb, with the probability (1 + (t(ka) − t(kb))(1 − max(ma,mb)))/2. The transmission bias function, t, is given an explicit form in simulations (Supplementary Note 2, Supplementary Fig. 2). Each generation, trait distorter and suppressor alleles have a 0.01 chance of mutating to a new value, which is drawn from a normal distribution centred around the pre-mutation value, with variance 0.2, and truncated between 0 and 1. We track the population average trait distorter strength, denoted by E[k], and suppressor strength, denoted by E[m], over 20,000 generations. We see that, allowing for continuous variation at the trait distorter and suppressor loci, if the cost of suppression (csup) is not excessively high, trait distortion at equilibrium is either low or absent (Fig. 2a; Supplementary Note 2, Supplementary Fig. 2b).
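The life cycle described above can be sketched as a single generational update. The code below is an illustration only and not the simulation code used for our results: the function names are hypothetical, truncation of the mutational distribution is implemented by rejection sampling (an assumption, since clipping would be an alternative reading), and mutation is applied to each transmitted allele.

```python
import math
import random


def fitness(ka, kb, ma, mb, c_trait, c_sup):
    """Absolute fitness with the strongest allele dominant at each locus."""
    k, m = max(ka, kb), max(ma, mb)
    if k == 0.0:
        return 1.0
    return 1.0 - c_trait(k) * (1.0 - m) - c_sup * m


def drive_probability(ka, kb, ma, mb, t):
    """Probability that a parent transmits the allele of strength ka rather than kb."""
    return (1.0 + (t(ka) - t(kb)) * (1.0 - max(ma, mb))) / 2.0


def mutate(value, rng, rate=0.01, variance=0.2):
    """With probability rate, redraw from a normal centred on value, truncated to [0, 1]."""
    if rng.random() >= rate:
        return value
    while True:  # truncation by rejection sampling (assumption)
        draw = rng.gauss(value, math.sqrt(variance))
        if 0.0 <= draw <= 1.0:
            return draw


def gamete(parent, t, rng):
    """Return one (distorter, suppressor) allele pair transmitted by a parent."""
    ka, kb, ma, mb = parent
    k = ka if rng.random() < drive_probability(ka, kb, ma, mb, t) else kb
    m = rng.choice((ma, mb))  # Mendelian inheritance at the suppressor locus
    return mutate(k, rng), mutate(m, rng)


def next_generation(population, t, c_trait, c_sup, rng):
    """One non-overlapping generation: fitness-weighted parents, one offspring per pair."""
    weights = [fitness(*ind, c_trait, c_sup) for ind in population]
    n = len(population)
    parents = rng.choices(population, weights=weights, k=2 * n)
    offspring = []
    for first, second in zip(parents[:n], parents[n:]):
        (ka, ma), (kb, mb) = gamete(first, t, rng), gamete(second, t, rng)
        offspring.append((ka, kb, ma, mb))
    return offspring
```

Individuals are represented as tuples (ka, kb, ma, mb), and `rng` is a `random.Random` instance so that runs can be seeded. Iterating `next_generation` and averaging max-independent allele strengths across the population would give E[k] and E[m] for explicit choices of t and ctrait.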

### Long-term trait distortion (exact numerical solution)

We ask how the trait distortion experienced by organisms changes across evolutionary time as new trait distorters and suppressors are continuously introduced and lost from a population. We construct a population genetic model and solve it numerically and exactly. We introduce a trait distorter from rarity and iterate our recursion for an unsuppressed trait distorter (Eq. 1) from T = 1 to $$T=1/((1-\theta)\gamma\rho_{S_1})$$ generations. During this period, the trait distortion experienced by individuals rises to a peak of k, corresponding to the strength of trait distorters available to the cabal. We then introduce a suppressor from rarity and iterate our recursions for trait distorter-suppressor co-segregation (Eqs. 2–5), from $$T=1/((1-\theta)\gamma\rho_{S_1})$$ until the trait distorter has been purged (T = X). During this period, the trait distortion experienced by individuals falls to a trough of 0.

Average trait distortion over evolutionary time is given by weighting average trait distortion during the interval T = {1, 2, …, X} by the proportion of evolutionary time in which a trait distorter is segregating in the population $$(X(\theta\gamma\rho_{D_1}))$$. This methodology provides exact, numerical values for average trait distortion. These values correspond closely to the analytical approximation for average trait distortion (Eq. 6), which is derived under a separation of timescales assumption (Methods: ‘Long-term trait distortion (analytical approximation)’; Fig. 4).

### Long-term trait distortion (analytical approximation)

When a trait distorter is initially introduced into the population, it will spread, and the population will equilibrate when the trait distorter reaches fixation (Methods: ‘Long-term trait distortion (exact numerical solution)’). Similarly, when a suppressor is initially introduced into the population, it will spread if its target trait distorter is sufficiently costly (csup < ctrait(k)), and the population will equilibrate when the suppressor’s target trait distorter is purged from the population (Methods: ‘Long-term trait distortion (exact numerical solution)’). We assume that, after the introduction of a new trait distorter or suppressor, the rate at which gene frequencies equilibrate is very fast relative to the rate at which new trait distorters and suppressors are introduced at new loci (separation of timescales).

On this assumption, we can partition evolutionary time into two repeating periods. In the first period, comprising the $$1/((1-\theta)\gamma\rho_{S_1})$$ generations in between trait distorter and suppressor introduction, individual trait distortion is k. In the second period, comprising the following $$1/(\theta\gamma\rho_{D_1})-1/((1-\theta)\gamma\rho_{S_1})$$ generations, and ending when the next trait distorter is introduced at a new locus, individual trait distortion is 0. We average over these two time periods to calculate the average trait distortion experienced by individuals across evolutionary time (Eq. 6).
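The averaging over the two periods can be written out directly. The sketch below simply weights a distortion of k by the fraction of each cycle spent in the first period; the function name and the guard clause are illustrative, and the resulting expression, kθρD1/((1 − θ)ρS1), is what we would expect Eq. 6 to reduce to under the partition described here.

```python
def average_distortion(k, theta, gamma, rho_d, rho_s):
    """Time-averaged trait distortion under the separation-of-timescales partition."""
    # Generations between distorter introduction and suppressor introduction,
    # during which individual trait distortion is k.
    distorted = 1.0 / ((1.0 - theta) * gamma * rho_s)
    # Generations between successive distorter introductions (one full cycle).
    cycle = 1.0 / (theta * gamma * rho_d)
    if distorted > cycle:
        raise ValueError("suppressor introduction is slower than distorter introduction")
    # Distortion is 0 for the remainder of the cycle.
    return k * distorted / cycle
```

Genome size (γ) cancels in the ratio, and scaling both introduction rates by the same factor leaves the result unchanged, consistent with the absence of an effect of γ and ρ on average trait distortion. With equal rates, the result is kθ/(1 − θ), which approaches 0 as θ → 0.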

### Agent-based simulation (multiple loci; discrete)

We build on the agent-based model detailed in Methods: ‘Agent-based simulation (single trait distorter locus)’ to capture the evolutionary dynamics of arbitrarily large numbers of co-segregating trait distorters and suppressors across the genome. The specific details of how mate partners are attributed (e.g. panmictic; hermaphrodite), and how the population is sampled to implement fitness effects (e.g. non-overlapping generations), are fully described in Methods: ‘Agent-based simulation (single trait distorter locus)’. We model a diploid population of N = 2000 individuals, each with γ = 10⁶ loci, θγ of which constitute the cabal and (1 − θ)γ of which constitute the commonwealth.

We assume that each locus across the genome is initially ‘dormant’. The alleles segregating in the population at dormant loci are neutral with respect to trait distortion and suppression. Loci are activated when the alleles segregating there have drifted to lie one mutational step away from distortion or suppression. For a given dormant locus in the cabal and in the commonwealth, the generational activation probability is given, respectively, by $$\rho_{D_1}$$ and $$\rho_{S_1}$$. Each successively activated cabal and commonwealth locus is indexed with a consecutive integer within the respective sets Icabal = {1, 2, …, ncabal} and Icommonwealth = {1, 2, …, ncommonwealth}, where ncabal and ncommonwealth give respectively the total number of activated cabal and commonwealth loci, which increase as generations (T) pass. After locus activation, alleles mutate between functional and neutral forms with a generational probability of 0.001. If, at any time, all trait distorters (i ∈ Icabal) have dedicated suppressors (i ∈ Icommonwealth), such that ncabal = ncommonwealth, further commonwealth loci cannot be activated until new trait distorters arise (ncabal > ncommonwealth). If trait distorters are low-sophistication as opposed to high-sophistication, the generational cabal locus activation probability ($$\rho_{D_1}$$) is increased by a factor of two (such that $$\rho_{D_{1{\mathrm{L}}}}=2\rho_{D_{1{\mathrm{H}}}}$$).

For each individual, the set Idistorter ⊆ Icabal comprises every locus within the cabal where one (heterozygous) or two (homozygous) trait distorters are present. A given suppressor at a locus within the commonwealth (i ∈ Icommonwealth) is only expressed if its target trait distorter (i ∈ Idistorter) is also present in the individual. However, if expressed, a given suppressor (i ∈ Icommonwealth) may also contribute to the ‘background’ suppression of unsuppressed non-target trait distorters (Idistorter\i), at a fraction z of its usual strength. We assume that, for low-sophistication trait distorters (D1L), z = 0.5, and for high-sophistication trait distorters (D1H), z = 0.

The total suppression faced by a trait distorter ($$i \in I_{\mathrm{distorter}}$$) is therefore $$\mathrm{TotSup}_i = 1$$ if its dedicated suppressor is present in the individual, or $$\mathrm{TotSup}_i = \min(zq, 1)$$ if its dedicated suppressor is absent, where q is the number of expressed suppressors present in the individual, and where the ‘min’ notation indicates that the total suppression cannot exceed 1 (complete suppression). The total cost of suppression for an individual is $$c_{\mathrm{sup}} \sum\nolimits_{i \in I_{\mathrm{distorter}}} \mathrm{TotSup}_i$$. The least suppressed trait distorter in each individual ($$i_{\mathrm{dom}} \in I_{\mathrm{distorter}}$$) exerts inter-locus dominance, and causes a trait distortion of $$\mathrm{Dist} = \max_{i \in I_{\mathrm{distorter}}} \left( (1 - \mathrm{TotSup}_i)k \right)$$. The individual cost of trait distortion, which is given by $$c_{\mathrm{trait}}(\mathrm{Dist})$$, increases monotonically with the extent to which the trait is distorted $$\left( \frac{\mathrm{d}c_{\mathrm{trait}}}{\mathrm{dDist}} \ge 0 \right)$$.
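The suppression and distortion rules above can be sketched in a few lines of Python. This is a minimal illustration rather than the simulation code used for the results: the function names, the use of sets of locus indices, and the example values of k and z are assumptions made here for clarity.

```python
def total_suppression(distorter_loci, suppressor_loci, z):
    """Return {locus: TotSup_i} for every distorter locus in one individual.

    distorter_loci:  set of cabal locus indices carrying >= 1 trait distorter.
    suppressor_loci: set of commonwealth locus indices carrying a suppressor.
    z:               background suppression fraction (0.5 or 0 in the text).
    """
    # A suppressor is only expressed if its target trait distorter is present.
    expressed = distorter_loci & suppressor_loci
    q = len(expressed)  # number of expressed suppressors
    return {
        i: 1.0 if i in expressed else min(z * q, 1.0)
        for i in distorter_loci
    }


def trait_distortion(tot_sup, k):
    """Dist = max_i (1 - TotSup_i) * k, or 0 if no trait distorter is present."""
    return max(((1.0 - s) * k for s in tot_sup.values()), default=0.0)


# Illustrative individual (hypothetical values): distorters at loci 1-3,
# a dedicated suppressor only at locus 1, low-sophistication setting z = 0.5.
tot_sup = total_suppression({1, 2, 3}, {1}, z=0.5)
dist = trait_distortion(tot_sup, k=0.8)
```

In this example, locus 1 is completely suppressed, while loci 2 and 3 each receive only background suppression from the single expressed suppressor.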

Expression of the remaining ‘inter-locus recessive’ trait distorters ($$I_{\mathrm{distorter}} \setminus i_{\mathrm{dom}}$$) leads to a pool of gene products with an abundance that is proportional to: $$\mathrm{Waste} = \sum\nolimits_{i \in I_{\mathrm{distorter}},\, i \ne i_{\mathrm{dom}}} \left( (1 - \mathrm{TotSup}_i)k \right)$$. The individual cost arising from inter-locus recessive trait distorters, which is given by $$c_{\mathrm{rec}}$$, increases monotonically with the size of the pool of redundant gene products $$\left( \frac{\mathrm{d}c_{\mathrm{rec}}}{\mathrm{dWaste}} \ge 0 \right)$$. We assume that, for low-sophistication trait distorters ($$D_{1\mathrm{L}}$$), the individual cost arising from any one inter-locus recessive trait distorter is equal to the cost of trait distortion itself $$\left( c_{\mathrm{trait}}(\mathrm{Dist}) = \frac{c_{\mathrm{rec}}(\mathrm{Waste})}{\left| I_{\mathrm{distorter}} \right| - 1} \ge 0 \right)$$. For high-sophistication trait distorters ($$D_{1\mathrm{H}}$$), this cost is lower relative to the cost of trait distortion $$\left( c_{\mathrm{trait}}(\mathrm{Dist}) = \frac{5\, c_{\mathrm{rec}}(\mathrm{Waste})}{3 \left( \left| I_{\mathrm{distorter}} \right| - 1 \right)} \ge 0 \right)$$. The total fitness (viability) of an individual is then given by: $$1 - c_{\mathrm{trait}}(\mathrm{Dist}) - c_{\mathrm{rec}}(\mathrm{Waste}) - c_{\mathrm{sup}} \sum\nolimits_{i \in I_{\mathrm{distorter}}} \mathrm{TotSup}_i$$.
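As a rough sketch of how the viability expression could be evaluated for one individual, the Python below takes the cost functions as arguments. The linear cost functions and all numerical values in the example are hypothetical placeholders chosen for illustration; the text only requires that the cost functions be monotonically increasing.

```python
def waste(tot_sup, k):
    """Sum of (1 - TotSup_i) * k over all distorter loci except the dominant one."""
    effects = sorted((1.0 - s) * k for s in tot_sup.values())
    # The largest residual effect belongs to the inter-locus dominant distorter,
    # so it is excluded from the pool of redundant gene products.
    return sum(effects[:-1])


def viability(tot_sup, k, c_sup, c_trait, c_rec):
    """1 - c_trait(Dist) - c_rec(Waste) - c_sup * sum_i TotSup_i."""
    dist = max(((1.0 - s) * k for s in tot_sup.values()), default=0.0)
    return (
        1.0
        - c_trait(dist)
        - c_rec(waste(tot_sup, k))
        - c_sup * sum(tot_sup.values())
    )


# Hypothetical example: three distorter loci, one fully suppressed.
example_tot_sup = {1: 1.0, 2: 0.5, 3: 0.5}
w = viability(
    example_tot_sup,
    k=0.8,
    c_sup=0.01,
    c_trait=lambda d: 0.1 * d,   # assumed linear cost, illustration only
    c_rec=lambda x: 0.05 * x,    # assumed linear cost, illustration only
)
```

Passing the cost functions in as callables keeps the sketch agnostic about their exact form, which differs between the low- and high-sophistication settings.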

We define the set $$I_{\mathrm{het}} \subseteq I_{\mathrm{distorter}} \subseteq I_{\mathrm{cabal}}$$ as the collection of loci in an individual at which one (heterozygous) trait distorter, as opposed to two (homozygous) trait distorters, is present. The trait distorters at these loci ($$I_{\mathrm{het}}$$) drive at meiosis, as a unit. The least suppressed trait distorter in the group pulls the unit through meiosis, meaning the group of trait distorters (at loci $$I_{\mathrm{het}}$$) is inherited by each offspring with the probability $$\left( 1 + \max_{i \in I_{\mathrm{het}}} (1 - \mathrm{TotSup}_i)k \right)/2$$.
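The transmission rule can be illustrated with a short Python sketch. The function name and data layout (a dictionary of per-locus suppression values and a set of heterozygous loci) are assumptions made for this example, as is the fallback to Mendelian transmission when no locus is heterozygous.

```python
def transmission_probability(tot_sup, het_loci, k):
    """Probability that the unit of distorters at heterozygous loci is inherited.

    tot_sup:  {locus: TotSup_i} for the individual's distorter loci.
    het_loci: set of loci carrying exactly one trait distorter (I_het).
    k:        strength of an unsuppressed trait distorter.
    """
    # The least suppressed distorter in the unit sets the strength of drive.
    # With no heterozygous distorter loci, transmission is Mendelian (1/2),
    # which is an assumption of this sketch for the empty case.
    strongest = max(((1.0 - tot_sup[i]) * k for i in het_loci), default=0.0)
    return (1.0 + strongest) / 2.0


# Hypothetical example: locus 1 fully suppressed, locus 2 half suppressed.
p = transmission_probability({1: 1.0, 2: 0.5}, het_loci={1, 2}, k=0.8)
```

A fully suppressed unit therefore segregates fairly, and an unsuppressed unit with k = 1 is inherited by every offspring.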

### Agent-based simulation (multiple loci; continuous)

We adapt the simulation model detailed in Methods: ‘Agent-based simulation (multiple loci; discrete)’ so that trait distorters and suppressors are not of fixed strength (of k and 1, respectively), but are free to evolve continuously between 0 and 1.

Homologous alleles at activated cabal loci ($$i \in I_{\mathrm{cabal}}$$) have strengths $$k_{ai}$$ and $$k_{bi}$$, and homologous alleles at activated commonwealth loci ($$i \in I_{\mathrm{commonwealth}}$$) have strengths $$m_{ai}$$ and $$m_{bi}$$. Within an individual, the loci bearing trait distorters ($$I_{\mathrm{distorter}} \subseteq I_{\mathrm{cabal}}$$) each satisfy $$\max(k_{ai}, k_{bi}) > 0$$. Each trait distorter (at locus $$i \in I_{\mathrm{distorter}}$$) is suppressed to the following extent: $$\mathrm{TotSup}_i = \min \left( \max \left( m_{ai}, m_{bi} \right) + z \sum\nolimits_{j \in I_{\mathrm{distorter}},\, j \ne i} \max \left( m_{aj}, m_{bj} \right),\ 1 \right)$$.
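A minimal Python sketch of the continuous suppression rule follows. The dictionary layout, the function name, and the treatment of a not-yet-activated commonwealth locus as having strength 0 are assumptions of this sketch rather than details given in the text.

```python
def total_suppression_continuous(k_alleles, m_alleles, z):
    """Return {locus: TotSup_i} under continuously varying strengths.

    k_alleles: {locus: (k_a, k_b)} strengths of homologous cabal alleles.
    m_alleles: {locus: (m_a, m_b)} strengths of homologous commonwealth alleles.
    z:         background suppression fraction.
    """
    # Distorter loci are those where at least one allele has positive strength.
    distorter_loci = [i for i, pair in k_alleles.items() if max(pair) > 0]

    def m_max(i):
        # Assumption: a missing commonwealth locus contributes no suppression.
        return max(m_alleles.get(i, (0.0, 0.0)))

    return {
        i: min(m_max(i) + z * sum(m_max(j) for j in distorter_loci if j != i), 1.0)
        for i in distorter_loci
    }


# Hypothetical example: locus 3 carries no distorter and is therefore ignored.
k_example = {1: (0.6, 0.0), 2: (0.3, 0.9), 3: (0.0, 0.0)}
m_example = {1: (0.4, 0.2), 2: (0.0, 0.0)}
tot_sup_example = total_suppression_continuous(k_example, m_example, z=0.5)
```

Here the suppressor at locus 1 suppresses its own target with strength 0.4 and contributes half of that to the background suppression of the distorter at locus 2.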

Within an individual, the strongest trait distorter (after suppression) is inter-locus dominant ($$i_{\mathrm{dom}} \in I_{\mathrm{distorter}}$$), and distorts the individual trait by: $$\mathrm{Dist} = \max_{i \in I_{\mathrm{distorter}}} \left( (1 - \mathrm{TotSup}_i) \max(k_{ai}, k_{bi}) \right)$$. The inter-locus recessive trait distorters ($$I_{\mathrm{distorter}} \setminus i_{\mathrm{dom}}$$) bring about an additional individual-level cost of $$c_{\mathrm{rec}}(\mathrm{Waste})$$, which is a monotonically increasing function of $$\mathrm{Waste} = \sum\nolimits_{i \in I_{\mathrm{distorter}},\, i \ne i_{\mathrm{dom}}} \left( (1 - \mathrm{TotSup}_i) \max(k_{ai}, k_{bi}) \right)$$.

If an allele is more trait-distorting than its homologue ($$k_{ai}$$ vs. $$k_{bi}$$), it can drive at meiosis. The strongest alleles across each homologous pair drive together as a single unit. The unit is inherited by each offspring with the probability $$\left( 1 + \max_{i \in I_{\mathrm{distorter}}} \left( 1 - \mathrm{TotSup}_i \right) \left| k_{ai} - k_{bi} \right| \right)/2$$. Every generation, each allele at an activated locus has a 0.01 chance of mutating to a new strength, which is drawn from a normal distribution centred around the pre-mutation strength, with variance 0.2, and truncated between 0 and 1.
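The drive and mutation steps can be sketched in Python as below. This is an illustration under stated assumptions: the function names and data layout are hypothetical, and the truncated normal is sampled by rejection (redrawing until the value falls in [0, 1]), which is one standard way to sample such a distribution and is not necessarily the method used in the original simulations.

```python
import math
import random

MUTATION_PROB = 0.01             # per allele, per generation (from the text)
MUTATION_SD = math.sqrt(0.2)     # standard deviation for variance 0.2


def drive_probability(tot_sup, k_alleles):
    """(1 + max_i (1 - TotSup_i) * |k_ai - k_bi|) / 2 over distorter loci."""
    strongest = max(
        ((1.0 - tot_sup[i]) * abs(k_alleles[i][0] - k_alleles[i][1]) for i in tot_sup),
        default=0.0,
    )
    return (1.0 + strongest) / 2.0


def mutate_strength(strength, rng):
    """Return the allele's strength after one generation of possible mutation."""
    if rng.random() >= MUTATION_PROB:
        return strength  # no mutation this generation
    # Rejection sampling from a normal truncated to [0, 1] (assumed method).
    while True:
        candidate = rng.gauss(strength, MUTATION_SD)
        if 0.0 <= candidate <= 1.0:
            return candidate


# Hypothetical usage with a seeded generator for reproducibility.
rng = random.Random(1)
new_strength = mutate_strength(0.5, rng)
p_drive = drive_probability({1: 0.4, 2: 0.2}, {1: (0.6, 0.0), 2: (0.3, 0.9)})
```

Rejection sampling preserves the shape of the normal distribution inside the interval, whereas simply clipping out-of-range draws would pile probability mass at 0 and 1.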

### Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.