Neutral Theory: The Null Hypothesis of Molecular Evolution

Citation: Duret, L. (2008) Neutral theory: The null hypothesis of molecular evolution. Nature Education 1(1):218

In the decades since its introduction, the neutral theory of evolution has become central to the study of evolution at the molecular level, in part because it provides a way to make strong predictions that can be tested against actual data. The neutral theory holds that most variation at the molecular level does not affect fitness and, therefore, the evolutionary fate of genetic variation is best explained by stochastic processes. This theory also presents a framework for ongoing exploration of two areas of research: biased gene conversion, and the impact of effective population size on the effective neutrality of genetic variants.


The evolution of living organisms is the consequence of two processes. First, evolution depends on the genetic variability generated by mutations, which continuously arise within populations. Second, it also relies on changes in the frequency of alleles within populations over time.

The fate of those mutations that affect the fitness of their carrier is partly determined by natural selection. On one hand, new alleles that confer a higher fitness tend to increase in frequency over time until they reach fixation, thus replacing the ancestral allele in the population. This evolutionary process is called positive or directional selection. Conversely, new mutations that decrease the carrier's fitness tend to disappear from populations through a process known as negative or purifying selection. Finally, a mutation may be advantageous in heterozygotes but not in homozygotes. Such alleles tend to be maintained at an intermediate frequency in populations by a process known as balancing selection.

However, natural selection is not the only factor that can lead to changes in allele frequency. For example, consider a theoretical population in which all individuals, or genotypes, have exactly the same fitness. In this situation, natural selection does not operate, because all genotypes have the same chance to contribute to the next generation. Given that populations do not grow infinitely and that each individual produces many gametes, it follows that only a fraction of the gametes that are produced will succeed in developing into adults. Thus, in each generation, allelic frequencies may change simply as a consequence of this random process of gamete sampling. This process is called genetic drift. The difference between genetic drift and natural selection is that changes in allele frequency caused by genetic drift are random, rather than directional. Ultimately, genetic drift leads to the fixation of some alleles and the loss of others.
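The gamete-sampling process described above can be sketched with the Wright-Fisher model, a standard population-genetics model that the text does not name: a haploid population of constant size in which each generation is a random sample, with replacement, of the previous one. This is a minimal sketch assuming no selection, no mutation, and a single biallelic locus; the function name and parameter values are hypothetical.

```python
import random


def wright_fisher(pop_size, p0, max_generations, seed=None):
    """Neutral genetic drift at one biallelic locus (haploid Wright-Fisher).

    Each generation, pop_size gene copies are drawn at random, with
    replacement, from the current generation, so the allele frequency
    changes only through random sampling. Returns the allele frequency in
    each generation, stopping early once the allele is fixed (1.0) or
    lost (0.0).
    """
    rng = random.Random(seed)
    count = round(p0 * pop_size)
    trajectory = [count / pop_size]
    for _ in range(max_generations):
        if count in (0, pop_size):
            break  # fixation and loss are absorbing: drift alone cannot undo them
        p = count / pop_size
        # Binomial sampling: each offspring copy carries the allele with probability p
        count = sum(1 for _ in range(pop_size) if rng.random() < p)
        trajectory.append(count / pop_size)
    return trajectory


if __name__ == "__main__":
    for rep in range(5):
        traj = wright_fisher(pop_size=50, p0=0.5, max_generations=1000, seed=rep)
        print(f"run {rep}: final frequency {traj[-1]:.1f} after {len(traj) - 1} generations")
```

Each run ends with the allele either fixed or lost, even though no genotype is fitter than another. Across many runs, the fraction in which the allele fixes approaches its starting frequency.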

But what about mutations that do not affect the fitness of individuals? These so-called neutral mutations are not affected by natural selection and, hence, their fate is essentially driven by genetic drift. Interestingly, Darwin himself recognized that some traits might evolve without being affected by natural selection:

"Variations neither useful nor injurious would not be affected by natural selection, and would be left either a fluctuating element, as perhaps we see in certain polymorphic species, or would ultimately become fixed, owing to the nature of the organism and the nature of the conditions." (Darwin, 1859)

It is important to note, however, that the impact of genetic drift is not limited to neutral mutations. Because of genetic drift, most advantageous mutations are eventually lost, whereas some weakly deleterious mutations may become fixed.
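To see how drift can override selection, a selection step can be added before the random sampling. The sketch below rests on illustrative assumptions, not on a model from the text: haploid individuals, a mutant with relative fitness 1 + s, and one new mutant copy at the start of each run. The function name and parameter values are hypothetical.

```python
import random


def fixation_fraction(pop_size, s, replicates, seed=None):
    """Fraction of simulated runs in which one new mutant copy reaches fixation.

    Hypothetical haploid Wright-Fisher model: the mutant has relative fitness
    1 + s (s > 0 advantageous, s < 0 deleterious, s = 0 neutral). Selection
    shifts the expected frequency, then drift acts through random sampling.
    """
    rng = random.Random(seed)
    fixed = 0
    for _ in range(replicates):
        count = 1  # a single new mutant copy
        while 0 < count < pop_size:
            p = count / pop_size
            # Expected mutant frequency after selection
            p_sel = p * (1 + s) / (1 + p * s)
            # Drift: random sampling of the next generation
            count = sum(1 for _ in range(pop_size) if rng.random() < p_sel)
        if count == pop_size:
            fixed += 1
    return fixed / replicates


if __name__ == "__main__":
    n = 50
    print("neutral expectation 1/N:", 1 / n)
    print("advantageous (s = +0.05):", fixation_fraction(n, 0.05, 2000, seed=1))
    print("weakly deleterious (s = -0.01):", fixation_fraction(n, -0.01, 2000, seed=2))
```

With these settings the advantageous mutant is lost in the large majority of runs, because a single new copy can easily disappear by chance before selection has time to act, while the weakly deleterious mutant still fixes occasionally, at a rate below the neutral 1/N.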

Beyond selection and drift, biased gene conversion (BGC) is a third process that can cause changes in allele frequency in sexual populations. BGC is linked to meiotic crossing-over. When crossing-over occurs between two homologous chromosomes, the intermediate includes heteroduplex DNA—a region in which one DNA strand is from one homologue and the other strand is from the other homologue. Regardless of the ultimate resolution of the crossover intermediate (in other words, whether the regions on either side of the crossover junction recombine), base-pairing mismatches in the heteroduplex region must be resolved. As a consequence, when a given locus resides in the heteroduplex region, one allele can be "copied and pasted" onto the other one during gene conversion.

BGC is said to be biased if one allele has a higher probability of conversion than the other. In that situation, the donor allele will occur at higher frequency in the gamete pool than the converted allele. Hence, BGC tends to increase the frequency of such donor alleles within populations. There is evidence that BGC occurs in many eukaryotic species, and various observations suggest that it might result from a bias in the repair of DNA mismatches in the heteroduplex DNA formed during recombination (Marais, 2003). Again, it is important to note that the impact of BGC is not limited to the evolution of neutral mutations: BGC can favor the fixation of donor alleles even if these alleles are weakly deleterious (Galtier & Duret, 2007).
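How a transmission bias in heterozygotes raises the frequency of the donor allele can be shown with a deterministic recursion. This sketch assumes a hypothetical parameterization: random mating, no selection, an infinitely large population, and a heterozygote that passes the donor allele to a fraction (1 + b) / 2 of its gametes instead of one half. Under those assumptions the next-generation frequency is p^2 + 2pq(1 + b)/2 = p + bpq.

```python
def bgc_trajectory(p0, b, generations):
    """Frequency of a BGC donor allele over time (deterministic sketch).

    Hypothetical model: random mating, no selection, infinite population,
    and heterozygotes transmit the donor allele to a fraction (1 + b) / 2
    of their gametes. Then p' = p**2 + 2*p*q*(1 + b)/2 = p + b*p*q.
    """
    p = p0
    freqs = [p]
    for _ in range(generations):
        p = p + b * p * (1 - p)  # gain comes only from biased heterozygotes
        freqs.append(p)
    return freqs


if __name__ == "__main__":
    traj = bgc_trajectory(p0=0.01, b=0.01, generations=1500)
    print([round(traj[g], 3) for g in (0, 250, 500, 750, 1000, 1500)])
```

The update p + bpq has the same form as weak directional selection favoring the donor allele, which is why BGC can mimic positive selection in sequence data.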

The Selectionist vs. Neutralist Controversy

Figure 1: Sequence divergence between human and mouse genomes

Complete genomic data for humans and mice allow the comparison of genetic sequences between the two species to calculate sequence divergence in different regions of genes and in retropseudogenes. Percent human-mouse sequence divergence:
5' untranslated region: 30%
Non-degenerate sites: 10%
4-fold degenerate sites: 32%
Introns: 34%
3' untranslated region: 29%
Retropseudogenes: 32%

Thus, the fate of the mutations within a population is driven by natural selection and by nonadaptive evolutionary processes, such as genetic drift and BGC. But to what extent does each of these processes contribute to genome evolution?

Until the 1960s, the prevailing view was that natural selection played a dominant role. According to this view, differences between species were thought to consist mainly of mutations that had been fixed by positive selection—mutations that contributed to the adaptation of a species to its environment. In contrast, the existing polymorphism within populations was thought to reflect balancing selection. Thus, according to this so-called selectionist theory, nonadaptive processes were at best minor contributors to evolution. However, the analysis of sequence data that became available in the late 1960s considerably challenged this view. In 1968, these empirical data and new theoretical developments led Motoo Kimura to propose a new hypothesis, now known as the neutral theory of molecular evolution (Kimura, 1968). Kimura subsequently summarized his theory as follows:

"This neutral theory claims that the overwhelming majority of evolutionary changes at the molecular level are not caused by selection acting on advantageous mutants, but by random fixation of selectively neutral or very nearly neutral mutants through the cumulative effect of sampling drift (due to finite population number) under continued input of new mutations" (Kimura, 1991)

Immediately, this theory caused controversy and gave rise to opposition from many evolutionary biologists. However, the theory also made several strong predictions that could be tested against actual data. Notably, if most of the sequence divergence between species is due to neutral evolution, then one should expect more changes in functionally less important sequences. When Kimura proposed the neutral theory in 1968, only a few protein sequences were available. By the 1980s, however, the much larger amount of DNA sequence data that had accumulated largely validated this prediction. In fact, in light of these new sequence data, Kimura himself published a review of his theory in 1991. In his paper, he pointed out several important observations that had been recently reported, including the following:

In protein sequences, conservative changes—substitutions of amino acids that have similar biochemical properties and are therefore less likely to affect the function of a protein—occur much more frequently than radical changes.
Synonymous base substitutions (i.e., those that do not cause amino acid changes) occur almost always at a much higher rate than nonsynonymous substitutions.
Noncoding sequences, such as introns, evolve at a high rate similar to that of synonymous sites.
Pseudogenes, or dead genes, evolve at a high rate, and this rate is about the same at all three codon positions.

All of these observations have been widely confirmed with the genomic data that are now available (Figure 1). These observations are consistent with the neutral theory but contradict selectionist theory. After all, if most substitutions were adaptive, as argued by selectionist theory, one would expect fewer substitutions in DNA regions where changes have little or no effect on phenotype (e.g., pseudogenes, noncoding sequences, synonymous sites) than in functionally important regions.

It must be stressed that the neutral theory of molecular evolution is not an anti-Darwinian theory. Both the selectionist and neutral theories recognize that natural selection is responsible for the adaptation of organisms to their environment. Both also recognize that most new mutations in functionally important regions are deleterious and that purifying selection quickly removes these deleterious mutations from populations. Thus, these mutations do not contribute—or contribute very little—to sequence divergence between species and to polymorphisms within species. Rather, the dispute between selectionists and neutralists relates only to the relative proportion of neutral and advantageous mutations that contribute to sequence divergence and polymorphism.

Analysis of genomic sequence data reveals that there is no "all or nothing" answer to this dispute. In fact, the proportion of neutral substitutions varies widely among taxa. However, it is now clearly established that nonadaptive processes cannot be neglected. Even in taxa in which selection is very effective, a large fraction of substitutions are indeed neutral.

Population Size Matters

The classification of mutations into three distinct types (deleterious, neutral, and advantageous) is of course an oversimplification. In reality, there is a continuum from highly deleterious to weakly deleterious, nearly neutral, neutral, weakly advantageous, and strongly advantageous mutations. It is important to note that the effectiveness of selection on a mutation depends both on the fitness effect of this mutation (the selection coefficient $s$) and on the effective population size ($N_e$). Specifically, when the product $N_e s$ is much less than 1 in absolute value ($|N_e s| \ll 1$), the fate of mutations is essentially determined by random genetic drift. In other words, in small populations, the stochastic effects of random genetic drift overcome the effects of selection. Thus, all mutations for which $|N_e s| \ll 1$ can be considered effectively neutral. This implies that the proportion of effectively neutral mutations is expected to vary inversely with a taxon's effective population size.
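The dependence on $N_e s$ can be made concrete with Kimura's diffusion approximation for the fixation probability of a new mutation, a standard population-genetics result that this article does not spell out. The sketch assumes a diploid population whose census size equals $N_e$, semidominant fitness effects (genotype fitnesses 1, 1 + s, 1 + 2s), and an illustrative selection coefficient; the function names are hypothetical.

```python
import math


def fixation_probability(n_e, s):
    """Kimura's diffusion approximation for the fixation probability of a new
    semidominant mutation (genotype fitnesses 1, 1 + s, 1 + 2s) in a diploid
    population, assuming census size equals n_e, so the mutation starts at
    frequency 1 / (2 * n_e):

        u = (1 - exp(-2 s)) / (1 - exp(-4 n_e s))
    """
    if s == 0:
        return 1 / (2 * n_e)  # neutral limit: fixation probability = initial frequency
    exponent = -4 * n_e * s
    if exponent > 700:
        return 0.0  # strongly deleterious: u is vanishingly small; avoids overflow in exp
    # expm1 keeps precision when s or n_e * s is tiny
    return math.expm1(-2 * s) / math.expm1(exponent)


def relative_to_neutral(n_e, s):
    """Fixation probability relative to the neutral value 1 / (2 * n_e)."""
    return fixation_probability(n_e, s) * 2 * n_e


if __name__ == "__main__":
    s = -1e-5  # illustrative weakly deleterious mutation
    for n_e in (1e4, 1e6):  # hominid-like vs Drosophila-like effective sizes
        print(f"N_e = {n_e:.0e}  N_e*s = {n_e * s:+.1f}  "
              f"u / neutral = {relative_to_neutral(n_e, s):.3g}")
```

For $s = -10^{-5}$, the mutation fixes at roughly 0.8 times the neutral rate when $N_e = 10^4$ ($N_e s = -0.1$), so it behaves as nearly neutral, but essentially never when $N_e = 10^6$ ($N_e s = -10$). The two $N_e$ values echo the hominid and Drosophila estimates given below; the selection coefficient is made up for illustration.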

Empirical data are consistent with this prediction. For example, in Drosophila species (where $N_e$ is about $10^6$), the proportion of nonsynonymous substitutions that have been fixed by positive selection is about 50%. Contrast this with the data for hominids (with $N_e$ around 10,000 to 30,000), where this proportion is close to zero. Similarly, the proportion of nonsynonymous mutations that are effectively neutral is less than 16% in Drosophila, whereas it is about 30% in hominids (Eyre-Walker & Keightley, 2007).

Neutral Theory: The Null Hypothesis of Molecular Evolution

The most important contribution of Kimura's work is that it provides a theoretical framework for developing methods that detect the action of selection within genomes: to demonstrate that a sequence is subject to selective pressure, one must reject the null hypothesis that this sequence evolves neutrally. For example, one strong (and elegant) prediction of the neutral theory is that at selectively neutral sites, the rate of substitution is equal to the rate of mutation (Kimura, 1968). To see why, consider a neutral site: a DNA position at which all alleles are selectively equivalent, and where the rate of mutation per generation is u. In a haploid population of size N, Nu mutations occur at this site in each generation. Because there is no selection, all alleles have the same probability of reaching fixation; under a neutral model, the probability that an allele fixes is simply its relative frequency in the population. A new mutation in a haploid population is present as a single copy, at relative frequency 1/N, so its probability of fixation is 1/N (the same reasoning also holds for diploid species). The rate of substitution per generation (K) is obtained by multiplying the number of mutations that occur in each generation by their probability of fixation. Thus, for neutrally evolving sites:

K = Nu × 1/N = u
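For a diploid population of N individuals, there are 2N gene copies at the site, so the same argument gives:

```latex
\begin{aligned}
\text{new mutations per generation} &= 2Nu \\
\Pr(\text{a new neutral mutation fixes}) &= \frac{1}{2N} \\
K &= 2Nu \times \frac{1}{2N} = u
\end{aligned}
```

Population size cancels out: a larger population produces proportionally more mutations, but each one is proportionally less likely to reach fixation.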

Of course, because of natural selection, advantageous mutations have a higher probability of fixation than neutral mutations, and deleterious mutations have a lower probability of fixation. It therefore follows that sequences subject to positive selection evolve faster than neutral sites (K > u), whereas sequences subject to negative selection evolve more slowly (K < u). This simple result is the basis of many tests that have been developed to detect selection. Note, however, that selection is not the only process that can affect K. Indeed, BGC can affect the probability of allele fixation and hence substitution rates, as previously mentioned.
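The logic of comparing K with u can be illustrated with the Figure 1 values. In this rough sketch, human-mouse divergence stands in for K, and retropseudogenes, which are expected a priori to evolve neutrally, stand in for the neutral rate; because every region shares the same human-mouse divergence time, the ratio of divergences approximates K/u. The tolerance band, labels, and function names are arbitrary illustrative choices, and the sketch ignores multiple substitutions at the same site, sampling error, and variation of u along chromosomes, so it is not a formal selection test.

```python
# Human-mouse percent divergence per region, as given in Figure 1
DIVERGENCE = {
    "5' UTR": 30.0,
    "non-degenerate sites": 10.0,
    "4-fold degenerate sites": 32.0,
    "introns": 34.0,
    "3' UTR": 29.0,
    "retropseudogenes": 32.0,
}


def rate_ratios(divergence, neutral_key="retropseudogenes"):
    """Divide each region's divergence by that of a presumed-neutral class.

    Divergence is a rough proxy for K (multiple hits are ignored), and the
    neutral class stands in for u, so each value approximates K / u.
    """
    baseline = divergence[neutral_key]
    return {region: d / baseline for region, d in divergence.items()}


def classify(ratio, tolerance=0.1):
    """Label a K/u ratio; the tolerance band is an arbitrary illustrative choice."""
    if ratio < 1 - tolerance:
        return "slower than neutral (consistent with purifying selection)"
    if ratio > 1 + tolerance:
        return "faster than neutral (consistent with positive selection or BGC)"
    return "close to the neutral rate"


if __name__ == "__main__":
    for region, ratio in rate_ratios(DIVERGENCE).items():
        print(f"{region:<24} K/u ~ {ratio:.2f}  {classify(ratio)}")
```

Non-degenerate sites come out at about 0.31 of the neutral rate, consistent with purifying selection, whereas introns, UTRs, and 4-fold degenerate sites fall close to the pseudogene baseline.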

Why Should We Care About Neutral Evolution?

The main goal of biology is to understand how living organisms function and how they adapt to ever-fluctuating environments. So one may wonder why it is important to study neutral evolutionary processes that, a priori, seem to have little effect on the evolution of phenotypes. There are three main reasons.

First, as mentioned earlier, the neutral theory is the underlying basis of selection tests. These tests are widely used to identify functional elements (e.g., genes and regulatory regions) within genomic sequences. The basic principle of this comparative genomics approach is that functional elements are subject to selective pressure, and hence their pattern of evolution differs from the neutral expectation. To be able to detect selection, it is therefore necessary to have a good understanding of all the nonadaptive evolutionary processes that affect sequence evolution: mutation, BGC, and genetic drift. Notably, the selection test mentioned above requires knowledge of u. This parameter can be estimated by measuring K at sites that are expected, a priori, to be neutral (pseudogenes or defective transposable elements, for example), although u may vary along chromosomes.

BGC must also be taken into account. In some eukaryotic taxa, it appears to have a strong influence on genome evolution by favoring the fixation of AT-to-GC mutations (Marais, 2003). This BGC drive leads to GC enrichment in genomic regions with high crossover rates, and many lines of evidence indicate that this process is responsible for the strong regional variation in GC content across mammalian chromosomes. In mammals, crossovers occur mostly in hot spots (typically 1 to 2 kilobases long), and BGC can create strong substitution hot spots, where the local substitution rate can be up to 20 times higher than in the rest of the genome (Duret & Arndt, 2008). In some cases, BGC may even counteract the action of selection and lead to the fixation of deleterious mutations, which implies that BGC can contribute to the maladaptation of species. Thus, before concluding that sequences are subject to selection, it is necessary to rule out the possibility that the observed pattern of sequence evolution is explained by this nonadaptive process (Galtier & Duret, 2007).

A second reason why knowledge of neutral sequence evolution is important is that it provides information about molecular processes that are involved in genome functioning. For example, it has been found that in some taxa, there is an asymmetry of substitution patterns between the two DNA strands. This pattern is caused by the asymmetry of the DNA replication process and can be used to infer the location of replication origins within chromosomes (Lobry, 1996). Similarly, the asymmetry of substitution patterns can be used to detect and orient transcription in the germ line of mammals (Green et al., 2003). The analysis of neutral substitution patterns also revealed the existence of a homology-dependent mechanism of DNA methylation in primates (Meunier et al., 2005).
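One common way to exploit strand asymmetry in bacteria is the GC skew, (G - C)/(G + C), computed in windows along one strand. This is a minimal sketch under stated assumptions: a circular chromosome read from an arbitrary starting point, a leading strand enriched in G relative to C (as reported for several bacteria), and an origin lying near the minimum of the cumulative skew. The function names and window size are hypothetical, and published analyses typically use sliding windows and additional skews.

```python
from itertools import accumulate


def gc_skew(seq, window):
    """(G - C) / (G + C) in consecutive, non-overlapping windows of one strand."""
    skews = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window].upper()
        g, c = chunk.count("G"), chunk.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


def predicted_origin_window(seq, window):
    """Index of the window where the cumulative GC skew is lowest.

    Assumes a circular chromosome whose leading strand is G-rich, so the skew
    changes sign at the origin and terminus; the cumulative skew then bottoms
    out near the origin (and peaks near the terminus).
    """
    cumulative = list(accumulate(gc_skew(seq, window)))
    return min(range(len(cumulative)), key=cumulative.__getitem__)


if __name__ == "__main__":
    # Toy sequence: C-rich first half, G-rich second half
    toy = "CCG" * 100 + "GGC" * 100
    print(predicted_origin_window(toy, window=30))  # prints 9
```

In the toy sequence, window 9 is the last C-rich window, so the predicted origin sits at the switch from C-rich to G-rich composition.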

Finally, one point that is often not fully appreciated is that neutral evolution can ultimately contribute to phenotypic evolution and to species adaptation. Kimura noted that many gene duplications may become fixed by random genetic drift, simply because they are not deleterious. Then, because of relaxed selective pressure, one or both copies may accumulate mutations that would otherwise have been counterselected. Some of these mutants will turn out to be useful for the adaptation of organisms to their environment (Kimura, 1991). The idea that nonadaptive processes may have a major impact on the evolution of biological complexity has been developed extensively by Michael Lynch (Lynch, 2006). Following duplication, the functions of the duplicates are initially redundant. Moreover, most gene products contribute to multiple aspects of an organism's phenotype. If one duplicate undergoes a mutation that knocks out part of its contribution to phenotype, it is released from purifying selection to maintain those functions. This relaxation of purifying selection allows a wider exploration of the space of possible genotypes, which may allow for improvements in the remaining functions.

Also, regardless of the steps involved, the evolutionary trajectory from one genotype to another, better-fit genotype may sometimes have to pass through a less-fit intermediate genotype. A reduction of effective population size, which increases the effect of random drift, may allow weakly deleterious mutations to become fixed, letting the population pass through this less-fit genotype; hence it can open a new evolutionary trajectory, possibly toward better-adapted genotypes.

References and Recommended Reading

Darwin, C. On the Origin of Species by Means of Natural Selection, or the Preservation of Favoured Races in the Struggle for Life (London, John Murray, 1859)

Duret, L., & Arndt, P. F. The impact of recombination on nucleotide substitutions in the human genome. PLoS Genetics 4, e1000071 (2008)

Eyre-Walker, A., & Keightley, P. D. The distribution of fitness effects of new mutations. Nature Reviews Genetics 8, 610–618 (2007)

Galtier, N., & Duret, L. Adaptation or biased gene conversion? Extending the null hypothesis of molecular evolution. Trends in Genetics 23, 273–277 (2007)

Graur, D., & Li, W. Fundamentals of Molecular Evolution (Sunderland, MA, Sinauer Associates, 2000)

Green, P., et al. Transcription-associated mutational asymmetry in mammalian evolution. Nature Genetics 33, 514–517 (2003) doi:10.1038/ng1103

Kimura, M. Evolutionary rate at the molecular level. Nature 217, 624–626 (1968) doi:10.1038/217624a0

———. The neutral theory of molecular evolution: A review of recent evidence. Japanese Journal of Genetics 66, 367–386 (1991)

Lobry, J. R. Asymmetric substitution patterns in the two DNA strands of bacteria. Molecular Biology and Evolution 13, 660–665 (1996)

Lynch, M. The origins of eukaryotic gene structure. Molecular Biology and Evolution 23, 450–468 (2006)

———. The Origins of Genome Architecture (Sunderland, MA, Sinauer Associates, 2007)

Makalowski, W., & Boguski, M. S. Evolutionary parameters of the transcribed mammalian genome: An analysis of 2,820 orthologous rodent and human sequences. Proceedings of the National Academy of Sciences 95, 9407–9412 (1998)

Marais, G. Biased gene conversion: Implications for genome and sex evolution. Trends in Genetics 19, 330–338 (2003)

Meunier, J., et al. Homology-dependent methylation in primate repetitive DNA. Proceedings of the National Academy of Sciences 102, 5471–5476 (2005)

Zheng, D., et al. Pseudogenes in the ENCODE regions: Consensus annotation, analysis of transcription, and evolution. Genome Research 17, 839–851 (2007)