What is the G matrix?

Quantitative genetics has long made significant contributions to the development of domesticated plant and animal species. In the last two decades there has been increasing interest in the use of quantitative genetics to understand the evolution of quantitative traits in natural populations. For domestic organisms the central equation of quantitative genetics is the relationship $R = h^2 S$ (response = heritability × selection differential), or less frequently the two-trait formulation

$$CR_Y = h_X h_Y r_A \frac{\sigma_{PY}}{\sigma_{PX}} S_X,$$

where $CR_Y$ is the correlated response of trait Y to direct selection on trait X, $h_X$ and $h_Y$ are the square roots of the heritabilities of the two traits, $S_X$ is the selection differential on X, $r_A$ is the genetic correlation between X and Y, and $\sigma_{Pi}$ is the phenotypic standard deviation of trait i. However, natural selection does not typically act upon single traits or even pairs of traits, but rather upon the whole organism, and hence it is more appropriate to consider the multivariate extension of the above equations (Lande, 1979a), $\Delta\bar{z} = GP^{-1}S$, where $\Delta\bar{z}$ is the vector of mean responses (the trait means being conventionally symbolized by $\bar{z}$), G is the matrix of genetic variances and covariances, P is the matrix of phenotypic variances and covariances, and S is the vector of selection differentials. The matrix combination $GP^{-1}$ can thus be viewed as the multivariate ‘equivalent’ of $h^2$, with G replacing the additive genetic variance in the numerator and $P^{-1}$ replacing the phenotypic variance in the denominator.
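As a minimal numerical sketch of the multivariate response equation, the following computes the predicted change in trait means for two traits. The matrices and the selection differentials are hypothetical values chosen only for illustration; none of them comes from a study cited here.

```python
import numpy as np

# Hypothetical two-trait example; every number below is illustrative only.
G = np.array([[0.5, 0.2],
              [0.2, 0.4]])   # additive genetic variances and covariances
P = np.array([[1.0, 0.3],
              [0.3, 1.0]])   # phenotypic variances and covariances
S = np.array([0.5, 0.0])     # selection differentials: direct selection on trait 1 only

# Solve P x = S instead of forming the inverse of P explicitly.
p_inv_s = np.linalg.solve(P, S)

# Predicted change in the vector of trait means: G P^{-1} S.
delta_z = G @ p_inv_s

print(delta_z)
```

Although trait 2 has a selection differential of zero in this example, its predicted mean still changes, because the off-diagonal elements of G and P are nonzero.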

Of what use is G?

The multivariate response equation has been used theoretically to explore evolutionary trajectories with particular reference to constraints on the direction of evolutionary change (e.g. Via & Lande, 1985; Arnold, 1992; Bjorklund, 1996; Schluter, 1996). With respect to the analysis of natural populations, the response equation has been used to retrospectively reconstruct the selection regime that has given rise to a change over time or among populations (Arnold, 1992), or to predict responses to selection (Morris, 1971; Grant & Grant, 1993; Roff & Fairbairn, 1999). Retrospective selection analysis assumes that both G and P remain constant and that there are no dramatic reversals in the direction of selection (Shaw et al., 1995). Depending upon the nature of the analysis, prediction of response requires either that both G and P remain constant or change in a proportional manner relative to the original matrices.

There are two defences for the assumption of constancy. The first is the same as applied to artificial selection experiments, namely that although selection will change allelic frequencies and hence genetic variance and covariances (and thus also their phenotypic counterparts), the change will be so small per generation that the equations will remain correct at least for a dozen or so generations (see Roff, 1997 for a review providing support for this assumption). This argument clearly cannot be applied to those cases in which the equations are used to predict changes taking place over hundreds or thousands of generations. For this the assumption is made that directional or stabilizing selection is sufficiently weak that any erosion of variance is countered by new variation added by mutation (Lande, 1976a).

How weak is weak selection?

We can examine this question by asking what selection is necessary to produce a change in mean trait value that is characteristic of some given taxonomic level. A survey of tests of variation in G matrices at different taxonomic levels suggests that statistically significant differences are found at least at the level of species (Roff & Mousseau, 1999). Most studies have focused upon morphological traits and thus we must ask how much variation is characteristically found among species within any given genus. An analysis of variation among species within 32 different genera covering a wide range of animals suggests that a typical range is a two- to four-fold difference in linear dimensions, though differences larger than tenfold also occur (Fig. 1). For illustration, suppose that $h^2 = 0.5$ (a typical heritability for morphological traits), $V_A = 25$ and the initial trait value is 50. The response per generation is then $i\sqrt{h^2 V_A}$, where i is the intensity of selection. When the top 99.9% of the population is selected each generation then after 5000 generations the trait value will be $50 + (5000)(0.00337)\sqrt{(0.5)(25)} \approx 110$, a two-fold increase, while if the top 99% is selected the trait value will be $50 + (5000)(0.02692)\sqrt{(0.5)(25)} \approx 526$, a 10-fold increase. Both of these changes are within the range of interspecific variation in morphological traits and the former is probably most typical (see above). On a geological time-scale 5000 generations is short and changes of the above magnitude would be considered rapid. A cull of 0.1% of the population per generation is extremely weak (certainly below present levels of detection for most studies). Lande (1976b) estimated the required cull rate per generation to account for change in the tooth dimensions during the evolution of the horse: to obtain the observed changes requires selective deaths of the order of 2 per million individuals (99.9998% survive), which is considerably weaker than the above rates, but the time-span is in the millions rather than thousands of generations.
Of course, it is unlikely that selection acts at a constant, minute rate each generation but the important point is that even very low rates of selection can produce very large phenotypic changes in what is no more than a geological ‘instant’. Thus the hypothesis of weak selection is at least tenable, but requires empirical support (Turelli, 1988).
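The arithmetic of the illustration above can be checked with a short sketch. It assumes, as the worked example does, a normally distributed trait, truncation selection and a constant response per generation; the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def selection_intensity(p_selected):
    """Intensity of selection i when the top fraction p_selected of a
    standard normal distribution is retained (truncation selection)."""
    std_normal = NormalDist()
    x = std_normal.inv_cdf(1.0 - p_selected)   # standardized truncation point
    return std_normal.pdf(x) / p_selected


def trait_after(generations, p_selected, h2=0.5, VA=25.0, z0=50.0):
    """Trait mean after a number of generations, assuming the response per
    generation, i * sqrt(h2 * VA), stays constant (h2 and VA do not change)."""
    return z0 + generations * selection_intensity(p_selected) * sqrt(h2 * VA)


for p in (0.999, 0.99):
    print(p, round(selection_intensity(p), 5), round(trait_after(5000, p)))
```

The two intensities agree with the values 0.00337 and 0.02692 used in the text, and the resulting trait means round to 110 and 526.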

Fig. 1

Distribution of the ratio R2 among 32 different genera (lower panel) with two examples of the distribution of sizes within genera (upper panels). Data sources: birds (Dunning, 1984: note that weight was the trait given, and to make it comparable to a length measurement I used the cube root); mammals (Barbour & Davis, 1969; Schober & Grimmberger, 1987; Kingdon, 1997); reptiles (Conant, 1975); amphibia (Barker et al., 1995); fish (Scott & Crossman, 1973); beetles (Lindroth, 1966); leafhoppers (Beirne, 1956); grasshoppers (Otte, 1984); spiders (Locket & Millidge, 1953); cone shells (Wagner & Abbott, 1977).

Endler (1986; p. 210) surveyed a large number of studies in which estimates of selection intensities in natural populations were made and found that ‘natural selection is often as strong as artificial selection’. This result does not, however, demonstrate that the assumption of weak selection is invalid. First, it remains to be shown if these are typical or simply those that have been selected a priori because the researchers suspected that the traits in question would be under strong selection. There is little value in selecting traits for which selection intensities cannot be measured although these may represent the majority of traits. Secondly, single or short-term episodes of strong selection will not greatly affect the additive genetic variance (Roff, 1997). What we need to know is the frequency with which episodes of strong selection occur.

How do selection and/or drift alter G?

When genetic correlations arise principally from pleiotropy, populations are large and mating is random, the G matrix is determined by the joint effects of pleiotropic mutation and multivariate selection (Lande, 1980). Changes in the selective regime will produce changes in G but there is no a priori general prediction that can be made (or has so far been published). If selection is strong enough to significantly alter the gene frequencies of the traits subject to selection there will be a change in that portion of the G matrix in which the genes have influence, i.e. the additive genetic variance of the target trait plus the additive genetic covariances between the target trait and other nontarget traits. Thus, in general, the change in the G matrix will show no overall general pattern, but will depend upon the selection regime and the traits included in the matrix. In contrast, genetic drift will cause G matrices to diverge but remain proportional to each other (Lande, 1979a).

Population bottlenecks can convert nonadditive genetic (co)variance into additive genetic (co)variance (Roff, 1997; chapter 8) but the general effect of genetic drift is to erode additive genetic variance. This should be particularly true for morphological traits, in which nonadditive effects are typically small (Mousseau & Roff, 1987; Crnokrak & Roff, 1995; DeRose & Roff, 1999). Nonadditive effects are present in life history traits (Crnokrak & Roff, 1995; DeRose & Roff, 1999) and hence changes in the G matrix due to short episodes of genetic drift could be unpredictable.

To illustrate the potential importance of genetic drift when there is only additive genetic variance, consider the change in the additive genetic variance of a single trait as a result of drift alone. After t generations the additive genetic variance declines to approximately $e^{-t/(2N_e)}$ of its initial value (Robertson, 1960). After 5000 generations in a population with an effective size of 5000, the variance will have declined to 0.61 of its initial value, which is a substantial reduction in a very short geological period. Thus even very large populations can be subject to significant erosion in additive genetic variance due to genetic drift. In a review of estimated effective population sizes, fewer than 20% of the estimates were greater than 1000 and over 70% were in the range 10–500 (see table 8.3 and fig. 8.2 in Roff, 1997). The populations studied do not represent a random sample (there are few invertebrates) but do suggest that random genetic drift could play a very important role in evolutionary changes in many populations (see also Lande, 1979b).
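The exponential approximation for the loss of additive genetic variance under drift alone is easy to tabulate. The sketch below uses a hypothetical helper name, and the effective population sizes other than 5000 are arbitrary illustrative choices.

```python
from math import exp


def fraction_remaining(t, Ne):
    """Approximate fraction of the initial additive genetic variance left
    after t generations of drift alone: exp(-t / (2 * Ne))."""
    return exp(-t / (2.0 * Ne))


# 5000 generations at several effective population sizes (illustrative values).
for Ne in (100, 500, 1000, 5000):
    print(Ne, round(fraction_remaining(5000, Ne), 3))
```

For an effective size of 5000 the fraction remaining is about 0.61, as in the text; for the smaller sizes almost nothing is left after 5000 generations unless mutation replenishes the variance.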

The erosion of additive genetic (co)variance resulting from the combined effects of selection (directional or stabilizing) and drift will be compensated in part or whole by new variation added by mutation. For the infinitesimal model, truncation selection, drift and mutation will change the additive variance, σ2A, after one generation to (Keightley & Hill, 1987)

where $h^2_t$ is the heritability at time t, which changes because $\sigma^2_A$ changes, $k = i(i - x)$, where i is the intensity of selection and x is the standardized deviation of the truncation point of selection, and $\sigma^2_{g,t}$ is the genic variance in the population, which is the genetic variance that would be present under linkage and Hardy–Weinberg equilibria. Assuming this to be true in the initial, unselected population we have $\sigma^2_{A,0} = \sigma^2_{g,0}$. The genetic variance introduced by mutation is $\sigma^2_M$. At equilibrium the additive genetic variance maintained by selection, drift and mutation, $\sigma^2_A$, is obtained from the solution to the quadratic

which is

From the above it is evident that the equilibrium additive genetic variance is approximately proportional to the effective population size. For example, suppose the proportion culled per generation is 0.1%, and the mutational variance is 0.002, which is within the observed range (see table 9.7 in Roff, 1997); then for effective population sizes of 10, 100 and 1000 the equilibrium additive genetic variance is 0.04, 0.40 and 4.0, respectively. Even if we make selection very strong by culling 50% of the population per generation, the respective additive genetic variances are still approximately proportional to $N_e$ (0.04, 0.34 and 2.73, respectively). Bürger (1993), assuming weak selection, obtained the simple result that at equilibrium under directional selection the additive genetic variance would be equal to $2N_e\sigma^2_M$, which is the equilibrium variance under the neutral mutation–drift balance as derived by Lynch & Hill (1986).
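The neutral mutation–drift expectation, $2N_e\sigma^2_M$, can be illustrated with a short sketch. The recursion used here is the standard neutral approximation, in which drift removes a fraction $1/(2N_e)$ of the variance each generation and mutation adds $\sigma^2_M$; it is an assumption of this sketch and is not the Keightley & Hill (1987) expression with selection. The function names are hypothetical.

```python
def neutral_equilibrium(Ne, VM):
    """Equilibrium additive genetic variance under neutral mutation-drift
    balance: 2 * Ne * VM."""
    return 2.0 * Ne * VM


def iterate_variance(Ne, VM, V0=0.0, generations=50_000):
    """Iterate the assumed neutral recursion V <- (1 - 1/(2 Ne)) * V + VM,
    starting from V0, and return the variance after the given generations."""
    V = V0
    for _ in range(generations):
        V = (1.0 - 1.0 / (2.0 * Ne)) * V + VM
    return V


VM = 0.002   # mutational variance used in the text
for Ne in (10, 100, 1000):
    print(Ne, neutral_equilibrium(Ne, VM), round(iterate_variance(Ne, VM), 4))
```

The equilibria for effective sizes of 10, 100 and 1000 are 0.04, 0.4 and 4.0, matching the values quoted for a cull of 0.1% per generation, which is consistent with such weak selection having a negligible effect on the equilibrium variance.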

What statistical methods are presently available to test for variation among G matrices?

Given the G matrices of two or more populations, consisting of either the same or different species, we are in the first instance interested in two questions. The first is, ‘can we reject the hypothesis that the matrices came from the same statistical population?’ This null hypothesis supposes that the populations do not differ in population size, selection history or mutational variances. We might be inclined to entertain such a hypothesis when comparing several populations of the same species but it seems an unlikely scenario when considering different species. The second question is, ‘can we reject the hypothesis that differences among the G matrices can be accounted for on the basis of drift alone?’ This hypothesis does not preclude the possibility that there are differences in selection history or mutational variances among the populations, but postulates that the effect of genetic drift is so large relative to these that it overwhelms them. It is important to remember that proportionality of the G matrices does not imply that the mean phenotypes have not been drastically altered by selection. If equality and proportionality can be rejected we accept the hypothesis that differences are, at least in part, a consequence of variation in selection history and/or mutational effects.

The statistical methods proposed to address the above hypotheses can be classified into four groups: element-by-element comparisons, correlation between matrices, maximum likelihood, and the Flury hierarchical comparison. Most analyses have involved only two populations and for simplicity I shall assume that only two matrices are being compared and that the elements of the matrices are ‘lined up’ in two vectors. Element-by-element comparison entails doing a separate test on each element of the matrices (Brodie, 1993; Paulsen, 1996; Roff & Mousseau, 1999; Roff et al., 1999). The simplest such test is to check whether the difference between the elements is greater than expected by chance, which may be done using a bootstrap approach (Paulsen, 1996), randomization (Roff et al., 1999), or a t-test where the standard errors are derived by using the jackknife (Brodie, 1993; Roff, 1997). The problem with the method is that the elements within a matrix are not independent of each other and there are likely to be so many tests that after correction for multiple tests there is too little statistical power to be useful. However, while the method may not serve as a means of testing for overall differences it is useful as a ‘data-exploration’ technique, serving to indicate where large differences might lie.

The most commonly used test is to compare the correlation between the matrices. To circumvent the problem of nonindependence, Lofsvold (1986) suggested the randomization test known as Mantel’s test. This test has been quite widely used although its statistical validity has been questioned (Shaw, 1992). My objection to the method, apart from those enunciated by Shaw (1992), is that it cannot distinguish the hypotheses presented above. A highly significant correlation between matrices could indicate equality (i.e. $y_i = x_i$, where $y_i$ and $x_i$ are the corresponding elements in the two ‘vectorized’ matrices), proportionality (i.e. $y_i = bx_i$, where $b \neq 1$), or the joint effects of drift and selection/mutation (i.e. $y_i = a + bx_i$, where $a \neq 0$). The three hypotheses can be distinguished by a regression approach using the reduced major axis method and randomization to estimate the required probabilities (Roff et al., 1999).
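The objection to the correlation approach can be made concrete with a sketch of a Mantel-type randomization test. This is an illustrative implementation under stated assumptions, not the procedure of any cited study: the correlation is taken over the lower triangle including the diagonal, rows and columns of the second matrix are permuted together, and the two matrices are simulated, with the second set to exactly twice the first.

```python
import numpy as np


def matrix_correlation(A, B):
    """Pearson correlation between corresponding elements of two symmetric
    matrices (lower triangle, diagonal included)."""
    idx = np.tril_indices_from(A)
    return np.corrcoef(A[idx], B[idx])[0, 1]


def mantel_test(A, B, n_perm=999, seed=0):
    """Randomization test: permute rows and columns of B together and count
    how often the permuted correlation is at least as large as observed."""
    rng = np.random.default_rng(seed)
    observed = matrix_correlation(A, B)
    n = A.shape[0]
    count = 1   # the observed arrangement is counted as one permutation
    for _ in range(n_perm):
        perm = rng.permutation(n)
        B_perm = B[np.ix_(perm, perm)]
        if matrix_correlation(A, B_perm) >= observed:
            count += 1
    return observed, count / (n_perm + 1)


# Simulated 5-trait covariance matrix and an exactly proportional copy.
data = np.random.default_rng(1).normal(size=(50, 5))
G1 = np.cov(data, rowvar=False)
G2 = 2.0 * G1

r_obs, p_value = mantel_test(G1, G2)
print(r_obs, p_value)
```

The correlation is 1 even though every element of the second matrix is twice that of the first, so the test by itself cannot separate equality from proportionality.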

Maximum likelihood methods have become particularly popular in quantitative genetic analyses in general; they have been used to address the question of similarity between matrices (e.g. Shaw, 1991; Holloway et al., 1993), but have not been used to distinguish equality from proportionality vs. other types of differences. A method that shows considerable promise and is itself based on maximum likelihood is the Flury approach, which considers a hierarchy of possible differences. This method looks at the matrices from the perspective of their common principal components: starting from the ‘top’ of the hierarchy the matrices can be (a) identical, (b) proportional, in which case the matrices share identical principal components but their eigenvalues differ by a proportional constant, (c) partially similar, sharing one or more, but not all, principal components, or (d) completely unrelated in structure (Phillips & Arnold, 1999; Fig. 2). The argument for analysing the principal components is that evolutionary change will be influenced by the size of the eigenvalues corresponding to the principal components (Bjorklund, 1996; Schluter, 1996). Eigenvalues that are zero indicate directions in which there is no genetic variation and hence in which there cannot be evolutionary change. Very large eigenvalues indicate highly correlated traits, and thus the response to selection will be strongly influenced by the associated principal component.
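The ‘proportional’ level of the hierarchy can be illustrated with an eigen-decomposition. The sketch below is not the Flury maximum likelihood test; it only shows, for a hypothetical two-trait matrix and an exactly proportional copy, that the principal components coincide and the eigenvalues differ by a constant factor.

```python
import numpy as np

# Hypothetical two-trait G matrix and a proportional copy (illustrative values).
G1 = np.array([[4.0, 1.5],
               [1.5, 2.0]])
G2 = 0.5 * G1

# eigh is intended for symmetric matrices; eigenvalues are returned in
# ascending order and eigenvectors are the columns of the second result.
vals1, vecs1 = np.linalg.eigh(G1)
vals2, vecs2 = np.linalg.eigh(G2)

eigenvalue_ratio = vals2 / vals1
# Absolute values are taken because the sign of an eigenvector is arbitrary.
axis_alignment = np.abs(vecs1.T @ vecs2)

print(eigenvalue_ratio)   # the same constant for every principal component
print(axis_alignment)     # identity matrix when the principal components coincide
```

With estimated matrices the alignment and the ratios would never be exact, which is why a formal test of each level of the hierarchy is needed.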

Fig. 2

Schematic view of the Flury hierarchy for two covariance matrices consisting of two traits. The ellipses represent the covariance structure with the axis orientation denoting the principal components and the spread of the ellipses along each axis the eigenvalues. The analysis proceeds from the top to the bottom.

Determining that two matrices differ does not itself tell us which process (drift vs. selection) is the more significant. For this purpose we require some method of partitioning the effect that can be attributed to drift vs. that which can be attributed to selection. The following approach based on mean-square deviations is one possible method. First, we compute the mean-square error for the three models

where c is the number of variances and covariances, $b_0$ and $B_0$ ($= 1/b_0$) are the slopes of the reduced major axis regression forced through the origin, and a, b, A and B are the parameters of the reduced major axis regression with the intercept included. The MSE is calculated using both ‘x on y’ and ‘y on x’, because these do not each give the same value. We can now ask how much the MSE is reduced by the assumption of a model that only assumes drift (model 2) vs. one that assumes both drift and selection (model 3). To illustrate this I compare the G matrices for two populations of garter snake studied by Arnold & Phillips (1999). Using the Flury hierarchical method the G matrices for females from the two populations were found to have common principal components but not to be proportional (Arnold & Phillips, 1999). However, it is visually evident that the elements of the matrices do not differ much from proportionality (Fig. 3). This is confirmed by the comparison of the percentage reduction: model 2 accounts for a 17.1% reduction in the mean-square error whereas model 3 accounts for an 18.3% reduction. Thus a 17.1% reduction can be attributed to drift while only a further 1.3% can be attributed to selection. This does not mean that drift actually caused the divergence, for it is possible that selection caused a proportional change in the matrices. However, in general, selection will cause nonproportional changes. Thus, in the absence of evidence to the contrary, the hypothesis that the major force causing divergence was drift cannot be rejected.
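The comparison of the three models can be sketched as follows. The exact mean-square error expressions are not reproduced in this text, so the sketch makes two assumptions: the MSE of a model is the average of the squared residuals from predicting y from x and from predicting x from y, and the percentage reduction is measured relative to model 1 (equality). The function names and the example vectors are hypothetical.

```python
import numpy as np


def rma_through_origin(x, y):
    """Reduced major axis slope of y on x with the line forced through the origin."""
    return np.sign(np.sum(x * y)) * np.sqrt(np.sum(y ** 2) / np.sum(x ** 2))


def rma_with_intercept(x, y):
    """Reduced major axis intercept and slope of y on x."""
    r = np.corrcoef(x, y)[0, 1]
    slope = np.sign(r) * np.std(y, ddof=1) / np.std(x, ddof=1)
    return np.mean(y) - slope * np.mean(x), slope


def mse_three_models(x, y):
    """Assumed MSE for equality (1), proportionality (2) and a general line (3)."""
    c = len(x)

    def two_way(pred_y, pred_x):
        # Average squared residual over both directions of prediction.
        return (np.sum((y - pred_y) ** 2) + np.sum((x - pred_x) ** 2)) / (2 * c)

    b0 = rma_through_origin(x, y)
    B0 = 1.0 / b0
    a, b = rma_with_intercept(x, y)
    A, B = rma_with_intercept(y, x)
    return two_way(x, y), two_way(b0 * x, B0 * y), two_way(a + b * x, A + B * y)


# Hypothetical vectorized matrices: y is exactly proportional to x.
x = np.array([1.0, 2.0, 0.5, 3.0, 1.5, 0.8])
y = 2.0 * x
mse1, mse2, mse3 = mse_three_models(x, y)
drift_reduction = 100.0 * (mse1 - mse2) / mse1
drift_and_selection_reduction = 100.0 * (mse1 - mse3) / mse1
print(drift_reduction, drift_and_selection_reduction)
```

In this constructed case the proportional model removes all of the error, so adding an intercept cannot reduce it further; with real matrices the difference between the two reductions is the portion attributed to selection.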

Fig. 3

Plot of genetic variances and covariances of two populations of garter snake (data from Arnold & Phillips, 1999).

What have the analyses of G matrices told us to date?

Almost all studies have compared either different populations of the same species or different species within the same genus. Most have not distinguished between the effects of drift vs. the effects of selection. More importantly, many used a correlational test which, as discussed above, cannot distinguish between equality, proportionality and a general linear relationship. The majority of studies comparing populations have found either no significant differences or significant correlations, whereas comparisons between species have most frequently found differences (Fig. 4). Of course this does not mean that there are no differences between populations, simply that any difference is smaller than can be detected with the sample sizes typically used.

Fig. 4

A survey of analyses comparing genetic covariance matrices (data from table 7 of Roff & Mousseau, 1999; plus the garter snake data from Arnold & Phillips, 1999). NSD, Not Significantly Different; SC, Significantly Correlated; SD, SC, Significantly Different and Significantly Correlated; SD, Significantly Different; NSC, Not Significantly Correlated.

Another way of approaching the problem is to ask how much of the difference at the various taxonomic levels is attributable to drift vs. selection. Where significant variation between groups has been found it can almost all be accounted for by drift (Fig. 5). Interestingly, the points showing the greatest ‘effect’ of drift and selection (as indicated by the deviation from the line of equality) are all nonsignificant, indicating that analyses should consider both the results of hypothesis testing and interval estimation. Also included in Fig. 5 are the results from a selection experiment on body size in Drosophila melanogaster. Comparison of the G matrices of the base and control populations showed significant differences (Shaw et al., 1995), which, as shown in the plot, can be attributed to drift, as would be expected. Comparisons of the control population with those selected for large or small size suggest that drift has caused even greater deviation of the selected lines from the control. Further, there appears to have been no effect of selection on G in the population selected for small size but an effect in the population selected for large size (Fig. 5). However, these conclusions are suspect because in neither case did the G matrix of the control population differ from that of the selected population (Shaw et al., 1995).

Fig. 5

A comparison of percentage reduction in mean-square error attributable to Drift vs. that attributable to Drift and Selection. The data were taken from those studies in table 7 of Roff & Mousseau (1999) in which the necessary variances and covariances were presented, Arnold & Phillips (1999) and Shaw et al. (1995). Sig, matrices were found to be significantly different from each other; NS, the null hypothesis of equality could not be rejected.

Where do we go from here?

There is considerable scope for further theoretical analysis of the evolution of the G matrix. In particular, what is the relationship between selection and the deviation from proportionality expected under drift alone? There is also need for more research into both the statistical properties of the proposed tests and further development of methods for estimating the difference between matrices. I have presented one measure, the reduction in mean-squared deviation, which suggests that most of the differentiation observed between populations and species is due to drift rather than selection. Drift is a consequence of differences in effective population size, which are likely to be a consequence of ecological factors: thus we might expect that variation in G should correlate with both taxonomic and ecological variables. Support for the latter is provided by the study of the genetic correlations between morphological characters in the amphipod Gammarus minus (Jernigan et al., 1994).

The G matrix is central to the quantitative genetics of evolutionary processes. Recent developments have shown great promise in improving our ability to analyse variation among matrices. What is presently lacking is an examination of a group incorporating several taxonomic levels (e.g. species and genus) and ecological variables. While such a research programme would be very labour-intensive, the rewards certainly merit the required expenditure.