Introduction

One generalization in the biological literature is that fluctuating asymmetry, which represents a measure of developmental instability, is negatively related to the degree of heterozygosity within and among populations (e.g. Palmer & Strobeck, 1986; Mitton, 1995). Although a number of studies report exceptions to this pattern, they have been largely ignored. Well-known generalizations are often used as foundations for new hypotheses or statements such as fluctuating asymmetry being used for assessing genetic variability in the fossil record (Palmer, 1986) and multilocus heterozygosity generally being associated with fitness (Mitton, 1995; Britten, 1996). We believe that quantitative assessment of the basic hypothesis is required before any further progress can be made (Arnqvist & Wooster, 1995).

Developmental stability (or homeostasis) is defined as the ability of an individual to withstand genetic and environmental disturbances during development so as to produce a predetermined phenotype (Waddington, 1942; Lerner, 1954). A common measure of developmental instability is fluctuating asymmetry (hereafter FA), the random departure of a bilateral pair of traits from perfect symmetry. Environmental or genetic stress during ontogeny may reduce the efficiency of normal developmental processes, thus increasing the level of FA (Palmer & Strobeck, 1986; Parsons, 1990; Clarke, 1992; Møller & Swaddle, 1997). Fluctuating asymmetry seems to increase with increasing homozygosity, hybridization, inbreeding, mutation, extreme physical conditions, and pollution or habitat deterioration (Palmer & Strobeck, 1986, 1992; Leary & Allendorf, 1989; Clarke, 1992; Leary et al., 1992; Møller & Swaddle, 1997).

A negative correlation between the level of heterozygosity (frequently measured using biochemical methods such as enzyme electrophoresis) and the level of FA is regarded as firmly established by some (Soulé, 1979; Kat, 1982; Vrijenhoek & Lerman, 1982; Biémont, 1983; Leary et al., 1983, 1984; Mitton, 1993). However, a number of studies on very different kinds of organisms report no or only weak correlations (Beacham, 1991; Clarke & Oldroyd, 1996; Vøllestad & Hindar, 1997), raising doubts about the generality of the relationship between heterozygosity and FA. A number of the estimates of the relationship between heterozygosity and fluctuating asymmetry suffer from small sample sizes and therefore low statistical power (Cohen, 1988). The frequency of type II errors is high when sample sizes are small, but meta-analysis has a unique ability to reduce the likelihood of type II errors by pooling the results from multiple studies (Light & Pillemer, 1984; Hedges & Olkin, 1985; Rosenthal, 1991, 1994; Arnqvist & Wooster, 1995).

Here we analyse whether there is a general correlation between heterozygosity and FA using a meta-analytic approach. Meta-analysis refers to a set of statistical tools that can be used to analyse or compare the results of large numbers of individual studies (Light & Pillemer, 1984; Hedges & Olkin, 1985; Rosenthal, 1991, 1994; Arnqvist & Wooster, 1995). The meta-analysis tests for consistency among outcomes of different studies, determines the magnitude (basically the amount of variance explained) and significance of the effects they measure, and uses standardized estimates of effect size. Meta-analysis also provides a stringent way of evaluating heterogeneity in a data set (Hedges & Olkin, 1985). If no statistically significant heterogeneity exists, there is no reason to search for explanatory factors. Significant heterogeneity justifies a systematic search for variables that may account for different effects among studies. Finally, meta-analytic methods offer formal tests for publication bias (Light & Pillemer, 1984; Begg, 1994). Publication bias may prove a serious problem to many fields of scientific enquiry, in particular when scientists are attempting to assess the generality of a particular phenomenon. The distribution of effect sizes should indicate whether bias is present (Light & Pillemer, 1984). The variance in effect size should decrease with increasing sample size if publication is unbiased, because large samples are likely to produce effects that are close to the true population relationship. Mean effect size should be independent of sample size if published studies represent an unbiased sample of the true underlying distribution, and the frequency distribution of effect sizes should be normal and peak around the true value of an effect. These expectations are testable.
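The expectation that scatter in effect size shrinks as sample size grows can be made concrete with a short sketch. It relies on the standard large-sample approximation that Fisher's z-transformed correlation has a standard error of 1/√(N − 3). The function name `funnel_bounds`, the assumed true effect and the sample sizes are illustrative assumptions, not values taken from the studies analysed here.

```python
import math


def funnel_bounds(true_r, n, z_crit=1.96):
    """Approximate 95% range of observed r for a study of size n.

    Assumes SE(z_r) = 1 / sqrt(n - 3) on Fisher's z scale and
    back-transforms the bounds to the r scale.
    """
    if n <= 3:
        raise ValueError("n must exceed 3")
    z = math.atanh(true_r)           # Fisher's z of the assumed true effect
    se = 1.0 / math.sqrt(n - 3)      # approximate standard error of z
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Hypothetical true effect and sample sizes, for illustration only
for n in (10, 50, 250, 1000):
    lower, upper = funnel_bounds(-0.1, n)
    print(f"N={n:5d}: r expected between {lower:+.2f} and {upper:+.2f}")
```

Under these assumptions the expected range narrows around the true value as N increases, which is the funnel shape an unbiased literature should show.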

The usefulness of meta-analysis in biology is still controversial, but we believe it is superior to reviews based on simple vote counting without rigorous statistical treatment, or narrative summaries of the literature (Osenberg & St. Mary, 1998).

Materials and methods

Materials

We collected data on the correlation between fluctuating asymmetry and heterozygosity from the biological literature (Table 1). The published data were found by intensive literature search in electronic databases including BIOSIS, Zoological Record and Biological Abstracts. The different studies vary markedly in scope, sample size and number of traits used. It is also evident that a small number of researchers have supplied more than half of the data. In most cases the relevant statistics could be found in the different papers, but where unavailable we estimated the effect sizes from the published results.

Table 1 Effect sizes (r) of association between heterozygosity and fluctuating asymmetry with additional information on validity (A, test for the presence of antisymmetry and directional asymmetry and estimate of measurement error; B, test for the presence of antisymmetry and directional asymmetry, or estimate of measurement error; C, no evaluation of distributions or assessment of measurement error)

The level of fluctuating asymmetry was estimated using widely different characters among studies, and the number of characters scored also varied. Where possible we used composite character scores (such as the number of asymmetric characters per individual, or the sum of the total absolute difference between right and left characters) rather than single-trait scores as the relevant estimate of fluctuating asymmetry (Palmer & Strobeck, 1986; Møller & Swaddle, 1997). In some studies only single-character estimates were used. Different traits are differently susceptible to developmental accidents, and most often asymmetry in different traits is not correlated (Palmer & Strobeck, 1986; Leary & Allendorf, 1989; Møller & Swaddle, 1997; Clarke, 1998a,b). Asymmetry in two different types of traits, meristic and morphometric ones, was tested separately in the present study. However, the present data set does not allow testing the correlation between asymmetry in specific traits and heterozygosity.

The number of individuals or populations scored varied among studies. In some analyses we used the number of individuals or populations as a weighting variable to give more weight to studies with larger sample sizes, as it is reasonable to assume that larger sample sizes will yield more precise results (Hedges & Olkin, 1985). The usual way of weighting effect size is by the inverse of its variance (Osenberg & St. Mary, 1998). However, variance or standard error estimates were not available for most of the studies. Therefore, we used sample size as the most appropriate available weighting variable.

In most published studies heterozygosity was estimated using enzyme electrophoresis. The number and type of loci screened differed widely among studies. In the studies where comparisons were made between families in the laboratory, inbreeding coefficients were used as an estimate of the level of heterozygosity.
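The sample-size weighting described above can be sketched as follows. This is a minimal illustration, assuming a simple mean of r weighted by N; the function name `weighted_mean_effect` and the three example studies are hypothetical.

```python
def weighted_mean_effect(effects, sample_sizes):
    """Mean effect size r weighted by sample size N (not inverse variance)."""
    if len(effects) != len(sample_sizes) or not effects:
        raise ValueError("need equal-length, non-empty inputs")
    total_n = sum(sample_sizes)
    return sum(r * n for r, n in zip(effects, sample_sizes)) / total_n


# Hypothetical studies, for illustration only
rs = [-0.30, -0.05, 0.02]
ns = [12, 150, 400]

unweighted = sum(rs) / len(rs)
weighted = weighted_mean_effect(rs, ns)
print(f"unweighted mean r = {unweighted:.3f}, weighted mean r = {weighted:.3f}")
```

In this made-up example the small study with the strongest effect dominates the unweighted mean but contributes little once weights are applied.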

The data set comprises studies of a large number of different organisms, but is heavily skewed towards fish, in particular salmonid fish. Therefore, we calculated a mean effect size for each species to reduce phylogenetic bias and used the species-specific means in some analyses.
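Collapsing the data to one value per species might look like the sketch below. It assumes an unweighted mean within each species; the function name `species_means` and the records are hypothetical, with species names used only to illustrate the grouping.

```python
from collections import defaultdict


def species_means(records):
    """Average effect size within each species.

    records: iterable of (species, r) pairs, one pair per study.
    """
    by_species = defaultdict(list)
    for species, r in records:
        by_species[species].append(r)
    return {sp: sum(vals) / len(vals) for sp, vals in by_species.items()}


# Hypothetical effect sizes, for illustration only
records = [
    ("Oncorhynchus mykiss", -0.20),
    ("Oncorhynchus mykiss", -0.10),
    ("Oncorhynchus mykiss", -0.30),
    ("Apis mellifera", 0.00),
]

means = species_means(records)
overall = sum(means.values()) / len(means)   # each species counts once
print(means, round(overall, 3))
```

Each species then contributes a single data point, so a heavily studied species cannot dominate the overall mean.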

Calculation of effect sizes

The methods used are described in detail by Rosenthal (1991). We used Pearson’s product-moment correlation coefficient (r) as a measure of effect size. Fisher’s z_r-transformation of r (eqn 1),

$$z_r = \frac{1}{2}\ln\left(\frac{1 + r}{1 - r}\right) \qquad (1)$$

where −1 < r < +1, is often used in this kind of study, especially if r is large and sample size small. In most data sets used in the present study r is small and sample sizes relatively large. We have, however, analysed the material using both effect size estimators. The conclusions did not differ between effect size metrics, and in this paper we report the results using r only.
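A short sketch shows why the choice between r and Fisher's transformed value matters little when correlations are small. The function names are illustrative; the transformation itself is the standard one, z_r = ½ ln[(1 + r)/(1 − r)].

```python
import math


def fisher_z(r):
    """Fisher's z-transformation of a correlation coefficient."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and +1")
    return 0.5 * math.log((1.0 + r) / (1.0 - r))


def inverse_fisher_z(z):
    """Back-transform Fisher's z to r."""
    return math.tanh(z)


# Illustrative values: r and z_r nearly coincide when |r| is small
for r in (-0.05, -0.2, -0.8):
    print(f"r = {r:+.2f}  z_r = {fisher_z(r):+.4f}")
```

For |r| below about 0.2 the two metrics differ only in the third decimal place, whereas for large |r| they diverge noticeably.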

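In a number of studies used in this analysis the statistics were reported as χ²-tests, two-sample t-tests or the F-statistics of a one-way analysis of variance (ANOVA). These statistics can be transformed into Pearson’s correlation coefficients (Rosenthal, 1991) with the appropriate sign (plus or minus):

$$r = \sqrt{\frac{\chi^2_{(1)}}{N}}, \qquad r = \sqrt{\frac{t^2}{t^2 + \mathrm{d.f.}}}, \qquad r = \sqrt{\frac{F_{(1,\cdot)}}{F_{(1,\cdot)} + \mathrm{d.f.}_{\mathrm{error}}}}$$

where N is the total sample size, d.f. is the degrees of freedom of the t-test, and d.f.(error) is the error degrees of freedom of an F-test with one numerator degree of freedom.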
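These conversions can be sketched in a few lines. The sketch assumes the standard formulas given by Rosenthal (1991) for a χ² with one degree of freedom, a two-sample t and an F with one numerator degree of freedom; the function names are hypothetical, and the sign must be supplied from the reported direction of the effect where the statistic itself carries none.

```python
import math


def r_from_chi2(chi2, n, sign=1):
    """r from a chi-square with 1 d.f. and total sample size n."""
    return sign * math.sqrt(chi2 / n)


def r_from_t(t, df):
    """r from a two-sample t statistic; the sign of r follows the sign of t."""
    return math.copysign(math.sqrt(t * t / (t * t + df)), t)


def r_from_f(f, df_error, sign=1):
    """r from an F statistic with 1 numerator d.f. and df_error error d.f."""
    return sign * math.sqrt(f / (f + df_error))


# Hypothetical test results, for illustration only
print(round(r_from_chi2(4.0, 100, sign=-1), 3))
print(round(r_from_t(-2.0, 96), 3))
print(round(r_from_f(4.0, 96, sign=-1), 3))
```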

In some instances only the P-values of the statistical tests were reported. When the sample size of the study (i.e. N) was known, we estimated r by converting P into its standard normal deviate equivalent (Z) using a table of z-values (Rosenthal, 1991). We then estimated r and inserted the appropriate sign based on the reported trend (plus or minus):

$$r = \frac{Z}{\sqrt{N}}$$
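The same conversion can be done without a printed table. The sketch below assumes that P is one-tailed and no larger than 0.5, and that r is obtained as Z/√N (Rosenthal, 1991); the function name `r_from_p` is hypothetical.

```python
import math
from statistics import NormalDist


def r_from_p(p_one_tailed, n, sign):
    """Estimate r from a one-tailed P-value and sample size n.

    sign is +1 or -1 according to the reported direction of the trend.
    """
    if not 0.0 < p_one_tailed <= 0.5:
        raise ValueError("expected a one-tailed P in (0, 0.5]")
    z = NormalDist().inv_cdf(1.0 - p_one_tailed)  # standard normal deviate
    return sign * z / math.sqrt(n)


# Hypothetical report: one-tailed P = 0.025, N = 100, negative trend
print(round(r_from_p(0.025, 100, sign=-1), 3))
```

If a study reports a two-tailed P, it would need to be halved before being passed to this function.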

Data sets analysed using more complicated statistical tests than mentioned here were not included in this analysis, as the translation of the statistics of such analyses into a standardized effect size is problematic. Thus we have not included the interesting study on guppies Poecilia reticulata by Sheridan & Pomiankowski (1997). Likewise, the study on developmental instability in gynodioecious Teucrium lusitanicum was excluded (Alados et al., 1998). Both these studies, using inbreeding vs. outbreeding as measures of heterozygosity, reported no effect of inbreeding (i.e. lack of heterozygosity) on the level of individual asymmetry. Furthermore, we excluded the study of the endangered Sonoran topminnow Poeciliopsis occidentalis by Quattro & Vrijenhoek (1988), because two recent reports indicated that the earlier result may arise from a laboratory artefact (Sheffer et al., 1997, 1998). Quattro & Vrijenhoek (1988) reported a strong negative correlation between mean heterozygosity and FA. Sheffer et al. (1997, 1998) repeated the experiments and also conducted field studies, and found no relationship between heterozygosity and FA.

No weighting of effect sizes was possible when analysing the total data set. Sample size in the within-population studies was the number of individuals (N was usually large), whereas sample size in the among-population studies was defined as the number of populations studied (N was usually very small, although the number of individuals used to estimate the population means may have been large). However, we did weight effect size by sample size when considering within- and among-population studies separately. When analysing the data we first used a test of homogeneity to test if the effect sizes were homogeneous among studies (Rosenthal, 1991). The test statistic has a χ²-distribution with K − 1 degrees of freedom (K = number of comparisons between FA and heterozygosity):

$$\chi^2 = \sum_{j=1}^{K} (N_j - 3)(r_j - r_m)^2$$

where Nj is the number of observations in the jth comparison, rj is the r of the jth comparison, and rm is the mean r. If no significant heterogeneity was found, we tested the hypothesis that fluctuating asymmetry was negatively correlated with heterozygosity. Hence, we have used one-tailed statistical tests when testing whether mean effect sizes were significantly different from zero (i.e. r < 0). Thereafter we tested for differences among groups of studies using ANOVA, with tests of homogeneity among categorical variables. We categorized studies into two conceptually different types, namely studies of individuals within populations and studies of variation among populations.
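The homogeneity test can be sketched as below. The sketch assumes the statistic is the sum over comparisons of (Nj − 3)(rj − rm)², with rm taken as the unweighted mean of r unless another mean is supplied; Rosenthal (1991) formulates the same test on Fisher-transformed values, so this should be read as an illustration of the logic rather than the exact computation used. The function name and the data are hypothetical.

```python
def homogeneity_chi2(effects, sample_sizes, mean_r=None):
    """Return (chi-square, degrees of freedom) for a test of homogeneity.

    chi2 = sum_j (N_j - 3) * (r_j - r_m)^2, with K - 1 degrees of freedom.
    """
    k = len(effects)
    if k < 2 or k != len(sample_sizes):
        raise ValueError("need at least two comparisons of equal length")
    if mean_r is None:
        mean_r = sum(effects) / k    # assumption: unweighted mean r
    chi2 = sum((n - 3) * (r - mean_r) ** 2
               for r, n in zip(effects, sample_sizes))
    return chi2, k - 1


# Hypothetical comparisons, for illustration only
rs = [-0.25, -0.10, 0.05, -0.02]
ns = [40, 120, 300, 80]
chi2, df = homogeneity_chi2(rs, ns)
print(f"chi-square = {chi2:.2f} with {df} d.f.")
```

The statistic is then compared with the tabulated critical value of χ² for K − 1 degrees of freedom (for example 3.84 at the 5% level when there is one degree of freedom).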

Many studies on fluctuating asymmetry do not report measurement error of asymmetry, or test for the occurrence of antisymmetry or directional asymmetry. Measurement error may be large (Palmer & Strobeck, 1986; Palmer, 1996; Møller & Swaddle, 1997), and the among-researcher variation when measuring/counting the same character may be profound (Hubert & Alexander, 1995). Furthermore, antisymmetry or directional asymmetry may invalidate estimates of fluctuating asymmetry. Therefore, critical assessment of the validity of each effect size estimate is important. We grouped studies into three categories (used as nominal variables in the analyses). In type A studies measurement error is accounted for and the right-minus-left distributions analysed for presence of antisymmetry and directional asymmetry, which were not found. Type B studies analysed for antisymmetry and directional asymmetry (neither was found), or estimated measurement error. In type C studies none of these was reported.

Results

The overall unweighted mean effect size for the total data set was r = −0.09 (SE = 0.03) (Table 1). The overall mean effect size for within-population studies weighted by sample size was r = −0.04, whereas the overall mean effect size for among-population studies weighted by sample size was r = −0.18. There was significant heterogeneity in effect sizes both in the within-population (χ²(94) = 445, K = 95, P < 0.001) and the among-population studies (χ²(23) = 51.5, K = 24, P < 0.001). The same qualitative results were obtained when the analysis was performed using the mean r-values for each species. The variance was much larger in the among-population than in the within-population studies (Levene test, F(1,116) = 41.34, P < 0.001). Because of this large difference in variance between study types, we subsequently analysed the data separately.

The frequency distributions of effect sizes of both among- and within-population studies were symmetrical with peak values close to zero (Fig. 1), as expected for unbiased samples. We found only weak indications of a decrease in variance in effect size with increasing sample size, indicating a possible bias in the samples. We regressed effect size on ln-transformed sample size and found no statistically significant regression (among-population studies: F(1,23) = 0.95, r² = 0.04, P = 0.34, b (SE) = 0.11 (0.31); within-population studies: F(1,91) = 0.45, r² = 0.01, P = 0.51, b (SE) = 0.01 (0.02)). This implies that average effect size does not change with sampling effort, consistent with an absence of publication bias.
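The regression of effect size on ln-transformed sample size can be sketched with an ordinary least-squares fit. The function name `regress_on_log_n` and the data are hypothetical; significance testing of the slope is left out.

```python
import math


def regress_on_log_n(effects, sample_sizes):
    """Least-squares fit of effect size on ln(sample size).

    Returns (slope, intercept, r_squared).
    """
    xs = [math.log(n) for n in sample_sizes]
    k = len(xs)
    mean_x = sum(xs) / k
    mean_y = sum(effects) / k
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in effects)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, effects))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Guard against a constant response, where r-squared is undefined
    r_squared = sxy ** 2 / (sxx * syy) if syy > 1e-20 else 0.0
    return slope, intercept, r_squared


# Hypothetical studies in which small samples show stronger negative effects
rs = [-0.30, -0.15, -0.08, -0.05]
ns = [8, 40, 200, 900]
slope, intercept, r2 = regress_on_log_n(rs, ns)
print(f"b = {slope:.3f}, r2 = {r2:.2f}")
```

A slope close to zero is what an unbiased literature should produce; a mean effect that weakens steadily as N increases, as in this made-up example, would instead suggest that small studies were published mainly when they reported strong effects.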

Figure 1 Effect size in relation to sample size for among-population and within-population studies of the relationship between fluctuating asymmetry and heterozygosity.

We also tested if there was an association between the number of polymorphic loci used in each study (only those studies which estimated heterozygosity using enzyme electrophoresis) and effect size (Fig. 2). The number of loci used varied between 1 and 27. We grouped the studies into three classes: 1–4, 5–15, and >15 loci. There was no significant difference in effect size between groups, either at the within-population (weighted ANOVA, F(2,90) = 0.96, P = 0.39) or at the among-population level (F(2,22) = 3.30, P = 0.06).

Figure 2 Effect size in relation to number of polymorphic loci used to estimate the relationship between fluctuating asymmetry and heterozygosity for among-population and within-population studies.

We found no difference in effect size depending on whether heterozygosity was estimated using enzyme electrophoresis or inbreeding coefficients (one-way ANOVA, F(1,117) = 0.23, P = 0.63).

The precision of the different effect size estimates may vary among studies. We grouped studies into three categories based on whether they both estimated measurement error and examined right-minus-left distributions for antisymmetry or directional asymmetry (type A), did only one of these (type B), or did neither (type C). There was a weak tendency for studies with the least rigorous design (type C studies) to have a larger (more negative) mean effect size than the other categories (one-way ANOVA, F(1,115) = 2.86, P = 0.06; mean ± SE (N) for studies with type A, B and C validity were −0.06 ± 0.05 (43), −0.02 ± 0.05 (35) and −0.18 ± 0.05 (40), respectively). In some of the following analyses we tested the robustness of the results by removing type C studies.

Studies were heavily biased towards fish, in particular salmonid fish (Table 1). A total of 65 studies were on fish, and 59 (91%) of these were on salmonids (mostly Oncorhynchus and Salmo). Therefore, we tested for differences in effect size between salmonid and nonsalmonid fish, using a Welch one-way ANOVA, which allows for unequal variances among samples (variances were significantly different; Levene test, F(1,63) = 24.13, P < 0.001). The means were not significantly different (r(salmonids) = −0.12 ± 0.03 (SE), r(nonsalmonids) = −0.27 ± 0.26; F(1,63) = 0.37, P = 0.57). Thus the data set does not seem to be unduly biased by the salmonid results, although this conclusion may be less robust than desired because of the low number of nonsalmonid fish studies.
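For two groups, a Welch one-way ANOVA reduces to the square of Welch's unequal-variance t statistic, which can be computed from group means and standard errors alone. The sketch below illustrates this; the function name is hypothetical, and the inputs are the rounded summary values quoted above with group sizes of 59 and 6, so exact agreement with the reported test statistic should not be expected.

```python
def welch_f_from_summary(mean1, se1, n1, mean2, se2, n2):
    """Welch statistic for two groups from means and standard errors.

    Returns (F, denominator d.f.); the numerator d.f. is 1.
    F equals the squared Welch t, and the denominator d.f. follows
    the Welch-Satterthwaite approximation.
    """
    v1, v2 = se1 ** 2, se2 ** 2          # squared standard errors
    t = (mean1 - mean2) / (v1 + v2) ** 0.5
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t * t, df


# Rounded summary values from the text: salmonids vs nonsalmonid fish
f_stat, df_denom = welch_f_from_summary(-0.12, 0.03, 59, -0.27, 0.26, 6)
print(f"F = {f_stat:.2f}, denominator d.f. = {df_denom:.1f}")
```

The large standard error of the small nonsalmonid group dominates the denominator, which is why a sizeable difference in means yields only a small test statistic.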

Ectothermic and endothermic animals have been hypothesized to respond differently to reductions in heterozygosity (Handford, 1980; Wooten & Smith, 1986; Novak et al., 1993; Mitton, 1995). Therefore, we grouped the animal species into these two modes of thermoregulation to test for a difference in mean effect size (both study type and mode of thermoregulation were set as fixed effects in the analysis). Effect sizes differed significantly among groups, and there was also a significant study type by thermoregulation interaction (Table 2, Fig. 3). Mean effect size for endotherms was positive for the two study types, whereas mean effect size for ectotherms was negative for both study types (endotherms: mean ± SE = 0.09 ± 0.05, median = 0.08; ectotherms: mean ± SE = −0.13 ± 0.03, median = −0.08). The ectotherm data set was strongly biased towards fish. However, mean effect sizes were not significantly different between fish and nonfish ectotherms (fish: mean ± SE = −0.13 ± 0.04, median = −0.09, N = 65; nonfish: mean ± SE = −0.13 ± 0.06, median = −0.04, N = 26; one-way ANOVA, F(1,89) = 0.01, P = 0.96).

Table 2 Full factorial ANOVA of variation in effect size for the relationship between fluctuating asymmetry and heterozygosity, with thermoregulation (endotherm, ectotherm) and study type (among-population, within-population) as fixed effects. F(3,113) = 7.75, P < 0.001

Figure 3 Mean (±SE) effect sizes for correlations between fluctuating asymmetry and heterozygosity in endothermic and ectothermic animals and different study types (among-population, within-population).

We removed type C studies from the data set, but this did not change our conclusion (two-way ANOVA with interaction, F(3,73) = 4.83, P = 0.004). Both the interaction between study type and mode of thermoregulation (P = 0.022) and the effect of mode of thermoregulation (P < 0.001) remained significant.

Significant heterogeneity remained in parts of the data set after partitioning by mode of thermoregulation and study type. For endothermic animals studied at the within-population level we found no significant heterogeneity (χ²(18) = 21.7, K = 19, P > 0.05). Mean weighted effect size (0.07 ± 0.03) was significantly different from zero, but positive, which is contrary to expectation. We also found no heterogeneity for ectothermic animals studied at the among-population level (χ²(15) = 21.2, K = 16, P > 0.05). Mean weighted effect size (−0.23 ± 0.09) was significantly different from zero (t(15) = −2.85, P (one-tailed) = 0.006). We found significant heterogeneity for the two other study types (ectotherms within populations: χ²(75) = 399, K = 76, P < 0.001; endotherms among populations: χ² = 17.3, K = 7, P < 0.025). We also tested for heterogeneity in effect sizes within the two species for which the number of studies was large enough for this analysis. We found significant heterogeneity in effect sizes in rainbow trout Oncorhynchus mykiss (χ²(21) = 142, K = 22, P < 0.001; mean r = −0.19 ± 0.06), but not in honey bees Apis mellifera (χ²(12) = 12.4, K = 13, P > 0.05), for which the mean effect size was not significantly different from zero (mean r = −0.00 ± 0.03; t(13) = −0.11, P (one-tailed) = 0.46).

Different kinds of characters may respond differently to stress. We divided studies into those based on meristic and morphometric characters. There was a significant difference in effect size between these categories when tested within populations (r(meristic) = −0.05 ± 0.02, r(morphometric) = 0.02 ± 0.03; Welch ANOVA, F(1,89) = 6.69, P = 0.010), but not when tested among populations (r(meristic) = −0.16 ± 0.12, r(morphometric) = −0.10 ± 0.14; F(1,19) = 0.11, P = 0.74).

Discussion

We assessed the relationship between heterozygosity and developmental stability by means of a meta-analysis and found a weak negative correlation between heterozygosity and level of asymmetry. The distribution of effect sizes was consistent with expected distributions for samples of studies unaffected by publication bias. However, effect size was significantly heterogeneous both for within-population and among-population studies. Our analysis revealed significantly different mean effect sizes for ectothermic and endothermic animals. Given this complexity, controversy over the presence and magnitude of the heterozygosity–fluctuating asymmetry relationship is not surprising.

Heterozygosity was initially hypothesized to be negatively related to developmental instability (and thus fluctuating asymmetry) by Lerner (1954): heterozygotes were assumed to be able to synthesize more different biochemical products and therefore be better able to control development. Reduced metabolic efficiency may contribute to increased fluctuating asymmetry (Alekseeva et al., 1992; Ozernyuk et al., 1992; Palmer, 1996; Møller & Swaddle, 1997), with metabolic efficiency being optimal under certain environmental conditions. The optimal metabolic efficiency may differ among populations along an environmental gradient. Fluctuating asymmetry has been suggested to depend on environmental stress, if metabolic efficiency decreases with increasing deviations from optimal conditions, if allozyme genotypes directly affect metabolic efficiency, or if poor metabolic efficiency has direct negative effects on developmental precision (Mulvey et al., 1994). In other words, the association between fluctuating asymmetry and heterozygosity may depend upon the amount of stress experienced by the organism during development (Palmer, 1996). If this is so, we should expect the association between asymmetry and heterozygosity to be more important in stressful than in benign environments, as observed in a study of Gambusia holbrooki (Mulvey et al., 1994). However, in studies on Atlantic salmon Salmo salar and brown trout S. trutta, Vøllestad & Hindar (1997; unpubl. data) found no significant difference in the relationship between number of asymmetric characters and number of heterozygous loci in relation to the level of environmental stress.

Within-population studies provide the most appropriate way to test the hypothesized negative relationship between heterozygosity and asymmetry. Among-population studies are difficult to interpret, as it is impossible to ensure that measures are taken against a consistent environmental background. This is clearly a problem for ectotherms, and especially fishes, which may inhabit widely different habitats and also have very different genetic backgrounds (Ward et al., 1994).

Heterozygosity has been suggested to be more important for developmental stability in ectotherms than endotherms (Handford, 1980; Wooten & Smith, 1986; Novak et al., 1993; Mitton, 1995). The argument for this hypothesis is comparable to that for the stress hypothesis given previously. Endotherms will generally experience a very controlled internal environment during early embryonic development, whereas ectotherms may experience great environmental fluctuations (Novak et al., 1993). We found that ectotherms showed a stronger relationship between fluctuating asymmetry and heterozygosity than endotherms, even when excluding fish from the analysis. Although this provides some support for the overall hypothesis, we would like to emphasize that the average effect for ectotherms was very small (r² = 0.02).
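The size of this effect can be checked with a line of arithmetic, assuming that the quoted r² is simply the square of the mean ectotherm effect size of r = −0.13 reported in the Results:

```latex
r^2 = (-0.13)^2 = 0.0169 \approx 0.02
```

On this reading, heterozygosity accounts for roughly 2% of the variance in fluctuating asymmetry among ectotherms, leaving about 98% unexplained.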

Although we found a weak negative relationship between heterozygosity and asymmetry, clearly a large number of studies reported no effect. This could be explained in several different ways. First, estimates of heterozygosity based on enzyme polymorphism data from a few loci may not provide reliable estimates of overall genetic variability (Chakraborty, 1981; Chakraborty & Ryman, 1983; Mitton, 1993). Secondly, estimates of developmental instability may be based on a single or few characters, and the relationship between heterozygosity and developmental instability is known to differ among characters (Suchentrunk, 1993; Palmer & Strobeck, 1997). Thirdly, the negative relationship between heterozygosity and developmental stability may depend on stress level. One or more of these explanations may apply, but their relative importance remains obscure. We note, however, that the general result of only a weak association between heterozygosity and fluctuating asymmetry seems to hold even in studies based on large sample sizes and/or a large number of polymorphic loci.

Obviously, our meta-analysis is limited by the availability of data. Furthermore, it is limited by the large heterogeneity in results, which may be caused by a number of factors. However, we suggest that the study of patterns of heterozygosity and developmental instability may progress if future studies focus on the following problems. First, more than half of all within-population estimates are based on fish. A taxonomically more diverse database would allow a better assessment of general patterns. If mode of thermoregulation is important, results from amphibians and reptiles should resemble those from invertebrates and fish, whereas results from birds should resemble those from mammals. The almost complete lack of studies of plants also needs to be addressed (but see Sherry & Lord, 1996).

Secondly, current estimates of heterozygosity used in studies of the relationship between developmental instability and heterozygosity are all based on a small number of enzyme loci. Such estimates do not provide reliable estimates of genome-wide genetic variability (Chakraborty, 1981; Chakraborty & Ryman, 1983; Mitton, 1993). Information on genetic variation based on modern molecular techniques may provide better (or at the very least independent) estimates of heterozygosity. Tests of the relationship between heterozygosity and developmental instability based on DNA variation may provide a more rigorous test of the hypothesis, and they could shed light on possible differences between heterozygosities at coding and neutral loci. Pogson & Zouros (1994) used this approach with great success, producing evidence against the hypothesis that correlations between heterozygosity and growth rate in the scallop Placopecten magellanicus result from nonrandom genotypic associations between the scored loci and other loci segregating for deleterious recessive genes (the associative overdominance hypothesis).

Thirdly, detailed studies of particular enzyme systems may reveal better information on the mechanisms involved in generating a relationship between heterozygosity and developmental instability.

We conclude that the hypothesis is only weakly supported by available data, and we question whether heterozygosity explains more than a very small amount of the variation in developmental instability among individuals and populations.