Introduction

Fluctuating asymmetry (FA), defined as minor non-directional deviations from bilateral symmetry in morphological characters (Van Valen, 1962), is widely regarded as an index of developmental stability, that is, an organism's ability to precisely express genetically determined developmental pathways despite environmental disturbances. Although the value of FA as an estimator of an organism's fitness is still controversial (e.g., Rasmuson, 2002), a long-held theory, going back at least to Lerner (1954), connects developmental stability with genetic variability. More heterozygous individuals, the argument goes, stabilize their development more efficiently by buffering against environmental insults. This may be owing to the direct influence of certain key loci, to the influence of loci linked to the loci being studied or to overall genomic heterozygosity (e.g., Mitton, 1978, 1993). Probably the best argument for the key loci hypothesis is the finding of a direct connection between null alleles at the lactate dehydrogenase loci and asymmetry in rainbow trout (Leary et al., 1983a, 1993; Messier and Mitton, 1996). As heterozygosity is inversely related to inbreeding, the relationship between FA and genetic variability may also be the result of different levels of inbreeding (see Clarke et al., 1986; Møller and Swaddle, 1997), although some studies did not find any correlation between FA and inbreeding coefficients (e.g., Clarke et al., 1992; Fowler and Whitlock, 1994). To test the role of genetic variability further, many studies have looked for a correlation between FA and genetic variability. Most of these studies measured variability as heterozygosity at allozyme loci (for a review see Møller and Swaddle, 1997).
If, however, developmental stability depends on overall heterozygosity rather than on specific key loci, variability at neutral non-coding loci should also be correlated with the level of FA. So far, only a few studies have been based on non-coding markers such as microsatellites, and their first results do not corroborate the hypothesis of a role of overall genomic heterozygosity (Kruuk et al., 2003; Borrell et al., 2004).

Another important issue still being debated is whether there is a general difference between homeotherms and poikilotherms with respect to the relationship between FA and genetic variability. It is noteworthy that most of the studies yielding a negative correlation between developmental instability (as measured by FA) and genetic variability were carried out on poikilotherms using non-metric traits (e.g., Vrijenhoek and Lerman, 1982; Leary et al., 1983b; but see, e.g., Hosken et al. (2000) and Réale and Roff (2003) for no such correlation with metric traits), whereas for homeotherms (for which FA is mostly quantified by metric traits, see Discussion section) no such relationship was found in the majority of cases (e.g., Hartl et al., 1991; Sert et al., 2005; for a review see Novak et al., 1993). In line with this, a meta-analysis conducted by Vøllestad et al. (1999) indicated a tendency towards a positive association between FA and heterozygosity in homeotherms and a tendency towards a negative association in poikilotherms, whereas overall there was ‘only a weak association between heterozygosity and FA’ (Vøllestad et al., 1999, p. 215).

In the present study we analyse five German populations of European roe deer (Capreolus capreolus) with respect to genetic variability at selectively neutral microsatellite loci and FA in metric and non-metric skull and mandible traits. We can thus address some of the aforementioned questions on the relationship between developmental stability and genetic variability, in particular (1) whether variability of neutral molecular markers also shows associations with FA (which would favour the hypothesis of a role of overall genomic heterozygosity); (2) whether there is a difference between metric and non-metric traits in studies on homeotherms and (3) whether the hierarchical level of analysis (individual, population) is of relevance. The latter point is important because both Karvonen et al. (2003) and Hartl et al. (1995) found a relationship on the population level only; neither study yielded a corresponding result on the level of the individual.

Materials and methods

Specimens studied

The study is based on 105 roe deer specimens from five different populations in Schleswig–Holstein, northern Germany (Figure 1). Foehr (FO, n=18) and Fehmarn (FM, n=26) are island populations, whereas Nordfriesland (NF, n=22), Schleswig (SL, n=16) and Rantzau (RA, n=23) are mainland stocks. The two island populations were artificially founded in the 1930s, but repeated subsequent introductions mitigated the founder effect in both populations (see Zachos et al., 2006 for details). Owing to overhunting the complete North-German roe deer population went through a bottleneck in the middle of the nineteenth century, but there have been numerous introductions since. Thus, Zachos et al. (2006) found high variability of the Schleswig-Holstein roe deer at microsatellite loci and the mitochondrial control region (although not at allozyme loci). The animals sampled were adults only with a minimum age of 2 years as determined qualitatively from dentition, skull sutures, antler structure and, in some cases, recognition of individual specimens.

Figure 1

Map of northern Germany showing the geographical location of the populations studied (taken from Zachos et al., 2006). FO, Foehr; NF, Nordfriesland; SL, Schleswig; FM, Fehmarn; RA, Rantzau; KI, Kiel; HH, Hamburg.

Microsatellite data

Microsatellites are short tandem-repetitive DNA units with repeat lengths of 1–6 base pairs. Mutation rates are very high leading to large allele numbers with the various alleles differing in repeat unit number and hence in length. The 105 roe deer were genotyped for eight polymorphic microsatellite loci: OarFCB304, RT1, RT7, ILSTS008, ILSTS058, NVHRT16, NVHRT21 and NVHRT24. The genotypic data were the same as used by Zachos et al. (2006, see this article for references on the loci used).

Genetic variability of the individuals was measured as percentage of heterozygous loci (individual heterozygosity H) and as mean d2 parameters. The latter is calculated as the sum of the squared differences of the two alleles at a locus divided by the number of loci analysed (Coulson et al., 1998). In addition to the common mean d2 parameter we also used the outbreeding-mean d2 values (mean d2out; Coulson et al., 1999) which are calculated as explained above except that homozygous loci are neglected. The rationale of calculating these parameters is that the length difference between the two alleles at a locus in an individual contains information on the genetic distance between the two gametes that gave rise to this individual and thus may serve as an indicator of genetic variability (Pemberton et al., 1999). Although mean d2 parameters have recently been criticized and shown to have generally less power in detecting genotype-fitness correlations than heterozygosity (Tsitrone et al., 2001), we included them in our calculations to produce data directly comparable to previous analyses on genetic variability and FA (Borrell et al., 2004; Kruuk et al., 2003).
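The three individual-level variability measures described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the example genotype and the assumption that allele sizes are expressed in repeat units are all hypothetical, as is the choice to return 0 for mean d2out when an individual is homozygous at every locus.

```python
from typing import Dict, Tuple

# Hypothetical genotype of one individual: locus name -> (allele_a, allele_b).
# Allele sizes are assumed to be given in repeat units.
Genotype = Dict[str, Tuple[int, int]]


def individual_heterozygosity(genotype: Genotype) -> float:
    """Proportion of typed loci at which the two alleles differ (H)."""
    heterozygous = sum(1 for a, b in genotype.values() if a != b)
    return heterozygous / len(genotype)


def mean_d2(genotype: Genotype) -> float:
    """Sum of squared allele-size differences divided by the number of loci."""
    return sum((a - b) ** 2 for a, b in genotype.values()) / len(genotype)


def mean_d2_out(genotype: Genotype) -> float:
    """As mean_d2, but averaged over heterozygous loci only."""
    squared = [(a - b) ** 2 for a, b in genotype.values() if a != b]
    # Returning 0.0 for a fully homozygous individual is a choice of this sketch.
    return sum(squared) / len(squared) if squared else 0.0


# Made-up genotype at four of the eight loci, for illustration only.
example = {"RT1": (10, 12), "RT7": (8, 8), "NVHRT16": (15, 11), "NVHRT21": (9, 9)}
print(individual_heterozygosity(example))  # 0.5
print(mean_d2(example))                    # (4 + 0 + 16 + 0) / 4 = 5.0
print(mean_d2_out(example))                # (4 + 16) / 2 = 10.0
```

In the example, the two homozygous loci lower mean d2 but leave mean d2out unaffected, which is the only difference between the two parameters.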

Population values for H and mean d2 parameters were calculated as the mean variability values over all individuals of the population. Expected heterozygosity (HE) was calculated as the mean over all eight loci for each population using the Arlequin software (Schneider et al., 2000).

In all analyses (genetic and morphological), Bonferroni procedures were used with a nominal α of 0.05 to correct the critical values for multiple tests (Rice, 1989).

Deviation from normal distribution and heterogeneity of variance of H and the mean d2 parameters were tested for each population separately by means of Kolmogorov–Smirnov and Levene tests, respectively. As none of the tests was significant, a two-way ANOVA was conducted for each of the three variability parameters to test for differences between the two sexes and among the five populations. Pairwise correlations among H and the two mean d2 parameters were tested with Pearson tests. All three tests showed positive correlations, the strongest being that between the two mean d2 parameters (r=0.863, P<0.0005). To avoid redundant results, only H and mean d2out were used in tests of correlation with FA; the correlation of mean d2out with H was much weaker than that of mean d2 and not significant after Bonferroni correction (r=0.176, P=0.037).

FA in metric traits

For assessing FA in metric traits, 15 bilateral skull and mandible measurements were taken on the left and right side of each specimen (Figure 2). Measurements were taken exclusively by one of the authors (FEZ) to avoid possible inter-observer variability (Lee, 1990). The bilateral traits were measured twice (and then averaged) with digital callipers to the nearest 0.01 mm or, if exceeding 15 cm in length, with common callipers to the nearest 0.05 mm. As right-left differences in general are often small and (just like FA) normally distributed around a mean of 0, estimation of measurement error is indispensable (Merilä and Björklund, 1995). Its relative contribution was determined as follows (cf. Hartl et al., 1995). Each measurement was repeated three times on each side in 20 individuals. On the basis of the resulting data set of 120 measurements for each trait (two sides × three repeated measurements × 20 individuals), two-way ANOVAs with individual and side as fixed factors were carried out. Measurement error was considered insignificant if the sum of the variance due to side and the variance owing to the side/individual interaction was at least twice as high as the residual variance. Using this criterion, none of the FA estimates was affected by measurement error.
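The measurement-error criterion can be illustrated with a balanced two-way ANOVA computed by hand. The sketch below is an interpretation rather than the authors' procedure: it assumes that "variance" refers to the ANOVA mean squares, and the function name and data layout are hypothetical.

```python
from statistics import mean
from typing import List

# data[i][s][r]: repeat r of side s (0 = left, 1 = right) in individual i.
Measurements = List[List[List[float]]]


def measurement_error_negligible(data: Measurements) -> bool:
    """True if MS(side) + MS(side x individual) >= 2 * MS(residual)."""
    n_ind, n_side, n_rep = len(data), len(data[0]), len(data[0][0])

    grand = mean(x for ind in data for side in ind for x in side)
    ind_means = [mean(x for side in ind for x in side) for ind in data]
    side_means = [mean(x for ind in data for x in ind[s]) for s in range(n_side)]
    cell_means = [[mean(ind[s]) for s in range(n_side)] for ind in data]

    # Sums of squares for a balanced design with replication.
    ss_side = n_ind * n_rep * sum((m - grand) ** 2 for m in side_means)
    ss_inter = n_rep * sum(
        (cell_means[i][s] - ind_means[i] - side_means[s] + grand) ** 2
        for i in range(n_ind)
        for s in range(n_side)
    )
    ss_resid = sum(
        (x - cell_means[i][s]) ** 2
        for i in range(n_ind)
        for s in range(n_side)
        for x in data[i][s]
    )

    # Mean squares: sums of squares divided by their degrees of freedom.
    ms_side = ss_side / (n_side - 1)
    ms_inter = ss_inter / ((n_ind - 1) * (n_side - 1))
    ms_resid = ss_resid / (n_ind * n_side * (n_rep - 1))
    return ms_side + ms_inter >= 2 * ms_resid
```

With 20 individuals, two sides and three repeats per side, `data` would hold the 120 measurements of one trait, and the function would be called once per trait.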

Figure 2

Bilateral metric skull (a) and mandible (b) measurements examined in the roe deer. JUP, jugal – upper second premolar; JM, jugal – tip of maxillary; UMM, upper third molar – tip of maxillary; PM, premaxillary length; NL, nasal length; UTR, upper tooth row length; UML, upper molar row length; UM, length of upper third molar; GCM, Gonion caudale – tip of mandible; GCP, Gonion caudale – lower second premolar; PAM, Processus articularis – tip of mandible; LTR, lower tooth row length; DL, diastema length; GVP, Gonion ventrale – Processus articularis; DH, dental height. The two points used for taking each measurement are indicated by arrows.

To test for the occurrence of directional asymmetry we used sign-tests (right vs left measure). Antisymmetry was examined using Kolmogorov–Smirnov tests of frequency distributions of right-left differences compared to an expected normal distribution (Palmer and Strobeck, 1986). For single traits the following FA index was used:

where R and L are the measurements on the right and the left body side, respectively. The denominator corrects for trait size in order not to weight traits with respect to their absolute length. This FA index has often been used before and, thus, our results are directly comparable to a variety of previous studies (e.g., Hartl et al., 1995; Suchentrunk et al., 1998; Kruuk et al., 2003; Sert et al., 2005). Pairwise correlation of single FA indices was tested by means of Spearman rank tests.

An overall FA index that combines all metric traits of each individual (FAM) was calculated as the arithmetic mean of all scorable FA values. Owing to damage during preparation of the skulls not all traits could be measured in all individuals. We only included individuals with at least eight scorable FA values (102 out of 105 specimens). The influence of sex and population on metric overall FA was tested with a one-way ANOVA (sex) and, owing to variance heterogeneity among populations (Levene test), a Kruskal–Wallis test (population).
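The single-trait index and the overall index FAM can be sketched as below. The exact form of the single-trait index is an assumption of this sketch: it uses the absolute right-left difference divided by the mean of the two sides, a common size-scaled form that matches the description of the denominator as a correction for trait size. The function names and the example measurements are hypothetical; the minimum of eight scorable traits is taken from the text.

```python
from typing import Dict, Optional, Tuple

# (right, left) measurement in mm, or None if the trait could not be scored.
Measurement = Optional[Tuple[float, float]]


def fa_index(right: float, left: float) -> float:
    """Assumed size-scaled index: |R - L| / ((R + L) / 2)."""
    return abs(right - left) / ((right + left) / 2)


def overall_metric_fa(traits: Dict[str, Measurement],
                      min_scorable: int = 8) -> Optional[float]:
    """Arithmetic mean of all scorable single-trait FA values (FAM).

    Returns None for individuals with fewer than `min_scorable` traits,
    mirroring the exclusion rule described above.
    """
    scorable = [pair for pair in traits.values() if pair is not None]
    if len(scorable) < min_scorable:
        return None
    return sum(fa_index(r, l) for r, l in scorable) / len(scorable)


print(fa_index(101.0, 99.0))  # 0.02
```

Because each single-trait value is scaled by trait size before averaging, long and short traits contribute to FAM on the same relative scale.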

To test if the potential difference between metric and non-metric traits is caused by the threshold character of non-metric traits we also carried out tests where we treated our metric data as threshold traits and then conducted calculations as for the non-metric traits. We set a threshold by defining – for each single metric trait – only the 10 (in a second approach: 20) individuals with the highest FA value as asymmetric and the rest as symmetric. The reason for the arbitrarily chosen 10 or 20 specimens is that we did not want our samples to become too small, and any choice of a threshold of this kind inevitably has to be arbitrary. We then calculated an individual asymmetry index as the proportion of asymmetric traits and carried out correlation analyses as described for the non-metric traits.
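The threshold treatment of the metric data can be sketched as follows. The data layout, the function name and the tie handling (ties are broken by input order) are assumptions of this illustration; only the rule itself, flagging the 10 or 20 individuals with the highest FA per trait and taking the proportion of flagged traits per individual, comes from the description above.

```python
from typing import Dict


def threshold_asymmetry(fa_by_trait: Dict[str, Dict[str, float]],
                        top_n: int = 10) -> Dict[str, float]:
    """fa_by_trait[trait][individual] = single-trait FA value.

    For each trait, the `top_n` individuals with the highest FA are scored
    as asymmetric and all others as symmetric. Returns, per individual,
    the proportion of scored traits that were flagged as asymmetric.
    """
    asymmetric: Dict[str, int] = {}
    scored: Dict[str, int] = {}
    for values in fa_by_trait.values():
        ranked = sorted(values, key=values.get, reverse=True)
        flagged = set(ranked[:top_n])
        for individual in values:
            scored[individual] = scored.get(individual, 0) + 1
            asymmetric[individual] = (
                asymmetric.get(individual, 0) + (individual in flagged)
            )
    return {ind: asymmetric[ind] / scored[ind] for ind in scored}


# Made-up FA values for three individuals and two traits, with top_n = 1.
toy = {
    "UTR": {"a": 0.03, "b": 0.01, "c": 0.02},
    "DL": {"a": 0.01, "b": 0.05, "c": 0.02},
}
print(threshold_asymmetry(toy, top_n=1))  # {'a': 0.5, 'b': 0.5, 'c': 0.0}
```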

FA in non-metric traits

A total of 18 non-metric skull and mandible characters were used for assessing asymmetry in non-metric traits (Table 1), many of which have been used in FA studies on roe deer before (Markowski and Markowska, 1988; Markowski, 1993). For each trait only the presence or absence of symmetry (i.e., the occurrence of equal or different character states on the right and left side) was scored.

Table 1 Non-metric bilateral skull and mandible characters used in the present study

For differentiating between fluctuating and directional asymmetry we used Wilcoxon signed-rank tests. Pairwise association of single non-metric traits was tested with χ2 tests.

As an individual index of overall non-metric fluctuating asymmetry (FANM), we chose the proportion of traits asymmetric in each individual (Suchentrunk, 1993; Hartl et al., 1995). Using χ2 tests, we tested the data of the (arbitrarily chosen) right side for significant differences among the five populations studied to rule out the possibility of a purely numerical correlation of non-metric FA and character distribution, that is, different liabilities of FA of non-metric characters due to different character frequencies in the five populations (e.g., a trait with a very low frequency in one population can only have a low FA in this particular population, whereas the same character might have a higher level of FA in populations with higher frequencies of that trait). Since frequencies of occurrence did not vary significantly across populations in any of the 18 traits, all characters were used in further analyses.
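As a small illustration of the individual index, FANM can be computed from paired character states. The trait names and states below are invented for the example and are not the characters listed in Table 1.

```python
from typing import Dict, Tuple


def non_metric_fa(states: Dict[str, Tuple[str, str]]) -> float:
    """Proportion of traits whose right and left character states differ."""
    asymmetric = sum(1 for right, left in states.values() if right != left)
    return asymmetric / len(states)


# Hypothetical scores for one individual: trait -> (right state, left state).
individual = {
    "foramen_A": ("single", "single"),
    "foramen_B": ("double", "single"),
    "suture_C": ("present", "present"),
    "canal_D": ("absent", "absent"),
}
print(non_metric_fa(individual))  # 0.25
```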

The influence of sex and population on FANM was tested with a two-way ANOVA after Kolmogorov–Smirnov and Levene tests had shown normal distribution and variance homogeneity of the FANM values, respectively.

To test whether metric and non-metric FA were correlated, as would be expected if these indices reflect overall developmental stability, Spearman rank tests were carried out on the individual level within and across populations and on the population level across populations. Population-specific FAM and FANM were calculated as the arithmetic mean of all individuals from the respective population.

FA and genetic variability

To test whether some subgroups of metric traits (e.g., mandible traits belonging to the same functional unit) are more prone to be correlated with genetic variability than others we first carried out two Spearman rank tests on each single metric trait (one for correlation with H and one for correlation with mean d2out) over all individuals.

FAM and FANM were tested for correlations with genetic variability on three levels: (1) over all individuals across populations (n=102 for metric and 105 for non-metric traits), (2) over all individuals within populations and (3) among populations using mean values calculated as explained above. For levels (1) and (2) we only used H and mean d2out as indices of genetic variability, for level (3) we also tested for a relationship between FA and expected heterozygosity (HE). Tests on the individual level (1, 2) were carried out using Pearson correlations (normal distribution was shown by Kolmogorov–Smirnov tests) for FAM and Spearman rank tests for FANM. Tests on the population level (3) were all conducted with Spearman rank tests.
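The population-level test (level 3) amounts to a Spearman rank correlation across five pairs of population means. A minimal pure-Python sketch is given below; the function names are hypothetical and the input values are made up for illustration, not taken from the study's tables. Ties receive average ranks, and the coefficient is computed as the Pearson correlation of the ranks.

```python
from statistics import mean
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented population means (five populations), for illustration only.
mean_h = [0.60, 0.65, 0.70, 0.75, 0.80]
mean_fa_nm = [0.29, 0.27, 0.26, 0.24, 0.23]
print(spearman(mean_h, mean_fa_nm))  # -1.0
```

With only five populations, the coefficient can take few distinct values, which is worth keeping in mind when interpreting population-level results.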

Results

Microsatellite variability

Individual heterozygosity H of the 105 roe deer ranged from 0.125 (one out of eight loci heterozygous) to 1.0 (all loci heterozygous). Neither H nor the two mean d2 parameters showed significant differences between the sexes or among the populations (0.347<P<0.896). Population-specific values of H, HE and mean d2out are shown in Table 2.

Table 2 Population-specific values of microsatellite variability used for the correlation analyses with FA

FA in metric traits

Right-left differences of each measurement were normally distributed so that antisymmetry could be ruled out. The sign test for the nasal length data, however, yielded a value significantly different from zero (P=0.002). Owing to this directional asymmetry the trait was not included in further analyses.

The Spearman tests showed significant correlations between three pairs of single-trait FA: JUP/JM, UTR/UML and GCM/GCP. However, since the correlation coefficients were rather small (0.494, 0.395 and 0.459, respectively), we considered the traits to yield sufficiently independent information and, thus, they were all used for further calculations. Individual overall asymmetry (FAM) showed neither sex nor population dependence. Table 3 summarizes descriptive statistical parameters of the 14 metric traits used in this study.

Table 3 Asymmetry of metric traits measured in the present study

FA in non-metric traits

No cases of directional asymmetry were found and the χ2 tests did not yield significant associations between any two traits. FANM neither depended on sex nor on population. Table 4 summarizes descriptive statistical parameters of the 18 non-metric traits.

Table 4 Asymmetry of non-metric traits measured in the present study

Metric and non-metric asymmetry were not correlated on any of the three levels studied (over all individuals pooled, over individuals within populations, among populations).

FA and genetic variability

None of the 14 single metric traits showed any statistically significant correlation with H or mean d2out (P⩾0.014 with a Bonferroni-corrected significance level of 0.0036).
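The corrected significance level quoted above is consistent with dividing the nominal α of 0.05 by the 14 single-trait tests. The short sketch below shows that arithmetic; it implements only the plain Bonferroni division, and the function name is hypothetical.

```python
def bonferroni_alpha(nominal_alpha: float, n_tests: int) -> float:
    """Per-test significance level under a plain Bonferroni correction."""
    return nominal_alpha / n_tests


print(round(bonferroni_alpha(0.05, 14), 4))  # 0.0036
```

The smallest reported P value of 0.014 lies well above this level, so none of the single-trait correlations remains significant after correction.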

The results of the correlation analyses for FAM and FANM on all three levels are shown in Tables 5 and 6. For the metric traits, there was merely a nonsignificant (after Bonferroni correction) tendency for FAM to be weakly negatively correlated with H (r=−0.236, P=0.008) across all individuals but for the non-metric traits we found a strongly negative and significant correlation (r=−0.975, P=0.002) between non-metric FA and average individual heterozygosity among populations. All other correlations, including those involving HE, yielded nonsignificant results. This also held for the tests where we treated our metric traits as threshold characters (not shown in further detail).

Table 5 Pearson (rp) and Spearman (rs) correlations between asymmetry and genetic variability across all individuals and among populations
Table 6 Pearson (metric FA) and Spearman (non-metric FA) correlations between asymmetry and genetic variability across individuals within populations

Discussion

It is generally assumed that functionally important traits are subject to a stronger pressure of balancing selection, which, in the case of bilaterally symmetric traits, is supposed to result in lower levels of asymmetry (e.g., Stearns, 1992). The characters in our study did not yield any conspicuous FA patterns; in particular, tooth characters did not consistently show low FA as found in other studies (e.g., Suchentrunk, 1993; Suchentrunk et al., 1994). The lack of correlation between single-trait FA demonstrates that FA in a given character is hardly predictive for FA in other characters and thus does not corroborate the hypothesis of the existence of population or individual asymmetry parameters (PAP, IAP, Soulé, 1967; Clarke, 1998), that is, a consistent ranking of individuals or populations for the asymmetry of several characters. The lack of a consistent correlation between metric and non-metric FA is in line with the results of Zima et al. (1989) who did not find a ‘clear-cut correlation’ (p. 54) between metric and non-metric distances between roe deer populations either.

Compared with the study by Markowski (1993), who analysed population differentiation in roe deer based on non-metric traits, many of which are identical to traits used in the present study, our FANM values are considerably higher (0.23–0.29 in the present study vs 0.12–0.18 in Markowski, 1993). Apart from possible biological reasons (FA might really be higher in the roe deer of the present study), this discrepancy may also be owing to inter-observer variability (Lee, 1990). The claim that non-metric traits are generally superior to metric ones in population studies on mammals owing to their not being correlated with one another and not being sex-dependent (e.g., Berry, 1968; see also Markowski, 1995) has not been corroborated by the results of the present study in that there were no differences between metric and non-metric traits with respect to correlation and sex dependence.

It has long been debated whether FA levels depend upon genetic variability with numerous studies favouring such a relationship and equally numerous studies rejecting it (see Møller and Swaddle, 1997 for a review). The present study did not yield unequivocal results either in that we found a tendency for a (weakly) negative correlation of metric FA and heterozygosity over all individuals (the same nonsignificant trend was found by Alves et al., 2001) and a statistically significant strongly negative correlation of non-metric FA and heterozygosity among populations but no further significant relationships. The fact that we did not find any such trend or correlation for genetic variability quantified by mean d2out parameters is in line with the criticism put forward against these measures (see above). Nevertheless, our study is, to our knowledge, the first one to report a microsatellite-based negative correlation between FA and genetic variability. This is important because it is still an open question if the relationship between developmental stability and genetic variability (if there is any at all) is due to the physiological properties of specific enzymes encoded by certain key loci or owing to overall genomic heterozygosity. Our results based on microsatellites, which are non-coding DNA regions commonly regarded as selectively neutral, favour the latter alternative (against Kruuk et al., 2003 and Borrell et al., 2004) although linkage with key loci can never totally be ruled out. It has to be borne in mind, however, that all genetic parameters calculated in our study are based on eight loci only, which is probably not enough to be representative of genome-wide heterozygosity (see Chakraborty, 1981; Slate and Pemberton, 2002).

Results showing a negative correlation of FA and genetic variability, as expected a priori from the theory, are scarce for homeothermic species. To our knowledge, studies on mammals have yielded the expected negative correlation only twice (Hutchison and Cheverud, 1995 on tamarin monkeys and Hartl et al., 1995 on brown hares). The study by Hutchison and Cheverud, however, is not directly comparable to similar studies because (1) their analysis comprises two different species and (2) they do not directly measure heterozygosity but rather provide qualitative ranks of genetic variability. Hartl et al. (1995) found a negative correlation between non-metric FA and enzyme heterozygosity in the brown hare (Lepus europaeus) on the level of the population but no such relationship for the individuals or for metric FA values. They suggest that it might be the choice of characters in FA calculation rather than a fundamental difference between homeotherms and poikilotherms that accounts for the non-concordant results in these two groups, because in studies on poikilotherms measurements of FA are usually based on non-metric traits such as bristle number in insects or fin ray number in fish species, whereas in studies on homeotherms FA is mostly determined through measurement of metric skeletal traits.

Interestingly, the present study yielded results very similar to those arrived at by Hartl et al. (1995) in that the only statistically significant correlation between FA and heterozygosity was found among populations and for non-metric traits. Why non-metric traits might be better suited for FA analyses remains unclear but they are, contrary to metric traits, threshold characters: as they do not vary continuously, phenotypic asymmetry (presence vs absence) only occurs when a certain threshold difference between the right and left side is achieved (cf Swain, 1987). If the relationship of FA and genetic variability only holds for very asymmetric individuals asymmetry in non-metric traits may a priori be more likely to reveal this relationship. Furthermore, metric traits might be under stronger balancing selection for symmetry since many traits typically measured in FA studies are functionally important (e.g., tooth rows or mandible length). Our approach of treating metric traits as threshold characters, however, did not yield the same results as the non-metric traits. This certainly does not completely refute the threshold explanation of the difference between metric and non-metric traits, but at least we did not find a corroboration for it either. At any rate, the fact that the two negative correlations of FA and genetic variability found in mammals so far were both based on non-metric traits is remarkable given that most studies on mammals were conducted with metric traits. The often claimed difference between homeotherms and poikilotherms (see also the aforementioned meta-analysis by Vøllestad et al., 1999) may thus be merely a methodological artefact rather than a biological phenomenon caused by the more homogeneous developmental environment owing to a more constant body temperature in homeotherms. 
If the constant developmental environment in homeotherms is crucial to FA studies, that is, if developmental noise is relatively more important in homeotherms than it is in poikilotherms because the effects of genetic variability are veiled by the generally lower levels of FA in the former, then mammals might turn out to be particularly well suited to test this hypothesis. The three mammalian subgroups (monotremes, marsupials and placental mammals) differ considerably in their early development with monotremes being oviparous, marsupials having a very short gestation period and giving birth to little-developed young and placental mammals undergoing a long intra-uterine phase of development. There is, in other words, an increase in ‘developmental homeothermy’ from monotremes to placental mammals, and thus, if the hypothesis of a difference between homeo- and poikilotherms holds, we expect a comparative decrease in FA levels from monotremes over marsupials to placental mammals. To date, no such comparative analysis has been conducted.

A further important issue of the present study is the fact that the only clear-cut correlation between FA and genetic variability was found not on the level of the individual but among the populations – another parallel to the results of the study on brown hares by Hartl et al. (1995). Similarly, Karvonen et al. (2003) found this correlation in greenfinches only when comparing populations – individual heterozygosity (based on allozyme loci) was not related to individual FA. One possible reason for this could be that a limited number of loci used in variability assessments is more likely to uncover differences in general heterozygosity among individuals from different populations than among individuals taken from the same population (Palmer and Strobeck, 1986). Additionally, the fact that measuring FA on the individual level is an attempt to estimate a variance with only two data points, which is likely to result in large sampling errors (Whitlock, 1996), may also be relevant.

Conclusion

Our study is one of only a few to find a negative relationship between FA and genetic variability in homeotherms in general and mammals in particular, although this relationship was only shown for the population level. Ever since the debate about possibly raised asymmetry in the cheetah (Wayne et al., 1986; Modi et al., 1987; Willig and Owen, 1987; Kieser and Groeneveld, 1991), a species known to be genetically depauperate, this relationship has been uncertain for mammals. The results of the present study, however, suggest, in line with one earlier study (Hartl et al., 1995), that the lack of confirmation may be caused by methodological rather than biological factors. Future studies should be carried out bearing in mind that the choice of traits (metric vs non-metric) might bias the results of correlation analyses between FA and genetic variability. The hierarchical level of comparison (individual vs population) may also be important given that in most analyses the number of molecular markers studied is around 10. The question of the genetic factors governing developmental stability is still far from being settled. Although the first microsatellite analyses did not produce a negative correlation between FA and variability (Borrell et al., 2004), our results show that further research into the relative role of key loci and overall genomic heterozygosity is required.