Introduction

Measurement of fluctuating asymmetry (FA), small nondirectional differences in size between symmetric paired morphological structures (Van Valen, 1962), is a potentially useful tool for detecting factors that influence developmental instability (DI). Lower values of FA may reflect development that proceeds more precisely, whereas elevated FA values may be due to problems during development that reflect increased DI.

Differences in the realized trait due to different levels of DI can even be so large as to contribute to the overall observed trait variance. Reviews of natural and laboratory populations (Lajus et al., 2003; Hansen et al., 2006) have shown that variance in the realized trait due to different levels of DI accounts for a mean of 31 and 15% of the overall trait variance, respectively. Differences in population DI may affect the evolutionary potential of populations by influencing their phenotypic variances in a manner that is neither directly environmental nor caused by genotypic variation for the mean value of the trait in question.

Many studies have demonstrated increased FA (reflecting increased DI) in populations undergoing stress due to both environmental (Ji et al., 2002; Mpho et al., 2002) and genetic factors, such as inbreeding (Waldmann, 1999; Schaefer et al., 2006), but others have failed to find a relationship between FA and environmental (Bjorksten et al., 2001; Kruuk et al., 2003; Sonne et al., 2005) or genetic factors (Fowler and Whitlock, 1994; Rao et al., 2002; Kruuk et al., 2003); consensus regarding the utility of FA as a stress indicator is lacking (Palmer, 1996; Leung et al., 2003).

Several statistical and methodological factors pose challenges for any study designed to estimate FA differences between populations. These factors include: (1) the potential presence of other forms of asymmetry, such as directional asymmetry (DA) and antisymmetry (AS); (2) the extreme sensitivity of FA to measurement error and (3) the inherent statistical difficulties of estimating a variance.

Because both AS (negatively correlated variation among paired structures such that one side is always larger than the other, but the identity of the larger side varies) and DA (a consistent difference in means between paired structures such that, for example, the right side is always larger than the left) contribute to estimators of asymmetry, the presence of these forms of asymmetry must be accounted for before a measure of FA is calculated. AS causes platykurtic signed FA distributions. In the presence of AS, the optimum degree of asymmetry is not necessarily zero, so the potential adaptive significance of FA is unclear. Traits showing AS are therefore usually omitted from studies of FA. The presence of DA causes fewer problems, as there are two cases where left–right differences can still reflect DI. First, one can assume that the observed mean DA represents the optimum. The lack of convincing demonstrations of genetic variance in DA (Carter et al., submitted) makes this a questionable assumption. Second, and potentially more robustly, one can assume that the genetic bases of FA and DA are different, so that variance around the mean state of the population, reflecting DI, is independent of differences in trait means. In either of these cases, only a portion of the observed FA reflects the influence of developmental precision (Graham et al., 1998). To correct for the presence of DA, we subtract the mean DA from the signed differences between sides (Graham et al., 1998).

Estimates of FA are extremely sensitive to errors in measurement for three reasons. First, FA values are typically very small compared to the trait value, and measurement techniques adequate for measurement of the trait mean may not be precise enough to reveal differences on smaller scales. Second, because each FA estimate incorporates error from two independent measurements, measured FA variance includes two error components. Third, variance due to measurement error always inflates raw FA values (Palmer, 1994).

Ignoring measurement error in FA studies (Grahn and von Schantz, 1994; Norry et al., 1998) is not justifiable, even when it is not statistically significant (Crespi and Vanderkist, 1997; Badyaev et al., 1998), as any measurement error at all will certainly inflate FA. A better approach is to estimate the magnitude of the error variance and to remove it from the observed FA.

An additional underappreciated aspect of FA estimation is that FA is in fact an estimate of the variance of structures within an individual (Houle, 2000; Palmer and Strobeck, 2003). If AS, DA and measurement error have been considered as described above, the observed FA reflects differences in the trait when development proceeds twice. It is an estimate of the phenotypic variance of an identical genotype estimated from only two data points per trait per genotype. Because of their mathematical properties, variances are far less precisely estimated than means, given a particular population distribution and sample size. A far larger sample size is therefore required to adequately characterize mean FA values than mean trait values. Many past studies of the effects of inbreeding on FA have used sample sizes as small as 10–20 individuals (Roldan et al., 1998; Forsman and Merilaita, 2003), values far too low to allow accurate comparisons of variances.
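The loss of precision when estimating a variance rather than a mean can be illustrated with a standard result: for n independent, normally distributed observations, the relative standard error of the sample variance is sqrt(2/(n−1)), whatever the true variance. The following is a minimal sketch under that normality assumption; the function names and the 5% trait CV used for comparison are hypothetical and are not values from this study.

```python
import math

def relative_se_of_variance(n: int) -> float:
    """Relative standard error of a sample variance from n normal observations."""
    return math.sqrt(2.0 / (n - 1))

def relative_se_of_mean(n: int, cv: float) -> float:
    """Relative standard error of a sample mean for a trait whose CV is `cv`."""
    return cv / math.sqrt(n)

# Hypothetical trait CV of 5%, used only to put the two quantities side by side.
ASSUMED_TRAIT_CV = 0.05

for n in (10, 20, 560):
    print(n, relative_se_of_variance(n), relative_se_of_mean(n, ASSUMED_TRAIT_CV))
```

Under these assumptions the variance estimate carries a relative standard error of roughly 47% at n=10 and 32% at n=20, falling to about 6% at n=560, whereas the mean of a trait with a 5% CV is already estimated to within about 2% at n=10.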

We investigated the effect of a genetic stress (inbreeding) on the values of FA for a series of distances between vein intersections in the wing of Drosophila melanogaster in two outbred, unrelated laboratory populations. From each population we produced inbred and outbred treatments of flies. We measured 560 outbred and inbred females in one population, and 600 in the other. The large sample size allowed us to explicitly estimate and account for DA and measurement error before comparing inbred and outbred treatments. The large sample size also allowed us to accurately estimate the magnitude of phenotypic variance due to DI and the degree to which this caused changes in the overall phenotypic variance.

Methods

Base populations

We used two independently derived populations of D. melanogaster. The first, which we designate IV (Houle and Rowe, 2003), derives from a sample of 200 flies caught in 1975 by PT Ives in Amherst, Massachusetts, and has been maintained since that time by PT Ives (1975–1976), B Charlesworth (1976–1992) and DH (1992–present). The second population, which we designate LHM, derives from 400 flies collected by L Harshman in central California in 1991 and has been maintained since that time by L Harshman (1991–1995), WR Rice (1995–2004) and DH (2004–present). Throughout their maintenance by DH and for the duration of this experiment, flies were maintained under a 12:12 h light–dark cycle at 25 °C. During the course of the experiment, flies were cultured in plastic shell vials (95 mm height, 30 mm diameter) with a cornmeal medium.

Mating and measuring scheme

To generate treatments that differed only in their degree of inbreeding, we performed several single-pair crosses and collected their F1 offspring. We generated inbred populations by mating full sibs, and outbred lines by mating flies with unrelated parents. The individuals used in the study reported here were therefore F2 individuals from the crosses shown in Figure 1. Thus, inbred families are unrelated, whereas outbred families share one set of grandparents with each of two other outbred families. For the IV population, we performed 7 initial crosses and measured 40 females from each inbred and outbred F2 family, for a total of 560 flies measured. For the LHM population, we performed 10 initial crosses and measured 30 females from each inbred and outbred F2 family, for a total of 600 flies measured. To control for bias due to effects of age- or rearing-induced variation, we used a measurement scheme wherein 10 flies from each family were processed, inbred and outbred families alternately, until all families were processed, and then the sequence was repeated.

Figure 1

Mating scheme used to generate outbred and inbred flies from each of two laboratory stocks, designated LHM and IV. Either 10 (LHM) or 7 (IV) initial pairs of flies were used to found F1 populations, from which single males and females were chosen to mate either within (inbred, sib-sib matings) or across (outbred, nonrelated matings) F1 sets to generate the desired F2 experimental populations. The final experimental populations consisted of 30 and 40 females from each inbred and outbred set in the LHM and IV lines, respectively, for 600 and 560 flies in total.

Image acquisition and data processing

The project made use of the automated WINGMACHINE system for the acquisition of digital images of the wings of living flies (see Houle et al., 2003, for a complete description). In brief, we anesthetized flies with CO2 and recorded digital images of their wings. Entry of two user-defined landmarks (0 and 6 in Figure 2b) provided the processing software with two reference positions. The image was then processed by several computer programs that generated a B-spline model of the veins, as shown in Figure 2a. This process is relatively free from human bias and error. For the present study, we calculated the distances between pairs of landmarks formed by the vein intersections shown in Figure 2b.

Figure 2

(a) A splined fly wing. Colored lines indicate spline approximations to the veins in the wing; arrows indicate user-defined landmarks required for the splining procedure (see Houle et al., 2003, for details). (b) Landmarks defined by wing–vein intersections and used for distance measurements. Intersections 0, 6–10, 15 and 16 were deemed too prone to measurer bias or inaccurate estimation for use in the present study.

Distance and FA data

A series of 35 distance traits was calculated for each wing (all pairwise distances between landmarks 1–5 and 11–14 in Figure 2b). Distances based on landmarks known to be relatively imprecise or influenced by measurer choice (landmarks 0, 6–10, 15 and 16) were excluded a priori, as was the extremely small distance between landmarks 13 and 14. Corresponding distances on the left and right wings were denoted L and R; trait mean and variance values reported hereafter are for the left wing only unless specified otherwise. DA was taken into account by subtracting the population mean (L−R) value from each individual's (L−R) value; the outbred and inbred treatments were pooled for this calculation. These DA-corrected values were used for all further analyses.
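The count of 35 traits and the DA correction can be checked with a short sketch. It assumes that the retained landmarks are exactly 1–5 and 11–14 and that left and right measurements are supplied as equal-length lists in the same individual order; the names `LANDMARKS`, `PAIRS` and `da_corrected` are illustrative and do not come from the WINGMACHINE software.

```python
from itertools import combinations
from statistics import mean

# Landmarks retained for analysis according to the text: 1-5 and 11-14.
LANDMARKS = [1, 2, 3, 4, 5, 11, 12, 13, 14]

# All pairwise distances among the nine landmarks (36 pairs),
# minus the very short 13-14 distance that was excluded a priori.
PAIRS = [pair for pair in combinations(LANDMARKS, 2) if pair != (13, 14)]

def da_corrected(left, right):
    """Signed (L - R) values for one distance with the pooled mean (L - R) removed.

    `left` and `right` hold one value per individual, in the same order,
    pooled over the inbred and outbred treatments of one population.
    """
    signed = [l - r for l, r in zip(left, right)]
    mean_da = mean(signed)  # estimate of directional asymmetry
    return [d - mean_da for d in signed]
```

After correction the signed differences for each distance average zero by construction, so any remaining spread reflects FA plus measurement error rather than DA.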

A Grubbs analysis (Sokal and Rohlf, 1981) identified outlier wings, and four individuals were excluded because of poor splines (two individuals from LHM-inbred and one from LHM-outbred) or extremely unusual FA values (one individual from IV inbred). This reduced set of data was used for all further analyses (Tables 1, 2 and 3).

Table 1 Summary values for the four experimental populations: inbred and outbred lines of the LHM and IV laboratory populations
Table 2 Statistics for 35 distances between points (shown in Figure 2) on wing veins of outbred and inbred lines of the Drosophila stock designated LHM
Table 3 Statistics for 35 distances between points (shown in Figure 2) on wing veins of outbred and inbred lines of the Drosophila stock designated IV

Measurement error for each distance trait was estimated from a separate set of 200 twice-measured flies as

where FA1 and FA2 are the signed FA values from the first and second measurements (data given in Tables 2 and 3). This variance term is directly subtracted from observed Var(L−R) values to remove measurement error. An error-corrected unsigned FA value is given by

where FAobs is the mean DA-corrected FA value (Pelabon et al., 2004). The relative magnitude of the change in unsigned FA values due to this error correction averaged about 17% for the 140 distance, population and treatment combinations; over one-third (53) were below 10% and only 6 above 40%. The trait distance and DA-corrected unsigned FA for that distance showed no consistent relationship in individuals (mean and median R2 for all data=0.0355, 0.0151; mean and median correlations for all data=−0.019, −0.017; signs of correlations were evenly distributed, with 45.5% of correlations positive), so size scaling of the FA values was not performed. For the rest of the paper, mean FA values generally refer to the mean DA- and error-corrected FA values using Equation (2) unless otherwise indicated.
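The error estimate and the corrected unsigned FA described in this passage can be sketched from the verbal definitions. The derivation below is a reconstruction under explicit assumptions and is not necessarily the form of the original numbered equations: measurement errors are taken to be independent between the two measurement sessions, and DA-corrected signed asymmetries are taken to be normally distributed with mean zero, so that the expected unsigned value is sqrt(2/π) times the standard deviation. The symbols a, e_1, e_2, σ²_me and the subscripts obs and corr are introduced here for illustration.

```latex
% Sketch only. Assumed notation (not from the original text):
%   a        : true signed asymmetry (L - R) of an individual, after DA correction
%   e_1, e_2 : independent measurement errors of the signed difference,
%              each with variance \sigma^2_{me}
% For a zero-mean normal variable with variance s^2, E|x| = s\sqrt{2/\pi}.
\begin{align}
\mathrm{FA}_1 = a + e_1,\quad \mathrm{FA}_2 = a + e_2
  \;&\Rightarrow\; \operatorname{Var}(\mathrm{FA}_1 - \mathrm{FA}_2) = 2\,\sigma^2_{\mathrm{me}} \\
\hat{\sigma}^2_{\mathrm{me}} &= \tfrac{1}{2}\operatorname{Var}(\mathrm{FA}_1 - \mathrm{FA}_2) \\
\operatorname{Var}(L-R)_{\mathrm{corr}} &= \operatorname{Var}(L-R)_{\mathrm{obs}} - \hat{\sigma}^2_{\mathrm{me}} \\
\mathrm{FA}_{\mathrm{corr}} &= \sqrt{\mathrm{FA}_{\mathrm{obs}}^{2} - \tfrac{2}{\pi}\,\hat{\sigma}^2_{\mathrm{me}}}
\end{align}
```

The last line follows because, under normality, the squared mean unsigned FA equals (2/π) times the variance of the signed differences, so subtracting the error variance from Var(L−R) is equivalent to subtracting (2/π) times the error variance from the squared mean unsigned FA.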

Two other values were calculated. The coefficient of variation (CV; standard deviation divided by the mean) of FA provides a measure of variation in DI; if CV(FA) exceeds 75.6%, then real differences in DI between individuals are implied (Palmer and Strobeck, 2003). To compare the amount of phenotypic variance that is due to FA with that found in other studies, we computed the between-individual variance among wings that is due to differences between wings on the same individual (Vd) as

(Hansen et al., 2006). This value measures the importance of DI; dividing Vd by overall phenotypic variance makes the impact of DI comparable with genetic and environmental factors. In calculating Vd we corrected for measurement error effects by subtracting the variance due to measurement error from the raw Var(L−R) value.
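The 75.6% threshold and the Vd calculation can be illustrated as follows. The threshold is the CV of the absolute value of a zero-mean normal variable, sqrt(π/2 − 1), which is what unsigned FA would show if all individuals shared the same DI. The expression used for Vd, half of the error-corrected Var(L−R), is inferred from the statements that doubling Vd gives Var(L−R) and that the error variance was subtracted from the raw Var(L−R); it is a sketch and not necessarily the exact expression of Hansen et al. (2006). Function names are hypothetical.

```python
import math
from statistics import mean, stdev, variance

# CV of |x| for x normal with mean zero (half-normal distribution).
HALF_NORMAL_CV = math.sqrt(math.pi / 2.0 - 1.0)

def cv_fa(unsigned_fa):
    """Coefficient of variation (sample SD divided by mean) of unsigned FA values."""
    return stdev(unsigned_fa) / mean(unsigned_fa)

def v_d(signed_diffs, error_variance=0.0):
    """Per-side variance attributed to DI.

    ASSUMPTION: Vd = (Var(L - R) - error variance) / 2, inferred from the text.
    `signed_diffs` are DA-corrected (L - R) values, one per individual.
    """
    return (variance(signed_diffs) - error_variance) / 2.0

def exceeds_threshold(unsigned_fa):
    """True when CV(FA) is above the value expected under homogeneous DI."""
    return cv_fa(unsigned_fa) > HALF_NORMAL_CV
```

Dividing `v_d` by the overall phenotypic variance of the trait gives the relative contribution of DI that is compared with other studies.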

Values reported in tables are from pooling over all individuals in each breeding treatment in each population.

Owing to the nature of the design, the principal unit of replication in this experiment was the family mean. Consequently, all analyses were done on family means. Neither of the two analyses used (randomization and bootstrap) is strictly correct because outbred families were related; outbred family i shared grandparents with outbred families i−1 and i+1, as shown in Figure 1. To characterize the FA of an individual, we calculated an index consisting of the mean FA over all 35 distances.

Tests for significant differences between the inbred and outbred treatments in each population were conducted on the set of family means, using randomization and bootstrap techniques. For the randomization procedure, the full set of 20 (LHM) or 14 (IV) family means was randomly rearranged into two groups and the difference in mean FA between the two groups was computed. The number of trials (of 1000) in which the difference exceeded that observed for the original data set was recorded to generate a P-value. For the bootstrap, the outbred families were sampled with replacement and then, for each of these, one of the related inbred families was selected to generate sets of pseudo-samples for comparison. This procedure pairs each outbred family with one of the two inbred families to which it is related (Figure 1). Thus, unlike the randomization, it partially controls for the possibility that there is genotypic variance in DI among families. The number of trials (of 1000) in which the mean of the inbred values exceeded that of the outbred values was recorded to generate a P-value. The randomization and bootstrap analyses were carried out separately for each of the 35 distances and for the overall mean of these distances in each population.
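The two resampling procedures can be sketched as below. This is an illustration under stated assumptions, not the code used in the study: the test is treated as one-sided (inbred minus outbred), outbred family i is assumed to be related to inbred families i and i+1 with cyclic indexing, and the bootstrap P-value is taken as the fraction of trials in which the inbred mean did not exceed the outbred mean. Function and parameter names are hypothetical.

```python
import random
from statistics import mean

def randomization_p(inbred, outbred, trials=1000, seed=0):
    """Fraction of random regroupings whose difference exceeds the observed one."""
    rng = random.Random(seed)
    observed = mean(inbred) - mean(outbred)
    pooled = list(inbred) + list(outbred)
    n_in = len(inbred)
    exceed = 0
    for _ in range(trials):
        rng.shuffle(pooled)  # random reassignment of family means to two groups
        diff = mean(pooled[:n_in]) - mean(pooled[n_in:])
        if diff > observed:
            exceed += 1
    return exceed / trials

def paired_bootstrap_p(inbred, outbred, trials=1000, seed=0):
    """Bootstrap over outbred families, each paired with a related inbred family.

    ASSUMPTION: outbred family i is related to inbred families i and i + 1
    (cyclic); the true pairing must be read from the mating scheme.
    """
    rng = random.Random(seed)
    k = len(outbred)
    not_higher = 0
    for _ in range(trials):
        picks = [rng.randrange(k) for _ in range(k)]  # sample with replacement
        out_sample = [outbred[i] for i in picks]
        in_sample = [inbred[(i + rng.randrange(2)) % len(inbred)] for i in picks]
        if mean(in_sample) <= mean(out_sample):
            not_higher += 1
    return not_higher / trials
```

Both functions take one value per family (for example, the family mean FA index) and return a proportion out of `trials`, matching the 1000 trials described in the text.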

The observed asymmetry values of the 35 distances used are not statistically independent for two reasons: shared DI processes influencing traits (Pelabon et al., 2004) and shared landmarks defining the phenotype. For this reason, conclusions relying upon the independence of the 35 distance values are not strictly correct. We present these data because, even though the values are somewhat correlated, we believe that the large number of distances showing higher FA in the inbred treatments is strongly suggestive of increased DI due to inbreeding.

The potential contribution of DA to FA values was controlled for by the corrections described above; we also tested for significant DA within each treatment and population combination. We compared the observed magnitude of DA to that expected from individuals possessing identical FA but no overall DA through a randomization procedure. The difference between the mean left and mean right wing for each distance was computed, and the sum of the absolute values of these differences was taken as a measure of overall DA. The identities of the left and right wings were then randomized and this sum recalculated to provide an expected value of the sum if no DA exists. The proportion of 1000 randomizations in which the sum exceeded the original observed sum estimated the probability of seeing the observed magnitude of DA by chance, our P-value. This was done for the inbred and outbred treatments in both populations.
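The DA randomization can be sketched as follows. The sketch assumes that side labels are swapped for whole wings, so that all distances of an individual are exchanged together and each individual keeps its own unsigned asymmetry; the text does not state whether swapping was done per wing or per distance. Data are assumed to be lists of rows, one row per individual and one column per distance, and the function names are hypothetical.

```python
import random
from statistics import mean

def overall_da(left, right):
    """Sum over distances of |mean(L) - mean(R)|."""
    n_traits = len(left[0])
    return sum(
        abs(mean(row[j] for row in left) - mean(row[j] for row in right))
        for j in range(n_traits)
    )

def da_randomization_p(left, right, trials=1000, seed=0):
    """Fraction of side-label randomizations whose overall DA exceeds the observed."""
    rng = random.Random(seed)
    observed = overall_da(left, right)
    exceed = 0
    for _ in range(trials):
        new_left, new_right = [], []
        for l_row, r_row in zip(left, right):
            # ASSUMPTION: swap the two wings of an individual as a unit.
            if rng.random() < 0.5:
                l_row, r_row = r_row, l_row
            new_left.append(l_row)
            new_right.append(r_row)
        if overall_da(new_left, new_right) > observed:
            exceed += 1
    return exceed / trials
```

Because only the side labels change, the unsigned left–right difference of every individual is preserved in each randomized data set, which corresponds to the reference case of identical FA but no overall DA.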

Results

Data summaries for each of the 35 measured distances are given in Tables 2 and 3 for the LHM and IV populations, respectively. Table 1 presents a summary of these values.

The mean trait size of inbred flies was slightly lower than that of outbred flies. Of the 35 distance means in each population, 28 were smaller in the inbred LHM flies and 25 were smaller in the inbred IV flies. The LHM inbred flies were only 0.31%, and the IV inbred flies only 0.16% smaller, relative to outbred, on average (Table 1). Overall size of the flies is therefore extremely similar. Randomization and bootstrap analyses comparing mean trait size in the families showed no difference in mean trait size between treatments (randomization: P=0.140 in LHM and P=0.291 in IV; bootstrap: P=0.670 in LHM and P=0.634 in IV).

The mean variance of individual trait values was larger in the inbred lines (LHM: 24.2% higher; IV: 25.7% higher). A randomization test showed no difference in mean trait variance between treatments (P=0.239 in LHM and P=0.320 in IV). A bootstrap test was similarly nonsignificant (P=0.432 in LHM and P=0.552 in IV). Closer examination of the variance values revealed that the observed difference in overall mean is driven by a single family with very high trait variance in each inbred treatment rather than a consistent variance increase across all families.

Phenotypic variance due to DI, Vd, was also higher in the inbred lines. These increases in Vd were of similar magnitude to the overall trait variance increase; the proportion of overall trait variance caused by Vd was therefore similar in the outbred and inbred treatments, although slightly smaller in the inbred lines (see Table 1).

Doubling the values of Vd gives the variance of the signed differences between the sides, Var(L−R), a commonly used measure of FA. This value clearly showed the same pattern of higher values in the inbred treatments (Tables 1, 2 and 3); the mean increase in variance relative to the outbred value was 16% in the LHM lines and 38% in the IV lines. A randomization test showed mixed results for the significance of the difference in Var(L−R) values between treatments (P=0.082 in LHM and P=0.007 in IV). Results from a bootstrap test were similarly mixed (P=0.421 in LHM and P<0.001 in IV).

We conclude that the increase in Var(L−R) due to inbreeding is suggestive, but not statistically significant, in the LHM population whereas the increase in Var(L−R) is unambiguous and highly significant in the IV population.

To characterize the overall FA of an individual, we used the mean of the 35 unsigned, DA-corrected FA values. In both lines, the inbred treatment showed generally higher unsigned FA values after correction for mean DA and measurement error (LHM: 3.6% higher; IV: 10.9% higher; see Figures 3, 4, Table 1). To test whether these differences were significant, we calculated the mean FA index for each family. A Student's unpaired, one-tailed t-test on these means revealed a nonsignificant difference between the outbred and inbred LHM lines (P=0.17) and a highly significant difference between the outbred and inbred IV lines (P<0.005). Randomization and bootstrap tests for treatment differences of family mean FA showed similar results (randomization: P=0.134 in LHM and P=0.019 in IV; bootstrap: P=0.557 in LHM and P=0.009 in IV). We conclude that the increase in mean unsigned FA is suggestive, but not statistically significant, in the LHM population whereas the increase in mean unsigned FA in the IV inbreeding treatment is unambiguous and highly significant.

Figure 3

Comparison of outbred and inbred fluctuating asymmetry values of LHM lines (in mm, corrected for mean directional asymmetry and measurement error) for the 35 pairwise distances between landmarks retained for analysis; line indicates 45° angle of equal fluctuating-asymmetry values.

Figure 4

Comparison of outbred and inbred fluctuating-asymmetry values of IV lines (in mm, corrected for mean directional asymmetry and measurement error) for the 35 pairwise distances between landmarks retained for analysis; line indicates 45° angle of equal fluctuating asymmetry values.

Of the 35 FA means in each population, 26 were larger in the inbred LHM flies and 31 were larger in the inbred IV flies. FA values for the 35 distances are not independent, but the mean correlation coefficient among FA values within populations was only 0.13 (median=0.08), so that R2 between distances is only 1.7%. This suggests that the pattern of FA among traits could be meaningful. If the 35 indices are treated as independent, three tests suggest that the inbred treatments have significantly higher FA than outbred ones: sign test (LHM: P<0.006; IV: P<3.5 × 10−6), Wilcoxon matched-pairs signed-ranks test (LHM: P<0.025; IV: P<1.2 × 10−5) and Student's paired t-test (LHM: P<0.008; IV: P<3.4 × 10−7).
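The sign-test probabilities follow directly from the binomial distribution once the 35 comparisons are treated as independent. The sketch below assumes a two-tailed test, which the text does not state; the function name is illustrative.

```python
from math import comb

def sign_test_two_tailed(successes: int, n: int) -> float:
    """Two-tailed sign-test P-value under a fair-coin null hypothesis."""
    k = max(successes, n - successes)  # size of the more extreme tail
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2.0 * upper_tail)

p_lhm = sign_test_two_tailed(26, 35)  # 26 of 35 distances larger in inbred LHM
p_iv = sign_test_two_tailed(31, 35)   # 31 of 35 distances larger in inbred IV
print(p_lhm, p_iv)
```

Under the two-tailed assumption these evaluate to just under 0.006 and just under 3.5 × 10−6, in line with the bounds reported for the sign test. The same independence caveat applies here as to the reported tests.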

The mean of the CV(FA) values was 97.7% and 138 of the 140 CV(FA) values were over 75.6%, implying real differences in DI between individuals (Palmer and Strobeck, 2003). The randomization test showed a significant difference in mean CV(FA) between treatments (P=0.039 in LHM and P=0.026 in IV). The results from the bootstrap test were similar, although the difference in the LHM treatments was no longer significant (P=0.242 in LHM and P=0.026 in IV). These mixed results in the LHM treatments are due to the pattern of CV(FA) values: the inbred families showed a threefold higher variance of mean CV(FA) values (of the 20 family means, 5 of the lowest 7 and 4 of the highest 5 are from the inbred treatment).

Our analysis reveals a significant degree of DA. Randomization tests showed significant DA in each population and each treatment (LHM: outbred P=0.002, inbred P=0.007; IV: outbred P<0.001, inbred P=0.001). These mean directional asymmetries were very low relative to the trait value, however; only 12 of the 140 values (35 distances by two treatments by two populations) exceeded 0.5%, and only 1 exceeded 1% of the trait value (the distance between landmarks 1 and 11 was 1.15% larger on the left wings in the inbred LHM flies). If we treat the distance DA values as independent observations, of the 140 distances, about half (73) were asymmetric at the P<0.05 level (paired t-test of left and right sides within each background and treatment) and one-quarter (35) were asymmetric after a Bonferroni correction (P<0.05/140). The LHM treatments had fewer asymmetric distances than the IV treatments in general (24 and 10 asymmetric at each P-value level above, respectively, compared to 49 and 25). The magnitude of DA was smaller than that of FA; the DA- and error-corrected FA of 131 of the 140 distance-treatment-population combinations was above 0.5% and that of 69 of 140 was above 1% (averaged over all traits and all treatments, the mean corrected FA was about 1.2% of trait size).

The patterns of the mean DA values of each treatment within each population matched one another, but differed qualitatively between the two populations (Figure 5). Regression of the inbred DA values on outbred DA values within each population generated significant positive slopes of approximately 1 and R2 values of 0.70 (LHM) and 0.65 (IV). Regression of LHM estimates on IV estimates, however, generated nonsignificant slightly negative slopes and R2 values of 0.03 (LHM) and 0.09 (IV).

Figure 5

Comparison of outbred and inbred mean directional asymmetry values for each of the 35 pairwise distances between landmarks retained for analysis. (a) LHM lines and (b) IV lines. The pattern differs qualitatively between populations, but it is similar in the treatments within each population.

Discussion

We have investigated the effect of inbreeding on FA in two populations, while correcting for measurement error and other forms of asymmetry. Our data indicate that, while the mean trait values were essentially the same, the inbred lines in one of our populations (IV) clearly showed elevated levels of FA, and therefore DI. Our findings are consistent with those studies that have reported higher FA in inbred or more homozygous populations (Waldmann, 1999; Schaefer et al., 2006) as well as those that indicate that inbreeding may have different effects in different populations (Lens et al., 2000), confirming doubts regarding the consistent utility of FA as a stress indicator (Palmer, 1996; Leung et al., 2003). In the LHM line the mean increase in FA was 3.6%, and not statistically significant. In the IV line the increase was 10.9%, three times as large, and strongly significant. A comparison of these increases in the population trait FA means through an unpaired heteroscedastic t-test of the randomization data (that is, the sets of 1000 differences between the original data and the randomized data) indicated that they did not differ significantly from one another (P=0.27). Nonsignificance does not prove the lack of a causal factor, however; we speculate that in addition to random factors, differences between the lines may be due to different levels of genetic variation in the two populations, for example as a result of differences in past inbreeding, or due to mean differences in the norm of reaction of DI with inbreeding. The influence of inbreeding on the FA observed suggests differences in heritable contributions to FA; these differences are of interest because evidence for heritable FA has been weak (Fuller and Houle, 2003) or absent (Breuker and Brakefield, 2003).

Inbred families had 20% higher variance than outbred ones, even though the mean values of the traits were remarkably similar (Tables 1, 2 and 3). Such an increase in the variance among flies can arise from an increase in genetic variation between populations (Falconer and Mackay, 1996), from exposure of dominance variance in the traits themselves (David, 1999) or from an increase in the susceptibility to environmental variation (Whitlock and Fowler, 1999). The increase we observed was driven in large part by a single family in each inbred treatment, suggesting that exposure of dominance variance in the traits may have been responsible. The variance between wings on the same fly due to DI, Vd, also increased. This increase in the variance due to DI for each trait mirrored the increase in the variance of the trait values themselves, consistent with the presence of segregating variation for developmental stability. The observation that the CV(FA) values were essentially all above 75.6% is also consistent with the presence of segregating variation for developmental stability.

The presence of significant DA and the similarity of that pattern in the treatments within each population, combined with a difference between the populations (Figure 5), suggest genetic variation for the pattern of wing landmark DA. Unambiguous evidence of heritable genetic variation for DA is notoriously absent (Lewontin, 1974; Tuinstra et al., 1990). In a separate artificial selection experiment (Carter et al., submitted), we were unable to change the DA of posterior crossvein position in two other Drosophila populations after 15 generations of selection. Our data are therefore another piece of circumstantial evidence for genetic variation for DA, variation that is often hinted at but rarely, if ever, directly demonstrated.

Our observed values of Vd relative to trait variance (mean=9.1%, median=6.2% calculated from Table 1) can be compared to those reported from other studies. Hansen et al. (2006) reported relative Vd mean and median values of 15 and 6% for traits from natural populations, whereas Lajus et al. (2003) reported mean and median values of 31 and 26% for a selection of mostly laboratory populations. Our mean values are smaller than those reported by Hansen et al. and Lajus et al., but our medians are similar to those reported by Hansen et al. and closer to our means, indicating a less skewed distribution of these relative Vd values. The reason for our lower values may be our correction for measurement error, which Hansen et al. and Lajus et al. were unable to do. The relative magnitude of the error terms is too low to account for a full 50% reduction, however, and the full cause of these lower Vd values remains unclear. The values reported here and elsewhere indicate that the contribution of DI to overall phenotypic variance is substantial. Furthermore, because it is neither directly environmental nor caused by genotypic variation for the mean value of the trait in question, the contribution of DI to phenotypic variance represents an underappreciated source of phenotypic variance.

In comparison to many other studies of FA, ours measured a relatively large number of individuals, allowing the quantification and removal of the effects of measurement error and DA. Not all studies examine these effects, and those that do often merely check for significance and then neglect them. Even when not significant, the added variance in FA due to measurement error inflates reported measurements, resulting in underestimates of any differences in FA between experimental treatments or populations. We urge careful consideration of these factors by other researchers performing similar studies in the future.