Introduction

Developmental plasticity, the ability of a genotype to vary its phenotype across environments, can allow organisms to cope with environmental variation by adjusting traits such as defenses, cold tolerance or foraging behavior, resulting in high survival and performance across that environmental gradient (Schlichting and Pigliucci, 1998; West-Eberhard, 2003). Despite the benefits of plasticity, organisms are not infinitely plastic. Why are some genotypes capable of expressing a wide range of alternate phenotypes, whereas others are more canalized around one developmental pathway? Biologists have long been fascinated with the costs and constraints that affect the evolution and expression of plasticity (DeWitt et al., 1998; Callahan et al., 2008; Auld et al., 2010). The ‘epiphenotype’ hypothesis emphasizes the importance of developmental timing: the earlier in development an individual receives information about the state of the environment and starts to develop the appropriate phenotypic response, the greater the possible range of phenotypes that can be expressed (assuming that different environments favor different optimal traits; DeWitt et al., 1998). This is in part because organisms that receive the appropriate information early can avoid the costs of breaking down and rebuilding traits later. Because information can be inherited across cell divisions (Skinner, 2011), information received earlier in development can direct the downstream differentiation of cells through ‘epigenetic cascades’ (sensu Atchley and Hall, 1991; see Figure 1).

Figure 1

Timing of environmental information across development. If environmental information is received relatively earlier in development (point A versus point B), it has a greater capacity to affect the subsequent development of traits, potentially along alternate developmental trajectories. Through ‘epigenetic cascades,’ information can be inherited across cell divisions, influencing the development of that cell lineage. Information received later in development must reach more cells (indicated by the colored circles) when developmental differentiation may have already occurred (indicated by changing cell color over time).

The epiphenotype hypothesis has received support from a number of empirical studies. For instance, species of ants that receive nutritional cues about caste fate earlier in development express greater phenotypic differences across those castes (Wheeler, 1986) and dispersal polyphenisms are more pronounced in species of insects that have an earlier morph switchpoint (Zera and Denno, 1997). Defenses in snails that are induced later in development are not the same as those induced earlier, for instance, shells are often thinner than the predicted optimum (Hoverman and Relyea, 2007). The development of alternate leaf morphologies in aquatic and terrestrial Ranunculus leaves is generally more pronounced when an environmental shift occurs earlier in development (Bruni et al., 1996). However, plasticity in some traits and species has been shown to be relatively insensitive to developmental timing, suggesting that this idea may apply differently depending on the species and traits in question (Wimberger, 1991; Bruni et al., 1996).

If the timing of environmental information is an important constraint on the developmental range of plastic traits, then variation in life cycles and life history traits (Stearns, 1992; Fox and Czesak, 2000) could have important consequences for plasticity. The life history of a species can influence the time point at which the environment may start to have a significant effect on development. For instance, information on common resources or habitat structure is available to individuals once they start exploring the environment and foraging independently (Greenberg, 1989; Stamps, 1995; Eliassen et al., 2007; Slagsvold and Wiebe, 2007). This research tests the hypothesis that life history variation can constrain the evolution of plasticity by affecting the time point that the environment begins to have a significant role in development. In general, we predict that relatively earlier interaction with the environment leads to greater opportunity for the environment to influence trait development. This is a broad definition of plasticity that encompasses a range of environmental influences on development, from evolved alternate developmental pathways to the opportunity for underlying genetic variation to have an effect on a developing phenotype (for example, the ‘release’ of cryptic genetic variation: Gibson and Dworkin, 2004; Schlichting, 2008; Ledon-Rettig et al., 2010).

Here we use mandible variation in birds as a system to test the idea that life history timing mediates the range of environment-dependent developmental trajectories. Across vertebrates, mechanical stress has been shown to have a pronounced influence on both muscle and skeletal development (Turner, 1998; Moore, 2003; Ruimerman et al., 2005; Ravosa et al., 2007; Ravosa et al., 2008), in particular for jaw and craniofacial structures. For example, fish reared on different diet types develop entirely different jaw morphologies (Meyer, 1987; Wainwright et al., 1991; Adams et al., 2003; Muschick et al., 2011), which adaptively affects their feeding performance on locally abundant resources (Bouton et al., 2002; Parsons and Robinson, 2007). Similarly, rabbits reared on different diets develop entirely different jaw, palate and cranial structures (Menegaz et al., 2009; Menegaz et al., 2010), whereas pigs reared in different locomotor environments differ in joint and bone structure (Hammond et al., 2010; Congdon et al., 2012). We focus on avian bills because variation in the length, depth and width of the bill has been tied to functional variation in avian foraging, both within and between species (Grant, 1979; Smith, 1987; Benkman et al., 2001; Herrel et al., 2005). Bill development (which is tied to skeletal development) is sensitive to environmental influence after hatching (Young and Badyaev, 2007; Solem et al., 2011). Thus, well-described life history variation across birds (for example, Poole, 2005) can contribute to variation in exposure to environmental influences during development. Overall, we predicted that species that start to explore their environment relatively earlier in development will exhibit greater phenotypic variation due to environmental influences on development (assuming some degree of environmental heterogeneity over time and space). 
More specifically, we predicted that for precocial birds, which start interacting with their environment almost immediately after hatching, the key life history trait would be incubation length. In other words, incubation length variation across species of shorebirds and ducks should be negatively correlated with trait variation within species. In contrast, for altricial birds, which start exploring their environment and feeding themselves after leaving the nest, following a period of relative helplessness in the nest (Starck and Ricklefs, 1998), the key time point would be timing of flight, once offspring have left the nest. In other words, variation in the timing of flight across species of warblers and sparrows should be negatively correlated with trait variation within species.

Materials and methods

As detailed below, we used two approaches to quantify intraspecific variation. First, we used measurements of museum specimens, focusing on wood warblers and shorebirds because these clades represent two major developmental modes in birds (altricial and precocial), they are each diverse, and their evolutionary relationships are fairly well resolved. This approach allowed us to control for measurement error and sample a number of species within each lineage. Second, we used measurements of trait range from the literature, focusing on two additional lineages, sparrows and ducks. This latter approach allowed us to sample an additional altricial and precocial lineage, respectively. We focused on how life history timing was related to the range of a trait measurement within a species, assuming that greater environmental influences on bill development would be reflected in a greater trait range. We took care to control for variation in sex, geography and age, to maximize the chances that our measures of variation reflected developmental plasticity as opposed to genetic variation or differences across sexes or ages; any residual variation from these sources likely added some noise to the data set. We focused on the prediction that species that begin exploring their environment relatively earlier in life would exhibit greater phenotypic variation.

Measurements of museum specimens: shorebirds and warblers

Study specimens

We measured body size and bill traits on specimens in the University of Minnesota Bell Museum Ornithology Collection, measuring 28 species of shorebirds and 32 species of warblers (fewer species were used in final analyses owing to gaps in life history data and specimen availability). We sought to measure six males and six females of each species; however, the availability of specimens limited our measurements in some cases; final analyses included on average 10.2 individuals per species for shorebirds and 10.7 for warblers (N=631 individuals total). For some individuals, damage to part of the bill (such as a bent bill tip) allowed some bill measurements, but not others. As discussed below, in our analyses, we set a minimum of five individuals per sex for inclusion in analyses.

To control for phenotypic differences across sexes and ages, we focused on measurements of individuals of known sex (generally determined by plumage or inspection during the skin preparation and noted as specimen data). In a few cases (N=2), for species with only 10–20 individual specimens available, we measured individuals of unknown sex and used linear discriminant analysis of all four morphological traits (wing length and three bill measurements) to classify them as males or females based on measurements of individuals of known sex (with confidence of >90% in those cases, all for shorebirds). For warblers, we only included individuals that were at least a year old (AHY, ‘after hatch year’). For shorebirds, the majority of specimens in the collection were cataloged as unknown age. We included as many aged, >1-year-old individuals as possible in our analysis, but included those of unknown age out of necessity (about 64% of the individuals).

To control for phenotypic variation that stemmed from genetic variation across populations, we limited our measurements to specimens from a single geographic area. For each species, we determined the region most represented in the collection (for example, east coast of the United States, Midwest, Pacific states, one to two collecting sites in Mexico) and only measured specimens from that region. Most species measured are migratory and so in some cases these regions represent both breeding populations and migrants moving through that migratory corridor. Constraining our samples to a given region and flyway suggests that our measures of variation are not confounded by east–west variation across populations of these birds. The limitations associated with this approach are addressed further in the discussion.

Morphological measurements

We focused on bill traits as our measures of phenotypic variation. Bill variation was chosen for both practical and biological reasons. As mentioned above, variation in the length, depth and width of the bill is tied to functional variation in avian foraging, and bill development (which is tied to skeletal development) is sensitive to environmental influences after hatching. From a practical perspective, bill traits are easy to measure on preserved specimens, whereas other traits, such as tarsus length, tail length or skull traits can be difficult to measure on dried specimens, or are subject to less repeatability.

Bill length, depth and width were measured three times on each specimen by the same person (ESR) using standard methods for birds (Nebel et al., 2005; Stein et al., 2008). Traits were measured to the nearest 0.01 mm using digital calipers. All measurements were performed under a Leica M80 dissecting scope at ×7.5 magnification to ensure that all measurement landmarks were correctly identified. Replicate measurements of a given specimen were not performed in succession, to ensure independence of replicates. Bill length was measured as the distance from the anterior edge of the nares to the anterior tip of the upper mandible. We used this measurement (as opposed to exposed culmen length) because it has been shown to be less affected by specimen shrinkage (Wilson and McCracken, 2008), although specimen shrinkage appears to occur only within the first 1–3 years after specimen skinning and the average age of our specimens was 60 years (Harris, 1980; Engelmoer et al., 1983). Bill depth was measured at the anterior nares for fully closed bills; specimens whose bills had dried partly opened were eliminated for this measurement. Bill width was measured as the widest point (of both mandibles) at the anterior nares. All measurements were highly repeatable (measurement error, calculated as the average pairwise difference divided by average size for each of three measurements): for shorebirds, 0.77, 2.0 and 2.2% for bill length, depth and width, respectively; for warblers, 0.73, 1.37 and 2.40% for bill length, depth and width, respectively.
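The repeatability metric described above can be sketched as follows. This is a minimal illustration rather than the authors' script: it assumes that "pairwise difference" means the absolute difference between each pair of the three replicates, and the function name and example values are hypothetical.

```python
from itertools import combinations
from statistics import mean


def relative_measurement_error(replicates):
    """Mean absolute pairwise difference among replicate measurements,
    divided by their mean, expressed as a percentage.

    Assumption: differences are taken as absolute values over all pairs.
    """
    pairwise_diffs = [abs(a - b) for a, b in combinations(replicates, 2)]
    return 100.0 * mean(pairwise_diffs) / mean(replicates)


# Hypothetical bill-length replicates (mm) for a single specimen.
example_error = relative_measurement_error([10.00, 10.10, 10.05])
```

Averaging this value across the specimens of a species gives a species-level measurement error of the kind used later to check whether error covaries with trait range.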

Two measures of body size were considered. Wing length was measured on each specimen from the collection as a measure of within-species variation in body size. Other body size measurements (such as mass or tarsus length) were not available on all specimens in the collection. We used species body mass as our measure of between-species size variation (taken from Poole, 2005). We reasoned that body mass represented a more accurate measure than wing length for between-species size variation for species that vary so highly in migration distance (given that wing morphology is tied to migration: Lockwood et al., 1998; Perez-Tris and Telleria, 2001) and that links between body size variation and life history traits generally consider mass (for example, Western and Ssemakula, 1982).

We did not log transform our morphological measurements because there was not sufficient evidence arguing in favor of log transformation. First, for 10 of the 12 models run, log transformation did not improve the normality of model residuals; for the two exceptions (for example, female warbler bill width), log transformation not only improved residual fit, but also increased the significance of the reported results. Second, body size and trait range tended to show no correlation with each other, and in one case where it did (shorebird bill length), it was a linear relationship that was not changed with log transformation. It is likely that size variation within each focal lineage was not great enough to warrant log transformation.

All individual morphological measurements (and replicates) are available in DRYAD (DOI: 10.5061/dryad.c5s06).

Measures of phenotypic variation

We quantified phenotypic variation as the range of a bill trait (averaged across all three replicates) measured within a species (max–min). Sexes were analyzed separately given known sexual dimorphism in bill traits, especially for shorebirds (Durell, 2000; Nebel et al., 2005; Stein et al., 2008). We calculated a sex-specific range for each trait for each species where at least five (up to six) measurements were available for the sex in question. Residual variation in sample size (that is, five versus six specimens) did not affect measures of trait range.
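As a rough sketch of this calculation (not the authors' code), the sex-specific range can be computed by averaging the three replicates per specimen, grouping by species and sex, and keeping only groups that meet the five-specimen minimum. The record layout and the names used here are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

MIN_PER_SEX = 5  # minimum specimens per sex stated in the text


def sex_specific_ranges(records, min_n=MIN_PER_SEX):
    """Return {(species, sex): max - min} for one bill trait.

    records: iterable of (species, sex, [rep1, rep2, rep3]) tuples
    (a hypothetical layout). Each specimen contributes the mean of its
    replicates; groups with fewer than min_n specimens are dropped.
    """
    grouped = defaultdict(list)
    for species, sex, replicates in records:
        grouped[(species, sex)].append(mean(replicates))
    return {
        key: max(values) - min(values)
        for key, values in grouped.items()
        if len(values) >= min_n
    }
```

Dropping under-sampled groups instead of imputing them keeps the range comparable across species, since a range can only grow as more specimens are added.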

We were interested in whether differences between species in trait variation were repeatable despite measurement error. To test this, we calculated trait range separately for each of the three replicate measurements and ran an analysis of variance treating species as a fixed variable to test for differences between species (that is, are there significant differences across species in the range of a trait measured across all three replicates?). For both shorebirds and warblers, there were significant differences across species in the range of all three bill traits (Supplementary Table 1). We further investigated the effects of measurement error by testing for correlations between a species' average measurement error (average pairwise difference divided by average size for each of three measurements, averaged across all individuals for a species) and the range of each bill trait; we found no evidence of any association between measurement error and trait range for either shorebirds or warblers. These analyses suggest that the differences across species in trait range do not reflect measurement error.
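The logic of this repeatability test can be illustrated with a one-way analysis of variance in which each species contributes three range estimates, one per replicate measurement series. The sketch below computes only the F statistic and its degrees of freedom; obtaining a P value would additionally require an F distribution from a statistics library. The function name and data layout are hypothetical.

```python
from statistics import mean


def one_way_anova_f(groups):
    """One-way ANOVA F statistic with species as the grouping factor.

    groups: dict mapping species name -> list of trait-range estimates
    (here, one estimate per replicate measurement series).
    Returns (F, df_between, df_within).
    """
    all_values = [v for values in groups.values() for v in values]
    grand_mean = mean(all_values)
    ss_between = sum(
        len(values) * (mean(values) - grand_mean) ** 2
        for values in groups.values()
    )
    ss_within = sum(
        (v - mean(values)) ** 2
        for values in groups.values()
        for v in values
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

A large F indicates that the replicate range estimates differ more between species than within species, which is what one would expect if interspecific differences in range are not driven by measurement error.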

We focused on trait range as our measure of variability because we reasoned it was the most applicable to our focal hypothesis, concerning the range of alternate developmental trajectories. However, trait range was positively correlated with other measures of trait variation, such as standard deviation (s.d.) as calculated from the present data set (for example, R² for regression of male bill trait range with trait s.d. in warblers was between 0.92 and 0.93). Museum-based measures of trait range were also correlated with those taken from the literature. In shorebirds, there were positive correlations between bill range taken on museum specimens and population-based measures of bill range and s.d. taken from the literature (Poole, 2005; Supplementary Table 2). Although these measurements do not control for the person taking the measurements and are available for a smaller subset of species, they do include population-specific estimates for a much larger sample size (for example, average N=70.8 for shorebirds), lending confidence to our estimates of bill trait variation.

Although bill length, depth and width were correlated across species, measures of trait range were not significantly correlated. Thus, we treat the range of each bill trait separately.

Life history variation

We focused on two life history measurements relative to our focal hypothesis about developmental timing: incubation length and the average age of first flight. We reasoned that both of these time points represented key developmental transitions where young birds are increasingly interacting with their environment. In addition, we reasoned that incubation timing would be more important for precocial birds (shorebirds) than altricial birds (warblers) because precocial birds leave the nest within 24 h (Starck and Ricklefs, 1998).

We took data on life history variables from several sources, in particular the Birds of North America Life History series (Poole, 2005). For flight timing, we noted both the age of first flight and the age of first ‘sustained flight.’ Because timing of first flight was not available for all species in this series, we additionally took data on flight timing from Iwaniuk and Nelson (2003). For some species, flight timing data were available from more than one source. We defined age of first flight as the average of all of these values (one to three values depending on the species). All species-level data are accessible in DRYAD (DOI: 10.5061/dryad.c5s06).

Phylogenetic analyses

To account for phylogenetic autocorrelation due to shared evolutionary history, we used the phylogeny of Lovette et al. (2010) for relationships among wood warblers (Parulidae) and that of Gibson and Baker (2012) for relationships among shorebirds (Scolopacidae), both with Pagel's arbitrary branch lengths (Pagel, 1992; calculated with PDAP:PDTREE in Mesquite (Maddison and Maddison, 2004; Midford et al., 2005)). We accounted for phylogenetic relatedness among species using phylogenetic generalized least squares regressions implemented in R version 2.15.2 (R Core Development Team, 2012) using the package 'caper' (Orme et al., 2012). We used Pagel's lambda (λ) to represent the strength of the phylogenetic effect, allowing λ to take its maximum likelihood estimate in each case. λ varies between 0 and 1, with a value of 0 representing no detectable phylogenetic effect. Each multiple regression included as predictors incubation length, age at first flight and mean mass for the species. The maximum likelihood estimate of λ for all of our models was 0.
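To make the role of λ concrete, the sketch below shows the standard lambda transformation of a phylogenetic variance-covariance matrix, in which the off-diagonal elements (shared branch length between species pairs) are multiplied by λ and the diagonal is left unchanged. It is written in Python for illustration only and is not the 'caper' implementation used in the analyses; the function name and the example matrix are hypothetical.

```python
def lambda_transform(vcv, lam):
    """Scale off-diagonal covariances of a phylogenetic VCV matrix by lam.

    vcv: square list-of-lists; entry [i][j] is the shared branch length
         between species i and j (diagonal entries are root-to-tip lengths).
    lam: Pagel's lambda, constrained to the interval [0, 1].
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lambda must lie between 0 and 1")
    n = len(vcv)
    return [
        [vcv[i][j] if i == j else lam * vcv[i][j] for j in range(n)]
        for i in range(n)
    ]


# Hypothetical three-species matrix: species 0 and 1 are close relatives.
example_vcv = [
    [1.0, 0.8, 0.2],
    [0.8, 1.0, 0.2],
    [0.2, 0.2, 1.0],
]
star_phylogeny = lambda_transform(example_vcv, 0.0)
```

With λ=0 every off-diagonal covariance vanishes, so species are treated as independent and the generalized least squares fit coincides with an ordinary least squares regression.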

In all shorebird and warbler analyses, predictor variables were mean centered and standardized for comparisons across traits and sexes. To visualize the data, we plotted data shown for each response variable as residuals from a phylogenetically corrected model including all other predictors but not the predictor of interest.
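Mean centering and standardizing a predictor can be sketched as below. The text does not state whether the sample or the population standard deviation was used; the sample standard deviation (n-1 denominator) is assumed here, and the function name and example values are hypothetical.

```python
from statistics import mean, stdev


def standardize(values):
    """Mean-center a predictor and divide by its sample standard deviation.

    Assumption: sample s.d. (n - 1 denominator), which the text does not specify.
    """
    center = mean(values)
    spread = stdev(values)
    return [(v - center) / spread for v in values]


# Hypothetical incubation lengths (days) for three species.
z_incubation = standardize([20.0, 22.0, 24.0])
```

After this transformation each regression coefficient describes the change in trait range per one standard deviation of the predictor, which is what makes coefficients comparable across traits and sexes.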

Literature-based measures of phenotypic variation: ducks and sparrows

Data on museum specimens allowed us to sample a wide range of species while controlling for measurement error. However, we were limited by specimen availability for some species and measurement time restricted our data set to two taxonomic groups. Thus, we ran two sets of additional analyses using data from the literature on ducks (Anatidae) and new world sparrows (Emberizidae). Data on trait range (max–min measures for a trait for a given population) were taken from appendices in the Birds of North America Life History series for both males and females (Poole, 2005). When data were listed for more than one population, we took the average trait range across populations to avoid confounding our measure with geographic variation. We focused on bill length (the one bill trait commonly reported) and tarsus length, taking the average range reported for males and females. Data on life history traits were taken from the literature as reported above. Adding these two groups not only increased our sampling of precocial species (adding ducks) and altricial species (adding sparrows), but also lent confidence to our measures of trait range as the sample sizes were much larger than those allowed by museum measurements (for example, N=116 per population for bill range measurements for ducks and N=33 per population for bill length measurements for sparrows).
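The averaging described above can be sketched as a two-step mean: first across populations within each sex, then across the sexes. The order of averaging is an assumption, as the text does not specify it, and the function name and input layout are hypothetical.

```python
from statistics import mean


def species_trait_range(ranges_by_sex):
    """Average literature-reported trait ranges for one species.

    ranges_by_sex: dict such as {"male": [...], "female": [...]}, where each
    list holds one max - min value per reported population.
    Assumption: populations are averaged within sex before sexes are averaged.
    """
    per_sex_means = [
        mean(population_ranges)
        for population_ranges in ranges_by_sex.values()
        if population_ranges  # skip a sex with no reported populations
    ]
    return mean(per_sex_means)


# Hypothetical bill-length ranges (mm): two male populations, one female.
example_range = species_trait_range({"male": [4.0, 6.0], "female": [3.0]})
```

Averaging within sex first prevents a sex with more reported populations from dominating the species-level value.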

Unfortunately, there are no comprehensive molecular phylogenies published for these lineages, although smaller-scale molecular phylogenies and comprehensive morphological phylogenies do exist (Livezey, 1986; Klicka and Spellman, 2007; DaCosta et al., 2009). Given this, and the fact that the effect of phylogeny was absent for our shorebird and warbler analyses (see above), for these additional analyses, we only report analyses of raw data, not corrected for relatedness. For all duck and sparrow analyses, we used JMP v9 (SAS Institute) to construct general linear models that tested for effects of species’ body size (mass), incubation length and timing of flight on measures of trait range (the mean across sexes and populations, see above).

Results

Measures of trait range from museum specimens

Shorebirds

We predicted that precocial birds would exhibit negative correlations between timing of hatching and measures of phenotypic variation. As predicted, male shorebird species with relatively short incubation lengths exhibited greater variation in bill length (Table 1, Figure 2). In addition, species with relatively earlier age of first flight had a greater range of bill widths and bill lengths (Table 1, Figure 2). There was also a negative, but nonsignificant, trend between incubation length and male bill depth range (Table 1, Figure 2). For females, there were no significant relationships between life history timing and trait variation (Table 1, Figure 2). For both males and females, body mass was significantly positively related to trait range, although this relationship was not significant for male bill depth (Table 1).

Table 1 Life history and phenotypic variation across shorebirds
Figure 2

Life history timing affects phenotypic variation across shorebirds. Relationship between life history timing (length of incubation and timing of first flight) and the range of bill traits measured within a species for both male (a) and female (b) shorebirds. Predictor variables were mean centered and standardized for comparisons across traits and sexes. Solid lines indicate relationships that are statistically significant at an α=0.05 (see Table 1 for statistics). For presentation purposes, data shown for each response variable are residuals from a phylogenetically corrected model including all other predictors but not the predictor of interest. Confidence intervals shown are not phylogenetically corrected.

Warblers

We predicted that altricial birds would show negative relationships between timing of flight and measures of trait variation. However, for male warblers, there were no significant relationships between life history timing and trait variation. For females, warbler species with relatively shorter incubation lengths showed greater variation in bill depth (Table 2, Figure 3) somewhat consistent with predictions. Unexpectedly, for females, warbler species that started to fly relatively earlier were significantly less variable for bill length (Table 2, Figure 3) a pattern opposite to that predicted. Mass was not significantly related to trait range.

Table 2 Life history and phenotypic variation across warbler species
Figure 3

Life history timing affects phenotypic variation across warblers. Relationship between life history timing (length of incubation and timing of first flight) and the range of bill traits measured within a species for both male (a) and female (b) warblers. Predictor variables were mean centered and standardized for comparisons across traits and sexes. Solid lines indicate relationships that are statistically significant at an α=0.05 (see Table 2 for statistics). For presentation purposes, data shown for each response variable are residuals from a phylogenetically corrected model including all other predictors but not the predictor of interest. Confidence intervals shown are not phylogenetically corrected.

Measures of trait range from the literature

Ducks

We supplemented our measurements of museum specimens with data taken from the literature, again predicting that precocial species would show negative relationships between phenotypic variation and incubation timing. As predicted, in ducks, for both bill length and tarsus length, there was a significant negative relationship between incubation length and trait range (Table 3, Figure 4), mirroring patterns seen in shorebirds. Unexpectedly, there was a positive relationship between flight timing and trait range, although this was only marginally significant for tarsus range (Table 3). Finally, as seen in shorebirds, there was a positive relationship between body size and trait range, but this was only marginally significant for bill length range.

Table 3 Correlations between life history and trait variation in ducks and sparrows
Figure 4

Influence of life history on trait range in ducks. Data on bill and tarsus range were taken from the literature. Shown are leverage plots from models that also included body size (see Table 3 for statistics). Significant trends (P<0.05) are represented by solid lines; marginally significant trends (P<0.10) are represented by dotted lines.

Sparrows

Mirroring results from warbler museum specimens, there were no significant relationships between life history traits (or body size) and trait range in sparrows (Table 3). There was a negative trend between flight timing and bill length range, similar to trends seen in warblers, but it was not significant (P=0.15).

Discussion

Life history timing and phenotypic variation

Our results provide partial support for the idea that life history traits may constrain environmental effects on trait development (Figures 1 and 5). The results varied with trait, sex and lineage. For precocial birds, our primary prediction was that species with shorter incubation periods would have greater phenotypic variation. We found strong, significant support for this idea for male shorebird bill length and duck bill and tarsus length, and hints at this relationship for male shorebird bill depth (Tables 1 and 3; Figures 2 and 4). Although flight timing was not our primary life history trait of interest for precocial birds, we did find evidence linking it negatively to trait variation in shorebirds (Table 1), suggesting that this developmental transition may also be important in shorebird trait development. In contrast, flight timing was positively related to trait variation in ducks (Table 3). It is possible that for ducks, flight timing is less important for environmental influences on bill development than the timing of swimming. However, it is not clear why flight timing and trait variation would be positively correlated in ducks. It may be that some other factor, such as migration distance and temporal heterogeneity, could be correlated with timing of flight and driving this pattern.

Figure 5

Information timing and developmental sensitivity as constraints on plasticity. (a) For phenotypically plastic genotypes, exposure of individuals to either Environment 1 (blue circle) or Environment 2 (red circle) early in development results in different developmental trajectories. (b) Exposure to such information relatively later in development, for instance due to later transitions to explore or feed independently, limits the range of alternate phenotypes that can develop, provided that growth rates remain constant. (c) If growth rate is plastic, the developmental timing of information may not constrain plasticity. (d) In some cases, the development of a trait may be canalized, or insensitive to environmental information (dotted line), resulting in a limited plastic response despite the presence of environmental information early in development. See ‘Complexities of Development’ for further discussion.

For altricial birds, our primary prediction was that species with earlier flight timing would have greater phenotypic variation. We found no significant support for this prediction, although for both male warblers and sparrows, there were nonsignificant negative trends between bill variation and flight timing (Tables 2 and 3; Figure 3). Unexpectedly, for female warblers, we did find a significant negative relationship between incubation length and bill depth range and a positive relationship between flight timing and bill length range (Table 2, Figure 3). Taken together, these data suggest that there is some support for the focal hypothesis, but that the support is stronger for precocial birds than for altricial birds. This difference may be an artifact of statistical power resulting from a greater body size (and trait variation) range in precocial birds (see below). It is unclear whether there could be a more interesting ultimate-level biological explanation. However, one could speculate that morphological niche partitioning within species may be more important in ducks and shorebirds than in warblers and sparrows, where behavioral niche partitioning may have a more important role (for example, MacArthur, 1958).

Life history traits are highly correlated with body mass across species. For instance, in birds and mammals, larger species have longer incubation and gestation times, respectively (Western and Ssemakula, 1982; Saether, 1987). We controlled for interspecific variation in body size by including species’ mass in all of our models (Tables 1, 2, 3). However, the body size variation between the precocial and altricial lineages sampled could potentially explain the differences in patterns between the groups in a manner consistent with our overarching hypothesis. Shorebirds and ducks tend to be larger, and span a greater body size range, than warblers and sparrows. If there is a constraint on body size within eggs, larger species may have more post-hatching growth, opening up more opportunity for environmental influences. This could explain why larger species showed greater phenotypic variation in this study (Tables 1 and 3) and recalls the positive correlation seen across species between body size and the degree of sexual size dimorphism (Fairbairn, 1997). This suggests a specific mechanism by which a change in life history (in association with body size) affects exposure to environmental variation and thus the expression of phenotypic plasticity. The greater body size range of shorebirds could also have led to greater statistical ease in detecting patterns of interest. Regardless, expanding these comparisons with other taxa will help to clarify the differences in patterns, and the role of different correlated life history traits in affecting plasticity.

In this work, we tried to limit the extent to which variation in age, sex or genotype affected measures of phenotypic variation such that we could focus on adult variation that resulted from environmental influences during development. However, with a study across more than 50 species using collection specimens, there are constraints, and several factors may have added noise to our analysis. For instance, many of the shorebirds we measured were of unknown age. Even though we limited our measurements of museum specimens to one geographic area, because these birds are migratory, a given sample may have included many different populations that could reflect genetic variation in the bill traits we measured. Although it is possible that these alternate factors could explain differences across species in phenotypic range, it is very unlikely that these factors would confound the measures of variation in any systematic manner that could explain the observed patterns. Specifically, we do not think that there is any reason why species with greater genetic variation in these traits would also have relatively shorter incubation times or first flights. Instead, we suggest that additive genetic variation that is expressed constitutively, rather than in response to environmental information, would represent statistical noise in our analysis, and although it may decrease our statistical power, will not bias our results. The fact that consistent patterns emerged despite this added noise lends weight to the observation that earlier exposure can influence patterns of intraspecific phenotypic variation.

Although it is unlikely that systematic differences across species in genetic variation can explain the observed correlations between life history traits and phenotypic variation, it is important to return to our general definition of plasticity within this context. Throughout this research, we have treated plasticity broadly as environmental influences on development. These environmental influences may come from changes in bone or muscle caused by mechanical stress arising from the different diets consumed during development. Alternatively, these environmental influences may stem from the ‘release’ of underlying genetic variation that was cryptic in a different life history state (Gibson and Dworkin, 2004). In many cases, the phenotypic effects of mutations depend on the environment (Kondrashov and Houle, 1994; Fry et al., 1996; Szafraniec et al., 2001), and benign environmental conditions such as parental care can sometimes buffer underlying mutations (Agrawal and Whitlock, 2010; Snell-Rood et al.). Thus, a shift in the timing of a life history transition may result in a mutation having a novel phenotypic effect; in other words, a genotype-by-environment interaction, or genetic variation in plasticity. Our data cannot distinguish between these two mechanisms of environmental influences on development, but more controlled rearing experiments could begin to tease apart the importance of adaptive phenotypic plasticity versus the release of cryptic genetic variation.

The complexities of development

Throughout this comparative analysis, we have made several simplifying assumptions about development. The added complexities of development, and variation in such processes across species, would likely add significant noise to this data set and could explain why patterns were less pronounced for some traits or taxa. First, we assumed that the timing of trait growth was consistent across species. That is, we assume that a shift in flight timing by a few days results in a shift of timing of information but that the relative maturity of that trait at that point in time is comparable. However, heterochronic shifts are common across populations and species (Raff and Wray, 1989; Klingenberg, 1998; Smith, 2001) and the relative timing of trait development is itself plastic, varying with nutrition and temperature (Strathmann et al., 1992; Miller and German, 1999; Mabee et al., 2000). For instance, the relative timing of bone ossification differs across populations of finches (Badyaev et al., 2008) and species of mammals (Smith, 2006).

Second, we assumed that growth rates are constant across species for a given developmental window. However, we know that growth rates vary across species, even within certain developmental windows such as the nestling period (Remes and Martin, 2002), and flexible, compensatory growth can buffer environmental effects (Metcalfe and Monaghan, 2001; Wilson and Reale, 2006). For instance, in cichlids, being switched between diets relatively later in development has no impact on adult phenotypic variation because there are plastic changes in trait development and compensatory ‘catch-up growth’ following a diet shift (Meyer, 1987). Our conception of the epiphenotype hypothesis assumes that growth rates are constant. Compensatory growth coupled with heterochrony is one mechanism by which the timing of environmental information may not constrain plasticity: delays in the development of a trait, followed by rapid compensatory growth, may allow a wide range of alternate phenotypes in species that receive environmental information relatively late in life (Figure 5c).

Third, we assumed that traits did not vary in the time point at which they were sensitive to the environment. If a trait is relatively more canalized to environmental perturbations (Waddington, 1959; Fraser and Schadt, 2010; Gursky et al., 2012; Shingleton and Tang, 2012), exposure to the environment earlier in development will have relatively little effect on the range of plastic responses (Figure 5d). Canalization itself can evolve (Waddington and Robertson, 1966), for instance being selected for under stabilizing selection (Wagner et al., 1997), to prevent phenotype mismatches in rapidly changing environments (Padilla and Adolph, 1996) or emerging as a consequence of complex developmental networks (Siegal and Bergman, 2002). Indeed, variation in the environmental sensitivity of development can result in the emergence of critical periods or developmental windows where traits are sensitive to the environment, such as those in language and song learning (Hurford, 1991; Komarova and Nowak, 2001) or visual development (Hensch, 2004; Medini and Pizzorusso, 2008). However, many traits have long or seemingly absent critical periods and remain reversible and sensitive to the environment well into adulthood (Piersma and Drent, 2003) and many others fall somewhere in the middle of a continuum.

These details of development underscore the importance of complexities in the evolution of development and phenotypic plasticity. Because the timing of trait development, compensatory growth and canalization can all evolve, understanding limits on the range of alternate developmental trajectories is not as simple as considering the timing of environmental information. For instance, in shrews, delays in jaw ossification until the period when an individual is feeding has facilitated the evolution of adaptive plasticity in jaw morphology (Young and Badyaev, 2007; Young and Badyaev, 2010). One must consider both the environmental sensitivity of a trait and the availability of relevant information (Figure 5). Theoretical approaches may shed light on the complex evolutionary feedbacks that could result among these factors. Indeed, a recent model emphasizes the benefits to delaying trait development for a long enough period of time to obtain accurate information on the state of the environment (Fischer et al., 2014).

The ecology and evolution of species-level variation

Although comparative studies are limited by the wide range of factors that vary across species, when combined with more controlled experimental work on the epiphenotype hypothesis (Bruni et al., 1996; Hoverman and Relyea, 2007), they provide some support for the idea that developmental timing can constrain plasticity. The fact that these patterns emerged for shorebirds and ducks despite noise due to variation among sexes, ages and populations lends weight to the biological relevance of the data. If life history timing can shape plasticity, it suggests that selection on life history in one context, for instance selection to leave the nest early due to high nest predation (Remes and Martin, 2002), may influence plasticity. Such an effect may apply in the case of shorebirds given post hoc analyses that showed that larger-billed species have shorter incubation times (P=0.01; this is not the case in ducks, P=0.75). It is possible that constraints on egg size necessitate earlier hatching in species with large bills relative to body size. Such early hatching could result in earlier interaction with the environment and greater trait range. Our results suggest not only that selection on life history may impact plasticity, but also that selection on trait range could result in changes in life history timing. For instance, altricial development has evolved in concert with brain size and behavioral plasticity in birds (Shultz and Dunbar, 2010) possibly because large, complex brains require environmental input for proper development (Snell-Rood, 2012). Shifts to emerge in a less mature state will likely result in correlated changes in parental care to compensate for longer periods of learning and trait development (Heinsohn, 1991; Wheelwright and Templeton, 2003; Thomas and Szekely, 2005; Shultz and Dunbar, 2010) as the mismatch between immature and adult phenotypes increases (Marchetti and Price, 1989; Wunderle, 1991).

The present data set suggests that some species are more phenotypically variable than others for traits that are tied to foraging. Given that developmental and genetic variation in mandible traits is often linked to adaptive performance differences (Durell, 2000; Herrel et al., 2005), our results suggest there may be differences between species in functional trait variation. Although it is possible that the variation we are detecting could be due to developmental noise (Parsons, 1990), it is likely that differences in phenotypic variation within species, regardless of the mechanism, will have consequences for both ecological and evolutionary processes (Smith and Skulason, 1996; Bolnick et al., 2011). At the same time, it is important to note that we have assumed that, in general, variation in resources favors variation in foraging morphology. It is possible that variation over time and space in the food of these species is relatively minor, which could offer another explanation for the negative results seen across warblers and sparrows.

The fact that species differ in functional variation recalls a large body of work that considers individual variation and individual specialization within species, many of which are generalist at the species level (Fox and Morrow, 1981; Bolnick et al., 2003; Dall et al., 2012). It is possible that phenotypic plasticity in foraging traits may mediate individual specialization within species. This could also explain why the predicted patterns tended to be more pronounced in males in our data set and why the only result opposite to our predictions was seen in females. In birds, males tend to be the more philopatric sex (Greenwood, 1980) even for sex role-reversed species of shorebirds (Reynolds and Cooke, 1988). If morphological development is responsive to conditions experienced during the immature phase, we might expect benefits to adults choosing similar habitats or resources; indeed, natal habitat preference induction is common across vertebrates (Stamps, 1995; Davis and Stamps, 2004). Thus, the benefits of developmental plasticity induced during the immature period are expected to be higher in males, consistent with the theoretical prediction that developmental plasticity should be favored when organisms experience coarse-grained environmental variation (Schlichting and Pigliucci, 1998). In contrast, in female birds, we might expect the evolution of insensitivity of development to the environment due to a lack of correlation between adult and immature habitats. Fine-grained environmental variation is thought to select against developmental plasticity unless trait development is reversible. Interestingly, in shorebirds, we see that although both male and female trait range is impacted by body mass, developmental timing among males appears more likely to relate to trait range than among females.

Overall, this research provides partial support for the epiphenotype hypothesis and its extension to the importance of life history timing for shaping the evolution of plasticity. At the same time, this work suggests that surveying intraspecific variation across broader taxonomic groups would help resolve many of the remaining open questions. Furthermore, controlled experiments in the lab could shed light on some of the complexities of variation in the timing of trait development, compensatory growth and critical windows for when traits are sensitive to the environment.

Data archiving

All morphological measurements and comparative life history traits are available on DRYAD (http://dx.doi.org/10.5061/dryad.c5s06).