Introduction

Understanding the evolution of ecological adaptations requires the measurement of heritability, which is the proportion of phenotypic variation that is genetically transmitted to offspring1,2,3,4. Heritability has long been identified in numerous life history, physiological and morphological traits1,4. It has also been reported in such complex behavioural traits as dominance, aggression5, dispersal6, personality7,8,9,10,11,12, cooperative breeding13 and group size choice14. These findings are striking because measuring heritability of behavioural traits, especially in the field, is a daunting task3,4. A reason for this is that the constraints on individuals to make optimal choices create considerably more variation than in other characteristics such as morphological traits. In this context, Brown and Brown14 reported exceptionally high heritabilities of individual preferences for colonies of particular sizes. Animals in many species forage, travel or breed in groups and group size often varies by orders of magnitude across species and populations15,16,17. Studying variation in group size is therefore a widely applied approach for understanding group living16,17,18,19,20. A taxonomically widespread form of group living is coloniality, in which breeders defend only relatively small, aggregated breeding territories and forage elsewhere16,21.

A novel solution for explaining variation in colony sizes has been offered by Brown and Brown14 based on their study of cliff swallows Petrochelidon pyrrhonota, whose colonies range from two to over 3000 nesting pairs18. Brown and Brown14 proposed that variation in colony size is maintained by a genetic predisposition by breeders to recruit to colonies of similar sizes to those chosen by their parents. This idea was supported by highly significant parent-offspring regressions of colony size ranks, suggesting that colony size choice is heritable. To exclude the possibility that these findings could be explained by non-genetic factors such as early social imprinting, Brown and Brown14 performed a partial cross-foster experiment in which half of the nestlings in broods in small colonies were transferred to be raised in large colonies and vice versa. The authors found that their populations showed significant positive regressions to the natal colonies and negative regressions to the colonies in which individuals were raised. Similar results have been subsequently reported in an experimental study of barn swallows Hirundo rustica22 and in a correlational study of lesser kestrels Falco naumanni23.

Genetic transmission of a complex behavioural trait from parents to offspring may have considerable evolutionary consequences. The cliff swallow study is especially compelling given its strong results and the extraordinarily large sample sizes, which were produced by the cross-fostering of almost 2000 nestlings of which 721 were recovered as breeders in the following year14. For these reasons the study has been hailed as a milestone24.

These impressive results are however unexpected for at least three reasons. First, compared to morphological traits, the high plasticity of realized behaviour makes behavioural traits unlikely to be highly heritable4. Second, habitat selection is strongly influenced by the spatial distribution of available breeding locations. Previous studies have stressed that heritability estimates of behavioural traits that involve movement by animals across distances can be strongly inflated when not considering that different individuals have different sets of possible movements25,26. The sets of possible outcomes are affected by such factors as the distribution of suitable breeding sites, the locations of natal nests and the shape of the study areas. For example, animals born in the centre of a study area have a different set of possible dispersal distances than those born in the periphery. Because dispersal distance can only be studied in individuals that remain within the study area, this method produces a bias toward individuals with the same short dispersal distances as their parents. If offspring also disperse relatively short distances, as do most individuals in many species6,26,27, it could produce spurious parent-offspring regressions. This spatial fallacy can be circumvented by using a null model of possible choices accounting for the spatial distribution of potential movements for every individual25. However, none of the three studies reporting on heritability of colony size choice used such a null model of possible choices, suggesting that heritabilities may have been overestimated.
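The point about central versus peripheral birth sites can be illustrated with a minimal sketch. It assumes a square study area represented as a 50 × 50 lattice and straight-line distances between cells; the function names `possible_distances` and `summarize` are hypothetical and are not taken from any of the cited studies.

```python
import math

def possible_distances(origin, width, height):
    """Straight-line distances from `origin` to every other cell of the lattice."""
    ox, oy = origin
    return [
        math.hypot(x - ox, y - oy)
        for x in range(width)
        for y in range(height)
        if (x, y) != origin
    ]

def summarize(origin, width=50, height=50):
    """Maximum and mean dispersal distance observable from `origin`."""
    d = possible_distances(origin, width, height)
    return {"max": max(d), "mean": sum(d) / len(d)}

centre = summarize((25, 25))  # individual born near the middle of the study area
corner = summarize((0, 0))    # individual born at the periphery
print("centre:", centre)
print("corner:", corner)
```

Under these assumptions the two birth sites offer different sets of observable dispersal distances, which is why a null model has to be built separately for each individual rather than once for the whole population.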

Third, parent-offspring regressions are well known to be particularly prone to the ubiquitous fallacy called “regression to the mean” (RTM) that was first identified in the 19th century28 in the context of parent-offspring regressions. The RTM fallacy results from the fact that uncommonly large or small measurements are generally followed by measurements that are statistically closer to the mean simply because average values are far more common than extreme ones. In cross-foster experiments that study the heritability of colony size choice, when individuals are fostered from small to large colonies they will on average recruit to colonies that are statistically smaller than their foster colony because these recruitment colonies are closer to the mean colony size (and vice versa).
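The direction of this effect can be made explicit with a small worked example. As a simplifying assumption (it is not the recruitment process simulated in this study), suppose there are n colonies with size ranks from 1 (largest) to n (smallest) and that a recruit settles at a rank R drawn uniformly at random, independently of the rank f of its foster colony. Then:

```latex
\mathbb{E}[R] = \frac{1}{n}\sum_{r=1}^{n} r = \frac{n+1}{2},
\qquad
\mathbb{E}[R - f] = \frac{n+1}{2} - f .
```

With n = 21, a chick fostered into the largest colony (f = 1) is expected to recruit to a colony ten ranks smaller, a chick fostered into the smallest colony (f = 21) to a colony ten ranks larger, and a chick fostered into the median colony (f = 11) shows no expected shift.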

The ecological and evolutionary implications of a genetic component of group size choice are profound. We have thus explored, using simulations of published data, the potential of the RTM and spatial fallacies to produce spurious parent-offspring regressions in the context of colony size choice. Because, to our knowledge, the impact of the RTM and spatial fallacies has never been explored in tandem, we also examined the interaction between the potential effects of the two fallacies on estimated heritabilities. We finally applied this approach to develop a method for avoiding both pitfalls.

Results

Simulating heritabilities from an experimental study

We used individual-based simulations to explore the potential occurrence of the two fallacies and their impact on estimated heritabilities (see Methods). Our simulations of the cliff swallow experiment14 randomly produced a high proportion of regressions on recruitment colony size that were positive to birth and negative to foster colony size (Figure 1a). This finding was obtained at both significance thresholds and in all four spatial colony distributions. The first three distributions simulated the natural condition of large colonies being widely spaced and surrounded by smaller colonies. Of these, the BigFar5 distribution was designed to generate the maximum contrast in the sizes of neighbouring colonies. That distribution produced regressions with equal or lower p-values than those of the cliff swallow study in 91% of the simulations (black bars in Figure 1a). The other two distributions in which large colonies were widely spaced, BigFar5Random and BigFarHalf, both randomly yielded over 50% of regressions with equal or lower p-values than in the cliff swallow study. The Random distribution of colony sizes comprised a null model in that it was generated without any assumptions about the spatial distributions of colonies, yet even it yielded equal or lower p-values in 33% of the simulations. When using the significance threshold of 0.05, the proportion of significant regressions increased only slightly in all four distributions (white bars in Figure 1a) because the distributions of the generated p-values were strongly skewed toward highly significant regressions. In the data in Figure 1a, for instance, between 35% and 72% of the p-values were lower than 0.0001. These findings suggest that highly significant heritabilities should be viewed with caution.

Figure 1

Percentage (±SE) of randomly generated significant parent-offspring regressions in relation to four types of colony distributions.

Individuals recruited from their foster (a and b) or birth (c) colony using a random process within a linear lattice. Open bars: when using 0.05 as the significance threshold; Black bars: when using the p-values of the cliff swallow regressions as the significance threshold. As explained in Methods, results only include instances when the parent-offspring regression was positive to the birth colony and negative to the foster colony. Each situation was simulated 2,000 times. (a) Simulations of the cross-foster experiments using the same protocol and sample sizes as in the cliff swallow study's Figure 2. (b) Simulations of the cross-foster experiments using the same sample sizes as in the cliff swallow study, but in which the birth and cross-foster colonies were selected randomly. (c) Simulations of the non-experimental parent-offspring regressions using the data provided in the cliff swallow study's Figure 1. Results only account for non-philopatric individuals. *: in this distribution there were no significant parent-offspring regressions of the expected signs, but 99% of the opposite signs.

All of these results were generated with the random walk process of recruitment, which does not involve active choices of colonies but accounts for their spatial distributions. This recruitment strategy reproduces the fact that the recruitment probability rapidly declines with distance to the natal site6,26,27. These findings highlight the importance of accounting for the spatial distributions of potential choices and suggest that even in experiments, it is impossible to properly estimate heritabilities from parent-offspring regressions in the absence of a null model.
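The random walk recruitment described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: steps go to one of the four neighbouring cells, steps that would leave the lattice are redrawn, and the fledging colony counts as a recruitment site if the walker steps back onto it. The function name `random_walk_recruit` and the three-colony layout are hypothetical.

```python
import random

def random_walk_recruit(start, colonies, width, height, rng, max_steps=1_000_000):
    """Walk at random from `start` and return the first colony cell stepped onto.

    Assumptions (not from the source): 4-neighbour steps, moves that would
    leave the lattice are redrawn, and the fledging colony counts if the
    walker steps back onto it.
    """
    x, y = start
    for _ in range(max_steps):
        dx, dy = rng.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
        nx, ny = x + dx, y + dy
        if not (0 <= nx < width and 0 <= ny < height):
            continue  # stay in place and redraw the step
        x, y = nx, ny
        if (x, y) in colonies:
            return (x, y)
    return None  # no colony reached within max_steps

# Hypothetical layout on a 150 x 5 linear lattice: one near and one far colony.
rng = random.Random(1)
colonies = {(10, 2): "fledging", (14, 2): "near", (140, 2): "far"}
counts = {"fledging": 0, "near": 0, "far": 0}
for _ in range(500):
    cell = random_walk_recruit((10, 2), colonies, 150, 5, rng)
    counts[colonies[cell]] += 1
print(counts)
```

No colony is actively chosen in this sketch, yet nearby colonies receive far more recruits than distant ones, so the sizes of a colony's neighbours shape the sizes of the colonies its fledglings end up in.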

Simulating heritabilities when birth and foster colonies are randomly chosen

We then conducted simulations to analyze factors that may have contributed to the significant positive regressions to the birth colony and negative regressions to the foster colony in the cliff swallow study. Simulating an experimental design that randomly selects birth and foster colonies substantially diminished the percentage of significant regressions (compare Figures 1a and 1b). However, the percentage of significant parent-offspring regressions still ranged from 14% to 50% versus the expected 2.5%, indicating that randomization does not solve the problem.

Simulating non-experimental heritabilities

The three studies of the heritability of colony size choice14,22,23 reported highly significant parent-offspring regressions with non-experimental correlations. Our simulations of data from Brown and Brown's Table 1 showed that significant parent-offspring regressions of the predicted signs were much less likely to be randomly generated from the non-experimental than the experimental data in three of our four distributions (compare Figure 1a and 1c). These were the three distributions with spatially structured colonies. In contrast, when colonies of various sizes were randomly distributed, the percentage of significant regressions of the expected sign was similar in the experimental and non-experimental data. This exercise illustrates how spatial structuring can generate spurious significant parent-offspring regressions. These differences exist even though our random colony distribution is conservative in that it generates some proportion of spatial structuring. The highly spatially structured BigFar5 colony distribution illustrates this point as it was designed to generate the maximum contrast in the sizes of neighbouring colonies to depict the highest risks of producing spurious experimental regressions. The consequence is that in non-experimental data, this distribution generated no significant regressions of the predicted signs (Figure 1c) but 99% of the opposite signs. This occurred because large colonies were surrounded by small ones, making it inevitable that most birds that fledged from large colonies (whether birth or foster colonies) would recruit to small ones.

Table 1 Significance of the regressions shown in Figures 2 and 3. Regressions are of the percentage of significant parent-offspring heritabilities (Y-axis) on contrasts in size ranks between the birth and foster colonies (X-axis). All slopes were positive, suggesting that RTM was present in all situations. The greater the contrast in the sizes of natal and foster colonies, the more frequently spurious significant parent-offspring regressions were generated. There were 21 colonies in all simulations. All individuals were recruited using the random walk process in which recruits returned to the colony from which they fledged. They next moved randomly, recruiting to the first colony they encountered. When running simulations with 10 and 100 colonies we found similar results (Figure 3). In all circumstances the frequencies of significant parent-offspring regressions with low contrasts in the size of natal and foster colonies were from 4 to 8 times higher than the expected 2.5%.

Figure 2

The effect of contrasts in colony size ranks between birth and foster colonies on the percentage of significant parent-offspring regressions.

We conservatively used the p-values from Table 1 of14 as significance thresholds and used the sample sizes from that study's Figure 2. There were 21 colonies and a square grid in all simulations (a linear grid leads to similar patterns; see Table 1). Recruitment was according to a random walk. Each point results from 2,000 simulations. The four regressions depicted here were significant (see Table 1). Standard errors are too small to be shown.

Figure 3

Effect of the contrast in size ranks between birth and foster colonies on the percentage of significant parent-offspring regressions of colony sizes.

Contrast in size ranks between birth and foster colonies ranged on the X-axis from the lowest (value of 10) to the highest (value of 1). For example, with 100 colonies, the lowest contrast was between colony sizes 50 and 51 while the highest contrast was between sizes 1 and 100. All three regressions depicted here were significant (SAS GLM procedure, interaction: p = 0.37; number of colonies: p = 0.028; effect of the design: p < 0.0001). We used the BigFarHalf distribution of colonies on a square grid and recruitment followed a random walk. Each point was produced from 2,000 simulations. Standard errors are too small to be shown.

The other two spatially structured distributions generated a substantial proportion of significant non-experimental parent-offspring regressions of the predicted signs (Figure 1c). There was thus substantial overlap among the distributions in their capacity to produce spurious significant experimental and non-experimental regressions of the predicted signs (Figure 1). This suggests that regardless of the type of colony distribution, spurious regressions occur. Thus, both experimental (Figure 1a) and non-experimental (Figure 1c) regressions can be highly significant and yet spurious.

The spatial and RTM fallacies

We next explored possible mechanisms responsible for the frequently generated spurious regressions in the experimental data. We found that regressions from cross-foster data are sensitive to differences in the sizes of birth and foster colonies (Figure 2). When nestlings were exchanged between large and small colonies (high contrast), more positive regressions were generated with the birth colony and more negative regressions with the foster colony than when they were exchanged between colonies of intermediate sizes (low contrast). The colony distributions were held constant within each of the curves in Figures 2 and 3. Thus, the increase in the proportions of significant regressions could only result from the occurrence of RTM.

The four colony distributions were each simulated in a linear and a square lattice. The percentage of significant heritabilities increased with the size contrast between birth and foster colonies (Figure 2). All slopes in Figure 2 were positive and significant, even when colony sizes were randomly distributed (statistics in Table 1). Similar results were obtained by running these simulations with 10, 20 and 100 colonies (Figure 3), suggesting that the frequency of spurious regressions was independent of the number of colonies in the experimental population.

When using the lowest possible size contrast between birth and foster colonies, RTM should be weak. Nevertheless, the proportion of significant regressions was approximately four to eight times higher than the expected significance threshold of 0.025 (left part of Figures 2 and 3). In the relative absence of RTM, these findings suggest the existence of another effect that produces spurious regressions. Figure 2 illustrates that the slopes produced in the random colony distribution are much lower than in the three spatially structured distributions. This difference suggests that the spatial fallacy is responsible for these spurious regressions. Results in Figures 2 and 3 thus suggest that the spatial fallacy and RTM combine to produce spurious regressions. The combined effects of the two pitfalls have unexpectedly strong consequences, with the frequency of spurious regressions of the expected signs ranging from 38% to 95% among the four colony distributions (Figure 1a).

The common options approach

We used further simulations to explore the conditions under which we would obtain the expected 2.5% of significant parent-offspring regressions in a one-tailed test. When we selected the median sized colony as the common foster colony, the simulations yielded 2.5% of significant regressions that were positive to the birth colony and negative to the foster colony (Figure 4a). This was true in all four types of colony distributions (Figure 4a), suggesting the robustness of this design. When we selected any colony size as the common foster colony, we obtained the same result (Figure 4b). The common options approach thus seems immune to the two pitfalls.

Figure 4

The “Common Options” experimental design.

Percentage (±SE) of significant parent-offspring regressions that were randomly generated according to four types of distributions of colonies. There were 21 colonies. Individuals recruited from the foster colony with a random walk. Open bars: when using 0.05 as the significance threshold; Black bars: when using the p-values of the cliff swallow regressions as the significance threshold. We only tabulated regressions that were positive to the birth and negative to the foster colonies. Each situation was simulated 2,000 times. (Note that the scale of the Y-axes differs from those of Figures 1, 2 and 3.) (a) When transferring chicks from randomly selected colonies to the median colony (of size rank 11). The two parameters are: the BigFar5 colony distribution and the transferring of chicks from the largest and smallest colony (1 and 21) into the median size colony (rank 11). (b) Analysis of the effect of the size of the common foster colony on the percentage of significant parent-offspring regressions (P < 0.05). The two parameters are: the BigFar5 colony distribution and the transferring of chicks from the largest and smallest colony (1 and 21) into a single colony whose size was allowed to vary from ranks 2 to 20.

Discussion

Our simulations highlight two pitfalls in estimating the heritability of behavioural traits. One pitfall is regression to the mean (RTM), which was first identified in the context of parent-offspring regressions in the 19th century28. The other pitfall is the spatial fallacy, which may occur in studies of behavioural traits involving movements by animals between a set of potential locations.

RTM results from the fact that uncommonly large or small measurements are generally followed by measurements that are statistically closer to the mean simply because average values are far more common than extreme ones. This fallacy has been undermining studies in many fields such as economics, social sciences, cognition, health care policy, sports medicine and epidemiology despite its early discovery28. The current use of the term ‘regression’ is itself derived from the regression to the mean fallacy.

Our simulations show that analyses of data from cross-foster experiments may be sensitive to differences in the sizes of birth and foster colonies. This is potentially important because the greater the difference, the higher the probability that an individual will randomly recruit to a colony that is substantially different in size from the one in which it was fostered. For example, in a population with n colonies, if offspring are transferred from the largest to the smallest colony and then randomly select a colony, they have a probability of (n − 1)/n of recruiting to a colony that is larger than their foster colony. This may generate spurious significant positive correlations between the sizes of recruitment and birth colonies and negative ones between recruitment and foster colonies. Despite its ubiquity and venerability, RTM is so subtle and counter-intuitive that it continues to be overlooked29. It can be corrected for in post hoc analyses30,31, but is considered to be avoidable by proper experimental designs31. Unfortunately, as our simulations suggest, even classically designed experiments are not immune to RTM and may even amplify its effects.
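The (n − 1)/n figure can be checked with a short Monte Carlo sketch. It assumes, as in the example in the text, that each recruit picks one of the n colonies uniformly at random, including its own foster colony; ranks run from 1 (largest) to n (smallest). The function name `prob_recruit_larger` is hypothetical.

```python
import random

def prob_recruit_larger(n_colonies, foster_rank, runs, rng):
    """Fraction of random recruits that settle in a colony larger than the foster colony.

    Ranks run from 1 (largest) to n_colonies (smallest), so "larger" means a
    smaller rank number. Recruitment is uniform over all colonies, including
    the foster colony itself (an assumption matching the example in the text).
    """
    larger = 0
    for _ in range(runs):
        recruit_rank = rng.randint(1, n_colonies)
        if recruit_rank < foster_rank:
            larger += 1
    return larger / runs

rng = random.Random(42)
n = 21
from_smallest = prob_recruit_larger(n, foster_rank=n, runs=100_000, rng=rng)
from_median = prob_recruit_larger(n, foster_rank=11, runs=100_000, rng=rng)
print(f"fostered in smallest colony: {from_smallest:.3f} vs (n - 1)/n = {(n - 1) / n:.3f}")
print(f"fostered in median colony:   {from_median:.3f} vs 10/21 = {10 / 21:.3f}")
```

Comparing the two foster ranks shows why high-contrast swaps are the most exposed to RTM: from the median colony, moves toward larger and smaller colonies are about equally likely, whereas from an extreme colony almost every move goes in the same direction.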

Accordingly, when our simulations varied the contrast in colony sizes, we found that the greater the contrast, the higher the proportion of spurious significant regressions (Figures 2 and 3). The contrasts in colony sizes may further explain why simulating randomly selected birth and foster colonies did not avoid RTM. In such a design, some of the randomly generated pairs of colonies inevitably have high contrasts in size and thus generate spurious regressions. Our exercises demonstrate the subtlety of RTM29 and imply that randomization may not be a universal solution for avoiding biases in experimental designs.

Moreover, RTM mainly results from the bell shaped frequency distributions of most traits. This is due to the fact that the most frequently occurring values are of intermediate measures. In our context, the frequency distributions of colony-sizes are probably bell shaped because colonies of intermediate sizes are much more frequent than colonies of extreme sizes. In the cliff swallow study however, the frequency distributions of colony-size were flattened by the use of size ranks, with each rank being represented by a single colony. Thus, the effects of RTM became detectable only with increased contrasts in colony sizes. This explains the increase in the proportions of significant regressions in Figures 2 and 3. Consequently, the use of ranks in colony sizes in the cliff swallow study substantially diminished the effect of RTM, making our simulations conservative. This factor indicates that RTM may arise even in the absence of a bell shaped frequency distribution of the concerned trait and highlights the importance of its impact in cross-foster experiments.

In the cliff swallow experimental design, chicks were cross-fostered between large and small colonies. This is intuitively appealing because experimenters seek to produce strong effects. However, our simulations show that spurious regressions can also be obtained even when chicks are swapped between colonies of low contrasted sizes (Figures 2 and 3). This suggests that an additional fallacy is involved.

The spatial fallacy was raised by van Noordwijk25, who questioned whether a study that reported significant non-experimental parent-offspring regressions of dispersal distance in great tits Parus major32 could conclude that dispersal distance is heritable. van Noordwijk25 simulated data on between-nest box distances from three populations of great tits to generate sets of all possible inter-nest distances. His simulations led him to conclude that space must be accounted for in all correlative studies involving the distribution of suitable habitat in the environment. The simulations clearly showed that in a non-experimental study, spurious parent-offspring regressions of dispersal distances can be generated in the absence of a null model of all possible options. This method was recently applied to dispersal in lesser kestrels26, in a study that found that philopatry to the natal colony was much higher, and observed distances much lower, than predicted by a null model accounting for all possible distances. More recently, van Noordwijk and collaborators designed methods to account for the impact of the heterogeneities in detectability of individuals that may result from differences in personality or sex on estimated heritabilities33,34.

In the three studies of heritability of colony size choice14,22,23, the distribution of colonies is comparable to the distribution of nest boxes in the great tit study in that there is also a finite set of choices of breeding locations. Consequently, estimating heritability of colony size choice requires taking into account the distribution of colonies of various sizes, which includes the number and density of colonies, the suitability of habitat and inter-colony distances. For example, it is known that large colonies tend to be spaced relatively far apart, with smaller colonies situated in between35,36,37 due to local competition for food38. Thus, as in dispersal studies25,26, it is necessary to incorporate into heritability estimates the randomly generated expected distributions of choices resulting from the spatial distribution of colonies of various sizes. By not doing so, the three studies assumed that individuals were equally likely to recruit to any of the colonies, regardless of their size and location. However, the probability of a given bird recruiting to a colony is likely to decrease with the distance to the natal (or foster) colony26,27. The colony of recruitment may thus be significantly influenced by the distribution of colonies of varying sizes and by the distance from the colony of origin, independently of the possible preference of the individual.

Our simulations of both non-experimental and experimental studies randomly produced high proportions of significant parent-offspring regressions of colony size choice and showed that these frequencies are highly sensitive to types of colony distributions (Figure 1a and 1c). More importantly, our simulations suggest that experimental data may be even more exposed to the two fallacies (Figure 1a).

Our simulations show that it is necessary to avoid both pitfalls to conclude that a behavioural trait such as habitat selection has a heritable genetic component in the population. Our final simulations suggest that the two pitfalls can be avoided by designing experiments that provide individuals with the same set of options. The standard method for estimating heritability is the partial cross-foster experiment39,40, as was performed in the cliff swallow study. In this design, half of the nestlings were swapped between nests of small and large colonies, creating two types of offspring dyads: full sibs in nests in different colonies and foster sibs in the same nests.

Although the goal of this design is to use these dyads to perform two pair-wise tests in a single analysis, this test was not reported in the cliff swallow study. However, being raised in the same nest provides foster siblings with the same set of options, which according to our simulations would make their comparison immune to both the spatial fallacy and RTM. In contrast, full siblings raised in different colonies have different sets of options, making any comparison between them susceptible to both pitfalls. The effect of these fallacies should increase the actual differences in colony size choice between full siblings, thus biasing the full analysis and, consequently, estimates of heritability.

Our simulations suggest that only comparisons between individuals with the same set of options are immune to the two pitfalls. Providing the same options may be achieved by fostering all offspring within a single foster colony. This method clearly avoids the spatial fallacy. However, while our simulations show that this also avoids the RTM fallacy when working with ranks (Figure 4), it is nevertheless possible that it only avoids RTM when using the most frequently occurring colony size as the common foster colony. Unfortunately, actual colony sizes were not reported in the cliff swallow study, which did not allow us to simulate them.

However, the logic of RTM allows us to predict that using the most frequently occurring colony size as the common foster colony will lead to a non-significant proportion of spurious heritabilities. Further simulations may demonstrate whether experimenters can be more flexible by being able to select a colony of any size as the common foster colony. Pending such simulations we propose selecting a colony of the most frequently occurring size.

Methods

Our main goal was to determine whether parent-offspring regressions produced by cross-foster experiments can be generated by randomly simulating colony size choice. Brown and Brown's14 methods and results were reported in considerable detail, allowing us to use their published data to simulate parent-offspring regressions of colony size choice. Throughout this paper, unless otherwise noted, references to the cliff swallow study are to14. As in that study, we used breeding colony size choice as the phenotypic trait.

Our individual-based simulations produced a null model of the expected statistical significance of parent-offspring regressions when accounting for the spatial distribution of colonies of various sizes. We used the colony size ranks data and the number of recruits provided in Figure 2 of the cliff swallow study to simulate parent-offspring regressions. We were unable to simulate distributions of real colony sizes because these were not reported in the cliff swallow study. In that study, a total of 721 chicks were recovered after they had been cross-fostered in the previous year between colonies of various sizes. After simulating the 721 recruits in each run, we estimated heritabilities by separately regressing recruitment colony size against both birth and foster colony size. These simulations are based on the logic of14 that positive regressions to the birth colony and negative regressions to the foster colony support the existence of a genetic component in colony size choice.

We ran 2,000 Monte Carlo simulations for each set of parameters. A new distribution of colonies was recreated for each run. We tabulated the p-values of the generated parent-offspring regressions using the standard significance threshold of 0.05 (white bars in Figures 1 and 4a). To further examine whether the extremely high significance of the cliff swallow regressions (in which 8 out of 9 were significant, including four with p-values under 0.0001 and one that was negative to the rearing colony) could be generated randomly, we also used the actual p-values reported in Table 1 in14 as significance thresholds (black bars in Figures 1 and 4a). We adopted the prediction of14 that regressions of recruitment colony size should be positive to the birth and negative to the foster colony size, and therefore tabulated only simulated regressions that met these criteria. However, our simulations sometimes also generated substantial proportions of significant regressions of the unpredicted sign (see for instance Figure 1c).
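The tabulation step can be sketched as below. This is a minimal illustration under simplifying assumptions, not the procedure used in this study. Recruitment rank is drawn independently of birth rank, so there is no lattice and no random walk, and the p-value uses a normal approximation to the t distribution. The sample size of 200 recruits per run is arbitrary. The names `slope_and_p` and `fraction_significant_positive` are hypothetical.

```python
import math
import random
from statistics import NormalDist

def slope_and_p(x, y):
    """OLS slope of y on x with a two-sided p-value.

    The p-value uses a normal approximation to the t distribution, which is
    only reasonable for large samples (an assumption of this sketch).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(sse / (n - 2) / sxx)
    p = 2 * (1 - NormalDist().cdf(abs(slope / se)))
    return slope, p

def fraction_significant_positive(runs, n_recruits, n_colonies, alpha, rng):
    """Share of runs whose regression is both positive and significant at alpha."""
    hits = 0
    for _ in range(runs):
        # Birth and recruitment ranks are independent here: a spatially naive null.
        birth = [rng.randint(1, n_colonies) for _ in range(n_recruits)]
        recruit = [rng.randint(1, n_colonies) for _ in range(n_recruits)]
        slope, p = slope_and_p(birth, recruit)
        if slope > 0 and p < alpha:
            hits += 1
    return hits / runs

rng = random.Random(0)
frac = fraction_significant_positive(
    runs=2000, n_recruits=200, n_colonies=21, alpha=0.05, rng=rng
)
print(f"significant positive regressions: {frac:.1%}")
```

Because only regressions of one sign are counted, a two-sided threshold of 0.05 corresponds to an expected rate of about 2.5% when recruitment is truly independent of the parental colony, which is the baseline against which the simulated percentages are compared.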

Spatial distribution of colonies

In the cliff swallow study, colonies were aggregated in five clusters ranging from 11 to 25 colony sites. The largest cluster in a single year comprised 21 active colonies. We therefore simulated 21 colonies with size ranks ranging from 1 (largest) to 21 (smallest). We used a linear lattice of 150 × 5 cells to represent the fact that cliff swallows breed along rivers and a square lattice of 50 × 50 cells, which may better represent the habitat of most species, including barn swallows and lesser kestrels.

We then simulated four different types of distributions of colonies of varying size ranks (examples of the simulated distributions are provided in the Supplementary Online Material). In all four distributions, colony positions were drawn randomly from all cells of the lattice, with a given cell containing at most one colony. We then assigned each selected cell (i.e., colony position) a size rank according to four distributions. The first three distributions represented the fact that in nature, large colonies tend to be far apart35,36,37,38.

In the BigFar5 distributions, the five largest colonies were randomly assigned to some of the selected colony positions so that they were at least 20 cells apart. The remaining colonies were then assigned so that the smallest colony was closest to the largest colony, the second smallest near the second largest and so on until the fifth smallest colony. The process was then reiterated for the next five smallest colonies so that the 6th smallest colony was in the remaining location closest to the largest and so on until all colonies were placed. This type of distribution was designed to generate the maximum contrast in the sizes of neighbouring colonies in order to produce the highest probabilities of spurious significant experimental regressions. The purpose of this exercise was to explore the range of probabilities of producing spurious regressions when not accounting for the two pitfalls. By placing smaller colonies next to the five largest ones and assuming the random walk process of recruitment (see next section), individuals were likely to recruit to colonies of substantially different sizes than the ones from which they fledged. This was expected to generate negative regressions between recruitment and fledging colonies. In simulating cross-fostering experiments, the fledging colony was the foster colony, while in non-experimental situations the fledging colony was the birth colony.
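The BigFar5 assignment described above can be sketched as follows. This is an illustrative reading of the procedure, not the original implementation: the function name `bigfar5` and its parameters are hypothetical, distances are assumed to be Euclidean, and sets of colony positions that do not contain five sites at least 20 cells apart are assumed to be redrawn.

```python
import math
import random


def bigfar5(width=50, height=50, n_colonies=21, n_big=5, min_dist=20,
            rng=None, max_tries=10000):
    """Return a dict mapping colony position (x, y) to size rank (1 = largest)."""
    rng = rng or random.Random()
    cells = [(x, y) for x in range(width) for y in range(height)]
    for _ in range(max_tries):
        # Draw colony positions at random, at most one colony per cell.
        sites = rng.sample(cells, n_colonies)
        # Greedily pick sites for the largest colonies, all >= min_dist apart.
        big = []
        for s in sites:
            if all(math.dist(s, b) >= min_dist for b in big):
                big.append(s)
                if len(big) == n_big:
                    break
        if len(big) == n_big:
            break  # assumption: otherwise redraw all positions
    else:
        raise RuntimeError("no valid placement found")

    ranks = {site: rank for rank, site in enumerate(big, start=1)}
    free = [s for s in sites if s not in ranks]
    next_rank = n_colonies  # start with the smallest colony
    while free:
        # One round: the next smallest colonies go to the free sites
        # closest to the largest, second largest, ... colony in turn.
        for b in big:
            if not free:
                break
            nearest = min(free, key=lambda s: math.dist(s, b))
            ranks[nearest] = next_rank
            free.remove(nearest)
            next_rank -= 1
    return ranks
```

The linear lattice is obtained with `bigfar5(width=150, height=5)`. Passing a seeded `random.Random` instance as `rng` makes a run reproducible.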

In the BigFar5Random distributions, ranks of the five largest colonies were also at least 20 cells apart, whereas the ranks of the remaining colonies were assigned randomly.

The BigFarHalf distributions were built like the BigFar5 distributions, except that all colonies in the largest half of the size distribution were separated by at least seven cells. The remaining smaller colonies were then placed so that the smallest was closest to the largest, the second smallest closest to the second largest, etc.

Finally, we simulated Random distributions, in which both colony position and size rank were assigned entirely at random. We designed the first three colony distributions to explore the impact of different kinds of spatial structuring of colony sizes, and the Random distribution as a null model to provide a basis of comparison with a non-structured distribution. This null model is conservative because our algorithm inevitably generates some outputs with a degree of structuring. In the absence of any bias, we expect 5% of the simulations to generate significant regressions.

Recruitment

We simulated the philopatric recruits of the cliff swallow study by recruiting them to their birth colony. In contrast, non-philopatric individuals recruited according to a “random walk” strategy in which recruits returned to their fledging colony and then moved randomly within the lattice. At each step, all non-philopatric individuals had an equal probability of moving to any of the eight adjacent cells until they entered a cell containing a colony, to which they recruited. Simulated non-philopatric birds were not allowed to recruit to their fledging colony. This algorithm imitates the diffusion process in physics wherein molecules move randomly and allowed us to account for the spatial distributions of colonies without assuming any process of choice by the birds. It also imitates natural situations such as when newly fledged birds explore their environment starting from their fledging location before migrating.
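The random walk of a non-philopatric recruit can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `random_walk_recruit` is hypothetical, moves that would leave the lattice are assumed to be redrawn, and a recruit that re-enters its fledging colony is assumed to keep walking. Neither boundary behaviour is specified in the text.

```python
import random

# The eight cells adjacent to the current cell.
MOVES = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]


def random_walk_recruit(start, colonies, width, height, rng):
    """Walk from the fledging colony until another colony's cell is entered.

    start    -- (x, y) position of the fledging colony
    colonies -- dict mapping colony position (x, y) to size rank
    Returns the size rank of the recruitment colony.
    The dict must contain at least one colony other than the start.
    """
    x, y = start
    while True:
        dx, dy = rng.choice(MOVES)
        nx, ny = x + dx, y + dy
        if not (0 <= nx < width and 0 <= ny < height):
            continue  # assumption: redraw moves that leave the lattice
        x, y = nx, ny
        # Recruiting to the fledging colony itself is not allowed.
        if (x, y) in colonies and (x, y) != start:
            return colonies[(x, y)]


colonies = {(0, 0): 1, (2, 2): 2, (4, 4): 3}
rank = random_walk_recruit((2, 2), colonies, 5, 5, random.Random(0))
```

Because the walker carries no information about colony size, any association between fledging and recruitment ranks in such a simulation arises only from where colonies of different sizes sit relative to one another.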

Selecting experimental colonies

To compare how the results of cross-foster experiments may be influenced by the ways that researchers select experimental colonies, we simulated an experimental design in which the birth and foster colonies were randomly chosen. We then simulated another experimental design in which we systematically selected birth and foster colonies in order to cover the full range of contrasts in colony size ranks. The highest contrast was between the largest and smallest colony size ranks of 1 and 21 and the lowest contrast was between colonies of nearly the same intermediate rank, i.e., 10 and 12.

Providing common options

Our final goal was to explore whether fostering all nestlings to a common colony may avoid RTM and the spatial fallacy by providing them with the same set of choices when dispersing from the same location. This method resembles a common garden experiment in which all individuals are fostered into the same location in order to apportion genetic and environmental effects on the phenotype41,42. Our proposed “common options” experiment is designed to additionally provide all individuals with the identical set of opportunities to disperse to any location.

In the first of two sets of simulations, we placed all fostered young into the median-sized colony. In the second set, we performed the same simulations separately for foster colonies of each size rank. A design avoiding the two pitfalls should yield significant parent-offspring regressions in 5% of simulations, of which only half (2.5%) should be positive to the birth and negative to the foster colony sizes in one-tailed tests.
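The first set of simulations can be sketched as follows. This is an illustration under stated assumptions rather than the original code: all names are hypothetical, colonies follow the Random distribution, birth ranks are drawn uniformly (the study used the numbers of recruits from Figure 2 of the cliff swallow study), off-lattice moves are redrawn, and p-values use a normal approximation. In this sketch only the regression on birth rank is computed, because the foster colony is the same for every recruit.

```python
import math
import random
from statistics import NormalDist

MOVES = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]


def walk(start, colonies, size, rng):
    """Random walk from start until a different colony's cell is entered."""
    x, y = start
    while True:
        dx, dy = rng.choice(MOVES)
        if 0 <= x + dx < size and 0 <= y + dy < size:  # assumption: stay on lattice
            x, y = x + dx, y + dy
            if (x, y) in colonies and (x, y) != start:
                return colonies[(x, y)]


def two_sided_p(x, y):
    """Two-sided p-value of the regression of y on x (normal approximation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 1.0
    r2 = min(sxy * sxy / (sxx * syy), 1.0)
    if r2 == 1.0:
        return 0.0
    t = math.sqrt(r2 * (n - 2) / (1.0 - r2))
    return 2.0 * (1.0 - NormalDist().cdf(t))


def common_options_fraction(n_runs, rng, n_colonies=21, size=50,
                            n_recruits=721, alpha=0.05):
    """Fraction of runs with a significant regression on birth rank."""
    cells = [(x, y) for x in range(size) for y in range(size)]
    median_rank = (n_colonies + 1) // 2
    hits = 0
    for _ in range(n_runs):
        sites = rng.sample(cells, n_colonies)            # random positions
        colonies = {s: r for r, s in enumerate(sites, start=1)}
        common = sites[median_rank - 1]                  # median-sized colony
        birth = [rng.randint(1, n_colonies) for _ in range(n_recruits)]
        recruit = [walk(common, colonies, size, rng) for _ in range(n_recruits)]
        hits += two_sided_p(birth, recruit) < alpha
    return hits / n_runs
```

Since every recruit disperses from the same cell, recruitment rank is independent of birth rank by construction, so the returned fraction should lie close to the nominal 5% for large numbers of runs. Small settings such as `common_options_fraction(200, random.Random(3), n_colonies=6, size=12, n_recruits=40)` run quickly; the default settings are considerably slower.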

All our simulations are based on ranks in colony size. We did not simulate distributions of real colony sizes because these were not described in the cliff swallow study. We expect that such distributions would be bell-shaped, with the median colony also being a colony of a frequently occurring size. Under such conditions the effect of RTM should be increased, which could change the shape of Figure 4b by producing higher proportions of significant regressions when a less frequently occurring colony size is used as the common foster colony.