Introduction

The need to protect the genetic resources of the world's most important cultivated plants has sparked a growing interest in crop diversity (Brush, 2000, 2004). Considerable effort has been expended in cataloging agricultural genetic diversity (Brush, 2004; Pressoir and Berthaud, 2004; Perales et al., 2005; Jarvis et al., 2008), and debating the practical, political and scientific bases for maintaining diversity of our major food crops (Brush, 2000; Fowler and Hodgkin, 2004; Esquinas-Alcazar, 2005). It has become clear that a large amount of genetic diversity is contained within the small-scale agricultural systems that are typical of the developing world (Jarvis et al., 2008). In contrast to commercial farmers, small-scale farmers generally obtain their crop varieties through a traditional system of seed management that is based on saving and exchange of local germplasm (Almekinders et al., 1994). The dependence on this traditional seed system means that genetic diversity is affected by seed management dynamics (Louette et al., 1997). For this reason, traditional seed management has become an important research topic for crop conservationists (Badstue et al., 2007).

Borrowing from population biology, researchers have noted parallels between traditional crops and subdivided populations (Brush, 1999; Alvarez et al., 2005; Dyer and Taylor, 2008). Surprisingly little effort, however, has been spent on developing models of population subdivision that are suitable for traditionally managed crops. To date, discussion of agroecosystems in a metapopulation context has been predominantly metaphorical (Louette, 2000; Pressoir and Berthaud, 2004; Alvarez et al., 2005), and the few attempts to treat population dynamics mathematically (Heisey and Brennan, 1991; Dyer and Taylor, 2008) have ignored population genetics completely.

Population genetic models of subdivided species have been instrumental to our understanding of neutral genetic diversity and structure in nature. General results from models such as Wright's island model (Wright, 1951) and more recent metapopulation models (Slatkin, 1977; Maruyama and Kimura, 1980; Lande, 1992; Whitlock and Barton, 1997; Wakeley and Aliacar, 2001) have served to predict the genetic effects of population size, migration rates and extinction/colonization in natural populations. Despite vast differences in natural history, most species of animals and plants show patterns of demography and migration that can be interpreted and modeled in a metapopulation framework (Harrison and Taylor, 1997).

In contrast to natural populations, demography and seed migration in cultivated plants are subject to conscious intervention by farmers. Traditional agricultural practices are rather well documented, particularly for grain crops such as maize, rice, sorghum and millet (Longley and Richards, 1993; Louette et al., 1997; Brocke et al., 2003; Barnaud et al., 2008). The available literature suggests that farmer-managed crops differ from a classic metapopulation in several respects. First, many crops have a large number of seeds per inflorescence (Harlan et al., 1973). This has made the inflorescence the focus of seed management (Li and Wu, 1996; Louette and Smale, 2000; Brocke et al., 2003; Perales et al., 2003; Barnaud et al., 2007), and because the seed is derived from a limited number of maternal plants, effective deme size is expected to be much smaller than census size (Louette et al., 1997). Second, seed migration into demes is not random. Whereas existing models assume that migrants are drawn from the entire metapopulation, farmers generally obtain seed from a very limited number of familiar sources (Almekinders et al., 1994; Zeven et al., 1999; Badstue et al., 2007; Barnaud et al., 2008). Where detailed data are available, farmers are reported to receive seed from a single source each time (Rice et al., 1998). Third, seed is often recycled for several years without any influx of foreign germplasm (Perales et al., 2003), so seed migration into individual demes is episodic rather than continuous. Finally, extinction and recolonization generally occur without the bottleneck assumed in most metapopulation models, because farmers will generally obtain enough seed to plant the desired area rather than reduce the planted area.
There is thus good reason to believe that farmer-managed seed systems may deviate substantially from classical metapopulation models, yet neither the population-genetic implications of farmer seed management nor the validity of models of subdivided populations for such systems has been adequately explored.

In this paper, we explicitly address the population genetic dynamics of managed crops in a metapopulation framework. We show that incorporating characteristics that distinguish crop metapopulations from most natural species leads to predictions that differ from those of classical models. We begin by generalizing a common approach to modeling neutral genetic diversity in metapopulations and adapting it to include several important features unique to farmer-managed crops. We present results on the effects of seed migration quantity, migration frequency and extinction on patterns of genetic diversity in crop metapopulations. Notably, we find that two predictions of classical models, namely geographical invariance of within-deme diversity and the reduction of genetic structure through migration, do not necessarily hold in farmer-managed systems. Finally, we analyze other factors of potential importance in crop metapopulations, including deme size and pollen migration. We frame our work in the context of maize cultivation to take advantage of the large body of knowledge of seed management practices and diversity at the farm level, but we expect our results to be representative of other sexually propagated crops.

Materials and Methods

Defining the model

To illustrate the features that are unique to our crop metapopulation, we define what we will refer to as a classic metapopulation model. Our definition is based on Slatkin's (1977) model II. This model describes a number of discrete sub-populations, or demes, consisting of N sexually reproducing diploid organisms. Demes are linked by a constant flow of migrants sampled from the entire metapopulation. In case of extinction of demes, there is instant colonization by a limited number of colonists. Colonists are either drawn at random from the metapopulation (migrant pool model), or each deme receives colonists from one randomly chosen source deme (propagule pool model).

We start by presenting a generalization of the recurrence methods initially developed by Latter (1973), Slatkin (1977) and Maruyama and Kimura (1980), and reframed in terms of average coalescence times by Pannell and Charlesworth (1999). A subdivided population is described in terms of the mean time to coalescence for two alleles sampled at generation t from either a single deme (T0) or two different demes (T1). Mean coalescence time can be defined as the time that has elapsed since two sampled alleles were derived from the same ancestral allele, and is directly proportional to genetic diversity under the infinite sites model without recombination (Hudson, 1990). At equilibrium, T0 and T1 are thus proportional to genetic diversity for alleles sampled within and between demes, respectively. Average diversity for the entire metapopulation may be expressed as T=(T0/n)+T1(1−(1/n)), where n is the total number of demes (Pannell and Charlesworth, 1999). Genetic structure, defined as the relative reduction in within-deme diversity, is estimated by FST=(T−T0)/T (Slatkin, 1991).
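The two summary quantities above are simple functions of T0, T1 and n. The following Python sketch computes total diversity T and FST from given mean coalescence times. The function name and the input values are arbitrary and serve only as an illustration.

```python
def total_time_and_fst(t0: float, t1: float, n: int) -> tuple[float, float]:
    """Return (T, FST) from mean within-deme (t0) and between-deme (t1) coalescence times.

    Illustrative helper (hypothetical name); n is the number of demes.
    """
    # Two demes drawn at random are the same deme with probability 1/n.
    t = t0 / n + t1 * (1.0 - 1.0 / n)
    # Genetic structure: relative reduction of within-deme diversity.
    fst = (t - t0) / t
    return t, fst


t, fst = total_time_and_fst(t0=400.0, t1=600.0, n=100)
print(f"T = {t:.1f}, FST = {fst:.4f}")  # T = 598.0, FST = 0.3311
```

When T0 equals T1, there is no structure and FST is zero.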

Coalescence of a pair of alleles can only occur when they are present in the same deme. We will refer to this condition as co-location. If we define T0′ and T1′ as the mean coalescence times for alleles in the previous generation, then for an allele pair sampled at generation t, three possible coalescence times exist: one generation for those that co-located and coalesced in the previous generation, 1+T0′ generations for alleles that co-located in the previous generation but did not coalesce, and 1+T1′ generations for two alleles from different demes. Mean values of T0 and T1 may then be calculated by the following recursion equations:
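
$$T_0 = \sum_i a_i\left[P_i + (1-P_i)\left(1+T_0'\right)\right] + \left(1-\sum_i a_i\right)\left(1+T_1'\right) \tag{1}$$

$$T_1 = \sum_i b_i\left[P_i + (1-P_i)\left(1+T_0'\right)\right] + \left(1-\sum_i b_i\right)\left(1+T_1'\right) \tag{2}$$
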
where Pi is the probability of coalescence for two parental co-locating alleles. The subscript reflects the fact that coalescence probabilities may be different for different combinations of alleles. The terms ai and bi are compound terms expressing the proportion of all possible allele pairs that co-locate and have a coalescence probability of Pi. The sums ∑iai and ∑ibi thus represent the mean co-location probabilities for allele pairs sampled within and between demes.

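At equilibrium, T0=T0′ and T1=T1′, so we may substitute T0 and T1 for T0′ and T1′ in equations (1) and (2), which gives

$$T_0 = \frac{1}{\bar{P}}\left(\frac{1-\sum_i a_i}{\sum_i b_i} + 1\right) \tag{3}$$

and

$$T_1 = \frac{1}{\sum_i b_i} + \left(1-\bar{P}_b\right)T_0 \tag{4}$$

where

$$\bar{P} = \bar{P}_a\sum_i a_i + \bar{P}_b\left(1-\sum_i a_i\right) \tag{5}$$
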
with P̄b=∑ibiPi/∑ibi being the mean coalescence probability for co-locating allele pairs from different demes, and P̄a=∑iaiPi/∑iai representing the mean coalescence probability for co-locating allele pairs from the same deme. Equation (5) may be interpreted as the mean coalescence probability for pairs of alleles that are present in the same deme. The equilibrium expression for T0 may therefore be understood as follows. At any point in time, a fraction 1−∑iai of allele pairs sampled from a deme did not share a deme in the previous generation. Such pairs return to a single deme with probability ∑ibi per generation, that is, on average every 1/∑ibi generations. The remaining fraction ∑iai did share a deme in the previous generation, which is equivalent to this fraction sharing the same deme every 1/∑iai generations. On average, alleles will thus share a deme every (1−∑iai)(1/∑ibi)+(∑iai)(1/∑iai)=((1−∑iai)/∑ibi)+1 generations. In each generation that two alleles spend in the same deme they have an average coalescence probability P̄, so the mean time to coalescence is given by equation (3). Within-deme coalescence time is thus essentially a function of the time that allele pairs spend within a single deme.

Equation (4) gives the average coalescence time for allele pairs sampled from two different demes. It is the sum of the average time 1/∑ibi needed for two non-co-locating alleles to reach the same deme and the mean time needed for two alleles that enter the same deme to coalesce. A fraction P̄b of such allele pairs coalesces upon entering the same deme. The remaining fraction 1−P̄b coalesces after a further T0 generations on average.
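The equilibrium argument can be checked numerically. The following minimal Python sketch assumes that the recursion has the three-case form described above: T0 = Σi ai[Pi + (1−Pi)(1+T0′)] + (1−Σi ai)(1+T1′), and the same form for T1 with bi in place of ai. It also assumes the equilibrium solution T0 = ((1−Σi ai)/Σi bi + 1)/P̄ and T1 = 1/Σi bi + (1−P̄b)T0, with P̄ = P̄a Σi ai + P̄b(1−Σi ai). The sketch iterates the recursion from zero and compares the result with the closed form. The coefficients ai, bi and Pi are arbitrary illustrative numbers and are not taken from Table 2.

```python
def equilibrium_times(a, b, p):
    """Closed-form equilibrium (T0, T1).

    a[i], b[i]: fractions of within-deme and between-deme allele pairs that
    co-locate in the previous generation with coalescence probability p[i].
    """
    sa, sb = sum(a), sum(b)
    if sa <= 0 or sb <= 0:
        raise ValueError("need sum(a) > 0 and sum(b) > 0")
    pa = sum(x * q for x, q in zip(a, p)) / sa  # mean P, within-deme co-locating pairs
    pb = sum(x * q for x, q in zip(b, p)) / sb  # mean P, between-deme co-locating pairs
    pbar = pa * sa + pb * (1.0 - sa)            # mean P for pairs sharing a deme
    t0 = ((1.0 - sa) / sb + 1.0) / pbar
    t1 = 1.0 / sb + (1.0 - pb) * t0
    return t0, t1


def iterate_recursion(a, b, p, tol=1e-12, max_gen=10**6):
    """Iterate the one-generation recursions from T0' = T1' = 0 until they settle."""

    def step(frac, t0, t1):
        # Co-located pairs coalesce now (time 1) or remain a within-deme pair
        # (time 1 + T0'); pairs in different demes take 1 + T1' generations.
        return (sum(x * (q + (1 - q) * (1 + t0)) for x, q in zip(frac, p))
                + (1 - sum(frac)) * (1 + t1))

    t0 = t1 = 0.0
    for _ in range(max_gen):
        new_t0, new_t1 = step(a, t0, t1), step(b, t0, t1)
        if max(abs(new_t0 - t0), abs(new_t1 - t1)) < tol:
            return new_t0, new_t1
        t0, t1 = new_t0, new_t1
    raise RuntimeError("recursion did not converge")


a, b, p = [0.6, 0.3], [0.02, 0.01], [0.01, 0.02]
print(equilibrium_times(a, b, p))  # approximately (325.0, 354.0)
print(iterate_recursion(a, b, p))  # agrees with the closed form
```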

Metapopulation model for farmer-managed maize

We now present the parameters of our crop metapopulation model, which allow T0 and T1 to be estimated as described above. We consider a diploid, monoecious plant species with random mating within demes. There are n demes, each of which consists of seed from Nf ears, yielding N mature plants, with Nf≪N and a fixed number of N/Nf seeds per ear. Generations are discrete, and the life cycle of each deme consists of two consecutive phases: a reproductive phase and a seed phase. Random pollination, pollen migration and zygote formation take place during the reproductive phase. Each new seed contains a maternal allele inherited from one of the Nf ears and a paternal allele derived from one of N pollen parents. A proportion 1−mg of all paternal alleles results from random pollination by pollen from the same deme, whereas a proportion mg results from migrant pollen from other demes. Pollen migration follows an island model, with migrants originating from any of the other n−1 demes.

The seed phase begins after flowering and lasts until the onset of the next reproductive phase. Extinction, recolonization and seed migration take place in this phase. Extinction occurs with probability e, so that each generation ne demes go extinct and n(1−e) demes remain. An extinct deme is replaced by introducing Nf ears from the non-migrant fraction of one of the extant demes (propagule pool model), with no subsequent migration during the seed phase. Seed migration into individual demes is episodic, occurring with probability pm. Consequently, an expected fraction pm of all n(1−e) extant demes receive seed migrants from any of n(1−e)−1 potential source demes. Each deme has a single seed source per generation. In demes that receive migrants, Nfm migrant ears are planted in addition to Nf−Nfm ears taken from the resident deme. The fraction of migrant seeds is thus m=Nfm/Nf in demes undergoing migration and m̄=pmm in all extant demes. For mathematical simplicity, we assume that n(1−e) is large, so that n(1−e)≈n(1−e)−1, and we use n(1−e)−1 as the number of seed sources for both migrants and colonists.

At the end of the seed phase the metapopulation consists of a set of 2Nn gene copies that can be divided into non-overlapping subsets of paternal and maternal alleles that did or did not undergo seed extinction, seed migration or pollen flow (Table 1). The proportions represented by these subsets are assumed to remain constant over time. Genetic diversity within this system may now be described as the average time to coalescence for pairs of lineages sampled from the total collection of allele subsets. As outlined in the general model, different combinations of alleles may have different coalescence probabilities when co-locating. Table 2 presents these different probabilities and the corresponding expected fractions ai and bi of co-locating allele pairs. Derivation of these terms is given in the Appendix, and R code to calculate T0, T1 and FST under our model is available upon request.

Table 1 Representation of maternal (seed) and paternal (pollen) allele fractions in a metapopulation
Table 2 Coalescence probabilities for allele-pairs sampled within and between demes and corresponding co-locating fractions

Simulation study

We compared our theoretical results to expectations from simulated data. We made use of a stochastic, biallelic simulation algorithm developed for maize and described in Piñeyro-Nelson et al. (2009), modifying it to exactly match the assumptions of our metapopulation model (C++ code is available upon request). Reproduction and seed management were explicitly modeled, with N, Nf, Nfm and mg included as deterministic parameters and pm and e as binomial probabilities. FST in each run was calculated directly from the simulated allele frequencies as

Results

We can now use the metapopulation model outlined above to investigate the effects of farmer-mediated demographic processes on genetic diversity and structure in crop metapopulations. In the following sections, we discuss the behavior of the model by deriving analytical approximations, and present graphical results from the full model using the general equilibrium solutions presented in the Appendix.

Effective size of individual demes and coalescence time

The common practice of selecting a limited number of ears per deme as the source of the next generation's seed reduces the effective size of individual demes relative to the census size N. The inbreeding effective size of a panmictic population is related to the mean probability of coalescence P in the previous generation by Ne=1/(2P) (Kimura and Crow, 1963). In our model, we may hence calculate the effective size of a single deme without migration by setting pollen and seed migration to zero and substituting the terms ai and Pi from Table 2 into equation (5). This yields:

which is identical to Crossa and Vencovsky's (1994) variance effective size with female gametic control. Throughout the paper, we use Ne to denote the effective size of a single deme in the absence of migration, not the effective size of the metapopulation. In the classical metapopulation model without extinction, there is only a single coalescence probability for any pair of parental co-locating alleles. Therefore, P̄=P and we may write:

$$T_0 = 2N_e\left(\frac{1-\sum_i a_i}{\sum_i b_i} + 1\right) \tag{7}$$

When coalescence probabilities differ, P̄ need not equal P. It can be shown numerically, however, that P̄ in our model closely approximates 1/(2Ne) under a wide range of parameter values. We may thus use equation (7) as an approximation to T0. Moreover, assuming Ne is large, we will use

$$T_1 \approx T_0 + \frac{1}{\sum_i b_i} \tag{8}$$

for between-deme coalescence time. We will make further use of expressions (7) and (8) as they greatly simplify comparison to previous results.

Extinction, seed migration and within-deme coalescence time

Under most models of subdivided populations, the weighted mean within-deme coalescence time can be shown to be unaffected by the rate of migration (reviewed in Nagylaki, 2000). For the classical metapopulation model, with conservative migration and equal deme sizes, this means that T0 has an expected value of 2Nen (Nagylaki, 1998). Pannell and Charlesworth (1999) showed that including extinction leads to a breakdown of this invariance result. Under extinction, T0 increases with migration rate because genetic diversity that is lost in the process of extinction and recolonization is partially restored by diversity contained in the migrant pool. In the absence of extinction, invariance follows directly from the equilibrium solution for T0 in the classical metapopulation model. Substituting a=(1−m)²+m²/(n−1) and b=m²/(n−1)+2m(1−m)/(n−1) from Pannell and Charlesworth (1999) into equation (7), we may thus write:

$$T_0 = 2N_e\left(\frac{1-(1-m)^2-m^2/(n-1)}{\left[m^2+2m(1-m)\right]/(n-1)} + 1\right) \tag{9}$$

The term m² represents the fraction of allele pairs in which both alleles are migrants. Because migrants are assumed to be a random sample from the metapopulation, such pairs have a co-location probability of 1/(n−1). When n is large, m²/(n−1) can be ignored and equation (9) reduces to 2Nen. Invariance to migration rate may thus be understood as the balance between the fraction 1−∑iai of allele pairs that do not co-locate and the fraction ∑ibi that relocates from different demes.
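The invariance argument can be checked numerically with a minimal Python sketch. It assumes T0 ≈ 2Ne[(1−a)/b + 1], which is the equilibrium T0 with P̄ = 1/(2Ne), and it uses the a and b given above. The values of Ne and n are arbitrary. Algebraically, the substitution simplifies to T0 = 2Ne(n − m/(2−m)), so T0 never falls more than 2Ne below 2Nen. The largest deviation occurs at m = 1.

```python
def classical_t0(ne: float, m: float, n: int) -> float:
    """Within-deme coalescence time, classical model without extinction.

    Assumes T0 = 2*Ne*((1 - a)/b + 1) with the co-location fractions of
    Pannell and Charlesworth (1999); requires 0 < m <= 1 and n >= 2.
    """
    a = (1 - m) ** 2 + m ** 2 / (n - 1)               # within-deme pairs that co-locate
    b = m ** 2 / (n - 1) + 2 * m * (1 - m) / (n - 1)  # between-deme pairs that co-locate
    return 2 * ne * ((1 - a) / b + 1)


ne, n = 100, 100
for m in (0.01, 0.1, 0.5, 1.0):
    # Equals 2*Ne*(n - m/(2 - m)): within about 1% of 2*Ne*n for these values.
    print(m, round(classical_t0(ne, m, n), 1), 2 * ne * n)
```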

Seed migration in our model differs in two key aspects from migration in a classical metapopulation. First, migrants are sampled from single source demes rather than from the entire metapopulation. Second, migration is defined by both a frequency (pm) and a quantity (m) instead of by a single parameter. The response of T0 to changes in the quantity of exchanged seed m under different rates of extinction is shown in Figure 1a. Clearly, the invariance result does not hold with respect to m, even in the absence of extinction.

Figure 1

Within-deme diversity as a function of extinction rate e and (a) quantity of migrating seed m (with pm=1) or (b) seed migration frequency pm (with m=0.25). Assumes N=1000, Nf=200, mg=0.01, n=100.

We will explain this result mathematically by substituting the terms ai and bi from Table 2 into (3), and then setting pm to unity and mg to zero and defining

The term m² in the numerator in (10) is now divided by unity instead of by n−1, as it was in the classical metapopulation model. This difference arises because under single source migration, as assumed in our model, two alleles that are sampled from seed migrants in the same deme always co-locate. The term m² therefore cannot be ignored when m is high. At e=0, increasing m decreases T0 from approximately 2Nen when m is close to zero to 2Ne when m is one. When e>0, the term ɛ in the denominator lowers T0. This is partially offset as 1−(1−m)² in the numerator becomes larger with larger m. The numerator in equation (10) equals zero at both m=0 and m=1 and has a maximum at m=0.5. Therefore, for e>0, T0 increases with m until it reaches a maximum whose position depends on e, and then decreases for higher m. The relation between T0 and migration quantity thus deviates strongly from what would be expected under the classical metapopulation model.

If we no longer assume pm=1 as above, we can see the effect of seed migration frequency on within-deme coalescence time as well (Figure 1b):

We note that when m is small enough that terms containing m² can be ignored, T0 is invariant with respect to pm when e=0 and increases with pm when e>0, in agreement with Pannell and Charlesworth (1999). At higher m, m² can no longer be ignored. As 2pm m²⩾pm² m², migration will always lead to a value of T0 that is below 2Nen. The term −pm² in the denominator of (11) decreases with pm more rapidly than the term −2pm in the numerator, causing T0 to rise in response to migration frequency. Again, the invariance result breaks down, and we may conclude that single source migration makes within-deme coalescence time depend on both seed migration quantity and frequency.

The above results follow directly from the interpretation of T0 as the ratio between co-location and relocation from different demes. After extinction, the co-location probability for alleles in the same deme is not affected because all colonists derive from the same deme. But recolonization increases the probability that alleles in different demes co-locate, which decreases the time alleles spend in different demes and reduces within-deme coalescence time. This effect is exacerbated by the lower number of extant source demes, which further increases the probability of co-location for alleles in different demes. A similar explanation underlies the effects of m and pm. Under single source migration, the origin of immigrants within a deme is completely correlated and migration quantity, m, is therefore not equal to the rate at which lineages separate into different demes. Increasing m will decrease the proportion of co-locating alleles within a deme but at a decreasing rate until half of the alleles in a deme are migrants. Increasing m beyond this point will result in a higher proportion of co-locating alleles until all alleles co-locate at m=1. At the same time, the rate of relocation of alleles from different demes increases monotonically over the entire range of m, leading to a loss of invariance with respect to m. The response to pm is different because seed sources for each deme are independent. As more demes receive migrants, there is a proportionally higher probability that two demes receive migrants from the same source and thus co-locate. The time that alleles spend in different demes thus remains approximately unchanged.
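The co-location argument can be made concrete with a minimal Python sketch. It assumes pm = 1, no extinction and no pollen flow, and it counts only where the two parental alleles came from. The function names are illustrative. Under single source migration, the fraction of co-locating within-deme pairs is (1−m)²+m², which is smallest at m = 0.5. The relocation fraction for pairs from different demes is approximately [m²+2m(1−m)]/(n−1) and keeps increasing with m.

```python
def within_colocation_single_source(m: float) -> float:
    """Within-deme pairs whose parental alleles shared a deme (pm = 1, e = 0, mg = 0).

    Two resident alleles share the deme, and under single source migration
    two migrant alleles always share their source deme.
    """
    return (1 - m) ** 2 + m ** 2


def within_colocation_migrant_pool(m: float, n: int) -> float:
    """Same quantity when migrants are drawn from the whole metapopulation."""
    return (1 - m) ** 2 + m ** 2 / (n - 1)


def between_relocation(m: float, n: int) -> float:
    """Between-deme pairs whose parental alleles shared a deme.

    Exact for the migrant pool model; approximately the same for single
    source migration when n is large.
    """
    return (m ** 2 + 2 * m * (1 - m)) / (n - 1)


grid = [i / 100 for i in range(101)]
m_min = min(grid, key=within_colocation_single_source)
print(m_min, within_colocation_single_source(m_min))  # 0.5 0.5
```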

Expectations for FST in the classical model without extinction

Many empirical studies of subdivided populations use Wright's (1951) fixation index FST or similar measures as an estimator of the amount of gene flow between demes. Under the island model with infinite demes and low migration rates, the expectation for FST is given by 1/(4Nm+1). Although recognized as too simplistic (Whitlock and McCauley, 1999), this formula serves as the basis for two general predictions with respect to genetic structure. First, an increase in the number of migrants, Nm, always reduces genetic structure. Second, FST will be approximately independent of deme size provided that Nm remains constant.
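The following minimal Python sketch shows both predictions of the formula FST = 1/(4Nm+1). Two demes of different size that receive the same number of migrants Nm have the same FST, and adding migrants always lowers FST. The function name and the parameter values are illustrative.

```python
def island_fst(n_eff: float, m: float) -> float:
    """Wright's infinite-island approximation FST = 1/(4*N*m + 1)."""
    return 1.0 / (4.0 * n_eff * m + 1.0)


# Same number of migrants (N*m = 5) in a small and a large deme: same FST.
print(island_fst(100, 0.05), island_fst(1000, 0.005))
# More migrants always lower FST.
print([round(island_fst(100, m), 3) for m in (0.001, 0.01, 0.1)])
```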

As expected, both predictions hold in the classical metapopulation model without extinction. Substituting T=T0/n+T1(1−(1/n)) into the equation FST=(T−T0)/T, we obtain:

which is identical to the result obtained by Wright (1951), and to his reduced equation when m is small.

Seed migration and FST

Figure 2 shows the response of FST to the quantity and frequency of seed migration in our model. The response of FST to m differs strongly from what is predicted by the classical model. Instead of the usual hyperbolic relation, the response of FST to migration is parabolic, with a minimum at m=0.5. An analysis of the equilibrium solution for FST without extinction or pollen flow shows that this result stems from the assumption of single source migration.

Figure 2

FST as a function of seed migration frequency (pm) and quantity of migrating seed (m). Assumes n=100, N=5000, Nf=200, e=0.2, and mg=0.01.

Ignoring extinction and pollen flow, and assuming n → ∞, the relation between FST and migration in our model is given by:

Because migrating seed derives from a single source in each generation, m=0.5 is the point at which the proportion of alleles coming from different demes is maximal and inbreeding is lowest. Any further increase in m raises the proportion of co-locating alleles within demes and will therefore increase genetic structure. In contrast, migration frequency determines the amount of migrant seed that comes from different demes. For small m, the effect of pm can be explained under the classical model, as m̄=pmm and thus FST≈1/(4Ne m̄+1). A combination of high m and low pm, however, may result in a higher value of FST than expected from the number of migrants Nm. Nonetheless, the negative relation between pm and FST holds regardless of the magnitude of m.

Deme size and FST

In our model, the quantity (m) and frequency (pm) of seed flow may vary independently of one another. If we redefine m in equation (13) as Nm/Ne, where Nm is the effective number of seed migrants, we see that in practice the average effective number of seed migrants, pmNm, may be low while the effective number of seed migrants entering a receiving deme, Nm, is high. An important consequence of this model property is that FST becomes dependent on deme size, as illustrated by equation (14):

When Nm is relatively large with respect to Ne, greater deme size thus causes a reduction in FST similar to that caused by migration. Figure 3 illustrates this by showing FST as a function of pm and Nf, given a fixed number of migrants Nfm. This effect is of potential importance in agricultural systems because quantities of migrant seed can be high.

Figure 3

FST as a function of seed migration frequency pm and number of planted ears Nf. Assumes N=1000, n=100, Nfm=100, e=0.3, and mg=0.

Extinction and FST

In metapopulations with extinction, Wright's (1951) classic formula for FST no longer provides an adequate description of the relation between seed flow and genetic structure. Under Slatkin's (1977) model II with propagule pool recolonization, extinction increases differentiation among demes (Wade and McCauley, 1988; Whitlock and McCauley, 1990; Pannell and Charlesworth, 1999). This result is mostly due to the strong drift which occurs during recolonization bottlenecks. The present model does not share Slatkin's assumption of a bottleneck after extinction. Consequently, our results on the effect of extinction on FST are rather different. Figure 4 shows the full model results for FST as a function of the extinction rate at different frequencies of seed migration and for different numbers of demes.

Figure 4

FST as a function of extinction rate e and (a) seed migration frequency pm (with m=0.25 and n=100) or (b) number of demes n (with m=0.25 and pm=0.05). Assumes N=5000, Nf=200, and mg=0.01.

As seed migration becomes more frequent, FST is indeed increased by extinction until total diversity becomes so low that any further increase in extinction will lead to an effective decrease in FST. At low migration frequencies, however, FST is decreased by extinction. The reason for this can be seen in expression (15):

The denominator consists of the sum of two terms that respond inversely to changes in e. When pm is small the first term becomes negligible compared with the second and FST will decrease with increasing e. On the other hand, when n becomes very large, the second term tends to zero and FST will respond positively to extinction. Equations (16) and (17) present the cases for pm=0 and n → ∞, respectively.

This result shows that the effects of extinction on differentiation depend on both n and migration frequency. The conclusions drawn by Wade and McCauley (1988) therefore hold for large n but become dependent on migration frequency when n is low.

Seed management in the presence of pollen flow

In the results presented so far, pollen migration was assumed to be low in order to isolate the effects of human-mediated gene flow on genetic diversity and structure. In reality, pollen migration may be extensive, and our ability to detect the effects of seed-related factors will depend on their interaction with pollen flow. It is therefore relevant to know how sensitive genetic structure is to seed management under different levels of pollen flow. Figure 5 shows results for our full model on the response of FST to extinction, migration frequency, migration quantity and number of ears planted at different levels of pollen flow (mg=0.001, 0.005, 0.01, 0.04). For the effect of deme size, pollen flow was defined by a fixed number of pollen migrants for each level. At the lowest level of pollen flow, the response to the seed-related parameters is quite strong. At the highest level, however, pollen flow dominates and overrides most effects of seed management on genetic structure.

Figure 5

Clockwise from upper left: FST as a function of extinction rate e, migration frequency pm, number of planted ears Nf, and quantity of migrating seed m at different levels of pollen flow (mg=0.001, 0.005, 0.01, 0.04). In each panel, pollen flow increases from higher to lower curves.

Simulation study

Our model predicts expected coalescence times based on fractional coalescence probabilities. In doing so, it treats probabilities such as pm and e as fractions of possible allele pairs. This makes the model mathematically tractable but raises the question of whether the deterministic predictions hold when stochasticity is introduced. Furthermore, we define FST as a ratio of expected coalescence times rather than as the relative reduction of heterozygosity that forms the basis for most empirical FST estimates. Although the two measures are theoretically equivalent (Slatkin, 1991), it is desirable to confirm that FST estimated from allele frequencies agrees with our coalescence-based predictions. We therefore performed a simulation study to evaluate the accuracy of our model. Using a modified version of the simulation algorithm of Piñeyro-Nelson et al. (2009), we performed stochastic simulations of a single biallelic locus under our metapopulation model. We calculated FST across the metapopulation for a range of values of m, e and Nf for each of the three values of pm. Results were obtained by averaging over 100 simulations of 2000 generations each. These simulated data match our theoretical predictions almost perfectly (Figure 6), providing strong corroboration of our analytical results under the specified metapopulation model and assumptions.
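The following is a minimal stochastic sketch in Python (NumPy) of the life cycle described in the model section. It is not the authors' C++ simulation. It assumes a single biallelic locus, ears chosen at random without replacement, self-pollination allowed, and FST estimated as the among-deme variance in allele frequency divided by p̄(1−p̄). That estimator is our assumption, because the estimator used in the study is not reproduced here. The function name and default parameter values are also hypothetical. A single short run only approaches equilibrium, so results should be averaged over replicates before they are compared with the analytical expectations.

```python
import numpy as np


def simulate_fst(n=20, N=100, Nf=10, Nfm=5, pm=0.2, e=0.0, mg=0.01,
                 generations=200, seed=1):
    """Simulate the farmer-managed seed cycle; return FST in the final generation."""
    if N % Nf != 0 or not 0 <= Nfm <= Nf or n < 2:
        raise ValueError("need N divisible by Nf, 0 <= Nfm <= Nf and n >= 2")
    rng = np.random.default_rng(seed)
    per_ear = N // Nf
    geno = rng.integers(0, 2, size=(n, N, 2))  # alleles of N diploid plants per deme

    for _ in range(generations):
        # Reproductive phase: each deme keeps Nf ears carrying N/Nf seeds each.
        ears = np.empty((n, Nf, per_ear, 2), dtype=geno.dtype)
        for d in range(n):
            mothers = rng.choice(N, size=Nf, replace=False)
            pick = rng.integers(0, 2, size=(Nf, per_ear))
            ears[d, :, :, 0] = geno[d, mothers[:, None], pick]   # maternal allele
            donor_deme = np.full((Nf, per_ear), d)
            migrant = rng.random((Nf, per_ear)) < mg              # migrant pollen
            other = rng.integers(0, n - 1, size=int(migrant.sum()))
            donor_deme[migrant] = other + (other >= d)            # any deme except d
            donor_plant = rng.integers(0, N, size=(Nf, per_ear))
            donor_allele = rng.integers(0, 2, size=(Nf, per_ear))
            ears[d, :, :, 1] = geno[donor_deme, donor_plant, donor_allele]  # paternal

        # Seed phase: extinction with recolonization by Nf whole ears (no
        # bottleneck), then episodic migration of Nfm ears from a single source.
        new_ears = ears.copy()
        extinct = rng.random(n) < e
        if extinct.all():
            extinct[:] = False  # skip extinction if every deme would be lost
        extant = np.flatnonzero(~extinct)
        for d in np.flatnonzero(extinct):
            new_ears[d] = ears[rng.choice(extant)]
        for d in extant:
            if extant.size > 1 and rng.random() < pm:
                src = rng.choice(extant[extant != d])
                recv = rng.choice(Nf, size=Nfm, replace=False)
                give = rng.choice(Nf, size=Nfm, replace=False)
                new_ears[d, recv] = ears[src, give]
        geno = new_ears.reshape(n, N, 2)

    p = geno.mean(axis=(1, 2))  # frequency of allele 1 in each deme
    pbar = p.mean()
    denom = pbar * (1.0 - pbar)
    return float(p.var() / denom) if denom > 0 else float("nan")


print(simulate_fst(pm=0.2, e=0.1, mg=0.01))
```

Working with whole ears, rather than individual seeds, keeps migration and recolonization kin-structured, as in the model.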

Figure 6

Comparison of simulated data (open symbols) to analytical predictions (solid lines). From left to right: FST as a function of quantity of migrating seed m, extinction rate e, and number of planted ears Nf (with constant number of pollen migrants) at different levels of migration frequency (pm=0.2, diamonds; 0.02, triangles; 0.002, circles). All graphs use mg=0.005, e=0, Nf=200, Nfm=100 and n=100 when these parameters are fixed.

Discussion

The determinants of neutral genetic diversity and structure are of substantial interest to evolutionary and conservation biology. While molecular markers can be used to describe the observed distribution of genetic diversity within and among populations, the interpretation of such data relies on models of subdivided populations that adequately represent the population genetics of the system under study. Since the introduction of Wright's infinite island model, a large body of theory has accumulated showing how deviations from basic assumptions can affect model behavior. Examples include the introduction of stepping stone migration (Kimura and Weiss, 1964; Slatkin, 1991), extinction-recolonization dynamics (Maruyama and Kimura, 1980), seed and pollen migration (Wang, 1997) and stochastic and kin-structured migration (Levin, 1988; Whitlock and McCauley, 1990). The results from these model refinements suggest that when a system is well defined, incorporating system-specific model features can provide a better understanding of population genetic processes. Although there has been growing interest in understanding the genetic diversity of agricultural plant species under traditional management, and the population dynamics of many crops are well documented, explicit models describing the population genetics of subdivided crop populations are currently unavailable. To our knowledge, the adapted metapopulation model presented here is the first attempt to incorporate aspects unique to farmer-managed metapopulations into an explicit population genetic framework, and as such represents a significant step forward in our understanding of the effects of management practices on patterns of genetic diversity.

The main property that sets our model apart from existing metapopulation models is that seed dynamics are mediated by conscious human intervention. Specifically, this translates into the assumptions of single-source migration and the absence of a population bottleneck following extinction. Both assumptions are supported by empirical data (Rice et al., 1998; Badstue et al., 2007) and follow naturally from the basic need to obtain enough seed for a successful harvest at minimal cost. A farmer's usual response to a personal seed shortage is to look for sufficient seed from a reliable source, often a friend or family member (Almekinders et al., 1994; Zeven et al., 1999; Badstue et al., 2007; Barnaud et al., 2008). Occasional departures from these assumptions may of course occur (Brocke et al., 2003), but provided they are infrequent, such deviations are unlikely to change our results qualitatively. Although we use data on traditional maize agriculture to define the parameters of our model, studies on other crops have reported similar dynamics (Longley and Richards, 1993; Almekinders et al., 1994; Brocke et al., 2003; Barnaud et al., 2008), and we expect our basic predictions to apply to many traditionally managed species. A minor difference between our model and more general models is that migration is essentially kin structured, because ears rather than individual seeds are moved. Other models have explored the effects of kin-structured migration in detail (Whitlock and McCauley, 1990). In our case, kin-structured migration is of little theoretical interest because genetic sampling within the resident portion of each deme is similarly kin structured, so that migration remains a simple proportion of effective deme size.

By evaluating approximations for equilibrium coalescence times, our model provides insight into the mechanisms shaping neutral genetic diversity in crop metapopulations. Our predictions deviate significantly from those of classical models of subdivided populations in several respects. First, the effects of single-source migration on within-deme diversity and FST show that gene flow cannot be characterized by a single migration parameter, because the magnitude and frequency of seed migration have different and sometimes opposing consequences (Figures 1 and 2). The correlated origin of migrants reduces the relative time that alleles spend in different demes, so that within-deme coalescence time is no longer invariant with respect to migration and FST no longer has a monotonic relationship with migration quantity. Second, because migration frequency and quantity are independent, deme size may affect genetic structure (Figure 3), especially when migration is rare but involves large numbers of seeds. Dependence of differentiation on deme size has been reported in some theoretical studies of specific systems, for example by Ingvarsson (1997), who found lower differentiation in small demes in a model of delayed population growth. Deme size is often ignored as a determinant of genetic structure, however, on the basis of the classical prediction that at low migration rates only the number of migrants affects FST (Wright, 1951). When faced with a shortage of planting material, farmers are likely to incorporate large quantities of migrant seed, which suggests that deme size must be accounted for to understand genetic structure in agroecosystems. Third, because a farmer can be expected to obtain sufficient seed after seed loss, extinction takes a different form than in classical metapopulation models. The absence of a bottleneck after recolonization means that FST does not always increase with extinction, as other models predict (Wade and McCauley, 1988; Pannell and Charlesworth, 1999), but instead depends on migration and the number of demes (Figure 4). Although finite deme number has been considered as a factor influencing FST (Wade and McCauley, 1988; McCauley, 1991), previous work has not investigated the interaction between deme number and extinction.
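To make the classical benchmark concrete, the sketch below evaluates Wright's infinite-island approximation FST ≈ 1/(1 + 4Nm), where N is the effective deme size and m the per-generation migration rate. It is not the crop metapopulation model developed here; it only illustrates the classical prediction that, at low migration rates, deme size and migration rate enter FST only through the number of migrants Nm. The function name is hypothetical.

```python
import math


def island_model_fst(deme_size: float, migration_rate: float) -> float:
    """Wright's infinite-island approximation F_ST ~ 1 / (1 + 4 N m).

    deme_size: effective number of individuals per deme (N).
    migration_rate: fraction of each deme replaced by migrants per generation (m).
    Hypothetical helper illustrating the classical model only, NOT the
    crop metapopulation model of this paper; intended for small m.
    """
    if deme_size <= 0:
        raise ValueError("deme_size must be positive")
    if not 0.0 <= migration_rate <= 1.0:
        raise ValueError("migration_rate must lie in [0, 1]")
    return 1.0 / (1.0 + 4.0 * deme_size * migration_rate)


# A small and a large deme receiving the same number of migrants (N m = 1)
# are predicted to be equally differentiated under this approximation.
small = island_model_fst(200, 0.005)
large = island_model_fst(2000, 0.0005)
print(small, large, math.isclose(small, large))
```

Under this approximation a deme of 200 receiving 0.5% migrants and a deme of 2000 receiving 0.05% migrants are equally differentiated. In the crop model, where rare migration events can involve large quantities of seed, deme size does not drop out in this way.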

The primary purpose of our model is to describe the genetic consequences of seed management. Our results suggest that researchers who wish to link empirical observations of genetic structure to data on farming practice should distinguish between replacement, migration quantity and migration frequency when collecting data, and should also estimate the number of planted ears and the rate of pollen migration. Given the dampening effect of pollen migration in our results, pollen flow deserves particular attention in any empirical study. Unfortunately, few estimates of pollen flow in traditional agroecosystems exist (for example, Louette et al., 1997), and the dynamics of pollen migration are likely to be location specific (Messeguer et al., 2006).

We are not aware of any single study that gives precise estimates of all the above parameters. To illustrate how field data can be used to explain genetic differentiation, however, we compile seed management data on six maize farming communities in the Central Valley of Oaxaca from published articles and from unpublished interview data obtained by the International Maize and Wheat Improvement Center (CIMMYT). We compare the genetic structure predicted from these data with empirical estimates of structure from the same region (Pressoir and Berthaud, 2004). Records of average planting area (2.5 ha) (Smale et al., 1999), quantity of seed planted (16 kg ha−1) (Badstue et al., 2007) and grain weight (70 g per ear; 0.38 g per kernel) (Soleri and Smith, 2002) yield estimates of Nf≈560 and N≈100 000. Furthermore, CIMMYT interview data suggest that pm≈0.02 (M Bellon, personal communication), estimates of seed lot replacement give e≈0.1 (Smale et al., 1999), and the mean quantity of seed exchanged by farmers (12.5 kg) leads to m≈0.30 (Badstue et al., 2007). In the absence of pollen flow data for the region, we use the average estimate mg≈0.018 for adjacent fields reported by Messeguer et al. (2006). From these data, we calculate an equilibrium FST=0.008, quite close to the reported value of FST=0.011 (Pressoir and Berthaud, 2004).
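The conversion from farm records to model parameters is simple arithmetic, and a minimal sketch is given below. It rests on our own reading of the calculation, which is not spelled out above: Nf is taken as the total mass of seed planted divided by the weight of one ear, N as that mass divided by the weight of one kernel (assuming one plant per kernel sown), m as the quantity of exchanged seed expressed as a fraction of the seed planted, and the single-ear weight of 70 as grams. The function and argument names are hypothetical.

```python
def seed_parameters(area_ha: float, seed_rate_kg_per_ha: float,
                    ear_weight_g: float, kernel_weight_g: float,
                    exchanged_seed_kg: float) -> tuple[float, float, float]:
    """Convert farm-level seed records into (Nf, N, m).

    Hypothetical helper; assumes every kernel sown gives one plant and that
    exchanged seed replaces part of the seed sown on the same planting area.
    """
    planted_g = area_ha * seed_rate_kg_per_ha * 1000.0   # total seed sown, in grams
    n_ears = planted_g / ear_weight_g                    # Nf: ears needed to supply that seed
    n_plants = planted_g / kernel_weight_g               # N: one plant per kernel sown
    migrant_fraction = exchanged_seed_kg * 1000.0 / planted_g  # m: share of sown seed obtained from another farmer
    return n_ears, n_plants, migrant_fraction


# Oaxaca Central Valley values quoted in the text.
nf, n, m = seed_parameters(area_ha=2.5, seed_rate_kg_per_ha=16,
                           ear_weight_g=70, kernel_weight_g=0.38,
                           exchanged_seed_kg=12.5)
print(f"Nf ~ {nf:.0f}, N ~ {n:.0f}, m ~ {m:.3f}")
```

With these values the sketch gives roughly 571 ears, about 105 000 plants and m = 0.3125, close to the rounded estimates Nf≈560, N≈100 000 and m≈0.30 used above.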

The above example shows that our model produces reasonable values of population structure from farming system data. Although the parameter values used are rough estimates, the value of the model lies precisely in providing a means of assessing the effects of parameter variation. In this particular case, relatively high pollen flow brings our prediction into close agreement with approximations from classical models (island model: FST≈0.007; Slatkin's model II: FST≈0.009), but our results show that the specifics of seed management may cause deviations from these models when pollen migration is limited. Our model allows, for the first time, clear identification of the specific data required to explain observed population structure in traditional agricultural systems. Knowing whether farmers have ever mixed seed, for example, is insufficient; predicting genetic structure requires quantitative estimates of the amounts and frequencies of seed migration. Our model thus serves as a guide to the kinds of data that breeders or conservationists interested in genetic structure in crop systems must collect.

It is important to point out that the current model is framed in terms of fixed parameter values and equilibrium conditions. Exact prediction of genetic structure under specific field conditions may therefore be better served by explicit computer simulations (Piñeyro-Nelson et al., 2009). Nonetheless, the close correspondence between our analytical results and computer simulations shows that our main predictions are robust to stochasticity. Rather than as a detailed predictive method, we see the value of the present work in providing a better understanding of the general behavior of genetic diversity in crop metapopulations. We hope that this work will be a first step toward a more quantitative approach to the study of crop metapopulations, paving the way for explicit, rather than metaphorical, analysis of the role that farmers play in shaping genetic diversity.

Finally, our study provides an example of the benefits of incorporating information from well-defined systems to create more refined population genetic models. Although we have focused on the genetic structure within traditionally managed crops, we suggest that similar analytical evaluation of well-defined natural systems may also lead to interesting and potentially novel results.

Conflict of interest

The authors declare no conflict of interest.