Introduction

Little research has been directed towards understanding the processes that drive plant evolution in the agricultural ecosystems of traditional farming communities. Maize populations, like natural populations, are subject to migration and drift, to both natural selection and farmers' selection, and finally to local extinction and recolonization processes. Therefore, in order to gain a complete picture of the system, it is important to document in detail the genetic dynamics in these farmers' fields and the impact of their various practices.

Maize diversified first in the highlands of Mexico soon after domestication. Matsuoka et al (2002) show that maize was domesticated in a single event, and that maize accessions from the highlands of Oaxaca are genetically the closest to the wild ancestor of maize (Zea mays ssp. parviglumis). They report that, in a phylogenetic analysis including maize and its wild relative teosinte, the basal-most maize accessions are those collected in regions close to the Central Valleys of Oaxaca. Furthermore, archeological work (Benz, 2001; Piperno and Flannery, 2001) has revealed remains of the oldest known maize in the State of Oaxaca, at the Guila Naquitz cave, dating back to 4200 B.C. The Oaxaca region has also been reported to hold a large amount of phenotypic diversity (Bellon et al, 2003).

Today in Oaxaca, farmers still cultivate maize populations in a traditional manner. Saving seed from one season to the next is a well-defined practice. In addition, the Central Valleys of Oaxaca show very little presence of, or impact from, modern varieties (Smale et al, 1999; Bellon et al, 2003), with most of the area still planted in local landraces. Therefore, this region offers unique conditions for the study of the evolutionary processes that are key to maize evolution. The concept of ‘landrace’ is complex (Zeven, 1998) and its complete definition remains an issue of contention among some parties. For the purposes of this paper, the term landrace refers to a maize population cultivated in a traditional fashion and managed by a single farmer.

The management practices of small-scale Mexican farmers are central to the evolution of maize and its diversity. Key practices include the planting of numerous maize populations within a small area. Consequently, even if they wished to, farmers are unable to prevent the exchange of pollen between populations (Bellon and Brush, 1994; Louette et al, 1997). Furthermore, Mexican farmers commonly acquire seed from both local and distant farmers or sources, often mixing their own seed with seed from other farmers or with seed purchased at markets (Louette and Smale, 2000).

In addition to farmer management, the biology of the species is expected to play a major role in structuring maize populations. Maize is a monoecious species that bears two inflorescence types, the staminate tassel and the pistillate ear. There is generally a delay between male and female flowering, with male flowering usually occurring first. If this delay is short enough, it could allow assortative mating to occur. Homozygote excess, and its variation across loci and populations, has been reported for maize open-pollinated populations using both isozymes and RFLP markers (Brown and Allard, 1970; Kahler et al, 1986; Salanoubat and Pernes, 1986; Lefort-Buson et al, 1991; Garnier-Géré, 1992; Dubreuil and Charcosset, 1998). Assortative mating would produce a locus-dependent Wahlund effect (Nevo et al, 2000), which was tested in the research reported herein.

In this paper, we assess the genetic diversity and population structure of maize landraces from the Central Valleys of Oaxaca. Because 83% of the total maize in this area is white kernel maize (Smale et al, 1999), this study focuses on this maize type. We assess key agroecological factors and components of farmer management behind maize population dynamics. We also describe how this genetic diversity is structured, how management practices by farmers have an effect on population differentiation, and effects related to distance and seed exchange among Oaxacan farmers. In addition, we investigate the consequences of some flowering traits that could play a role in assortative mating in maize landrace populations. We use markers specifically linked to the QTLs of flowering traits, and other markers scattered throughout the genome.

Material and methods

Survey on farmer management

We selected six villages (Figure 1) for this study, because of their contrasting situations, in terms of ethnicity, and their maize production potential, based on an earlier study (Smale et al, 1999; Bellon et al, 2003). In all, 10 households per village were randomly chosen, giving a total of 60 farmers surveyed. We gathered information on farmer seed management and on seed exchange practices between and within villages.

Figure 1

Location of sampled villages in the Central Valleys of Oaxaca around the city of Oaxaca de Juarez (Mexico). Villages are numbered from 1 to 6. Altitude, latitude, and longitude are also given: 1. Huitzo, 1730 masl (meters above sea level) 17°15′N 96°51′W; 2. Mazaltepec, 1700 masl 17°06′N 96°52′W; 3. San Lorenzo, 1830 masl 16°51′N 96°16′W; 4. Amatengo, 1310 masl 16°30′N 96°47′W; 5. Valdeflores, 1447 masl 16°45′N 96°49′W; 6. Santa Ana, 1520 masl 16°50′N 96°42′W.

Material used for the genetic analysis

Of the six studied villages, Santa Ana and Huitzo showed the highest contrast in ethnicity, number of maize populations per farmer, and potential in terms of maize production (Smale et al, 1999), and were therefore studied more extensively. A total of 31 populations were assayed, including field evaluation and genotyping. These included nine populations from Huitzo, three from Mazaltepec, three from San Lorenzo, three from Amatengo, three from Valdeflores, and 10 from Santa Ana. Households were selected randomly among the farmers cultivating a population of white kernel maize within each village. Sampling was carried out within a single generation. We randomly selected 20 open-pollinated families for each population and genotyped one individual per family; field evaluation for flowering traits was carried out on 18 of these families.

Simple sequence repeat genotyping

A total of 11 microsatellite markers were assayed. SSR primers were selected from the maizeDB database of public SSRs, and included the following: phi011, phi227562, phi96100, phi101049, phi029, phi093, phi024, phi452693, phi034, phi014, and umc1061. They consist of tri- or tetranucleotide repeats. Markers were selected according to their chromosomal locations, in order to provide for genome-wide coverage, and also by the size of the amplification product, to allow multiplexing on an automated DNA sequencer. Sequences and mapping positions can be downloaded at http://www.agron.missouri.edu/ssr.html. Of the 11 microsatellites used for this study, most did not exhibit a stepwise variation. Three of these markers (phi011, phi024, and phi452693) map close to genes or QTLs involved in flowering time or anthesis-silking interval (Veldboom et al, 1994; Ribaut et al, 1996; Gale and Devos, 1998; Thornsberry et al, 2001).

Cytoplasm genotyping

All populations were also characterized with chloroplast markers. We used a polymorphic chloroplast primer pair flanking a polyA repeat in the psbK/psbI intergenic region, with the following sequences: zmcp7430-F: CGAAGCTGCTGTAAGTTTTCG and zmcp7430-R: AAGACTTCTCGGCTCTTATCCA (Provan et al, 1999).

Analysis of the among-population genetic structure

Because villages were not randomly selected, the genetic description given is specific to our sample. Overall, Fst=θ (Weir and Cockerham, 1984) was calculated for the entire set of 31 populations. Jackknifing over populations and loci was used to provide a confidence interval, according to Weir (1996). θ values were estimated using GDA 1.1 software (Lewis and Zaykin, 2002), which computes hierarchical F-statistics. Different levels of population subdivision were tested as suggested in Weir (1996). For a random mating population (within sample) or a random distribution of individuals (between samples), F-statistics are expected to be zero.

A matrix of pairwise Fst/(1-Fst) was estimated as well as a matrix of geographic distances ln(Dist) (Rousset, 1997) between villages (geographic coordinates provided in Figure 1) to test for isolation by distance. A Mantel test (Sokal and Rohlf, 1995) was used to test for the independence of the matrices.
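The isolation-by-distance test described above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the software used in the study: the haversine great-circle distance, the function names, the one-sided permutation scheme, and the pairwise Fst values are all hypothetical choices made for the example. Only the four village coordinates are taken from Figure 1.

```python
import math
import random

def dm(deg, minutes):
    """Convert degrees and minutes to decimal degrees."""
    return deg + minutes / 60.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km (assumed distance measure, decimal degrees in)."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def upper(m, order):
    """Upper-triangle entries of square matrix m after relabelling rows and columns."""
    k = len(order)
    return [m[order[i]][order[j]] for i in range(k) for j in range(i + 1, k)]

def mantel(a, b, n_perm=999, seed=0):
    """One-sided Mantel test: permute the labels of matrix a, keep b fixed."""
    rng = random.Random(seed)
    ident = list(range(len(a)))
    obs = pearson(upper(a, ident), upper(b, ident))
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(n_perm):
        perm = ident[:]
        rng.shuffle(perm)
        if pearson(upper(a, perm), upper(b, ident)) >= obs:
            hits += 1
    return obs, hits / (n_perm + 1)

# Coordinates from Figure 1 (west longitudes written as negative values).
villages = {
    "Huitzo": (dm(17, 15), -dm(96, 51)),
    "Mazaltepec": (dm(17, 6), -dm(96, 52)),
    "Valdeflores": (dm(16, 45), -dm(96, 49)),
    "Santa Ana": (dm(16, 50), -dm(96, 42)),
}
names = list(villages)
k = len(names)
# Matrix of ln(distance); the diagonal is never read by upper().
geo = [[0.0 if i == j else
        math.log(haversine_km(*villages[names[i]], *villages[names[j]]))
        for j in range(k)] for i in range(k)]

# Hypothetical pairwise Fst values, for illustration only.
fst = [[0.0, 0.010, 0.014, 0.012],
       [0.010, 0.0, 0.011, 0.013],
       [0.014, 0.011, 0.0, 0.009],
       [0.012, 0.013, 0.009, 0.0]]
gen = [[f / (1.0 - f) for f in row] for row in fst]  # Fst/(1-Fst)

r_obs, p_value = mantel(gen, geo)
print(f"Mantel r = {r_obs:.3f}, one-sided P = {p_value:.3f}")
```

With only four villages there are just 24 distinct relabellings, so the permutation P-value is coarse; the sketch is meant to show the structure of the test rather than to reproduce any reported result.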

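The ratio of pollen to seed flow, according to Ennos (1994), is:

$$ r = \frac{\left(\frac{1}{F_{STm}} - 1\right)\left(1 + F_{is}\right) - 2\left(\frac{1}{F_{STc}} - 1\right)}{\frac{1}{F_{STc}} - 1} $$

where FSTm is the FST calculated for microsatellite markers, FSTc is the FST calculated for cytoplasmic markers, and Fis is the within-population inbreeding coefficient.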
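As a hedged illustration, the Ennos (1994) ratio can be computed with a few lines of Python. The expression coded here is the commonly cited form, r = [(1/FSTm − 1)(1 + Fis) − 2(1/FSTc − 1)] / (1/FSTc − 1); the function and argument names are hypothetical. The inputs are the rounded FST values given in the Results, with Fis set to 0 as the authors state they assumed.

```python
def pollen_to_seed_ratio(fst_nuclear, fst_cytoplasmic, fis=0.0):
    """Ratio of pollen to seed flow in the form attributed to Ennos (1994).

    fst_nuclear     -- Fst for biparentally inherited (microsatellite) markers
    fst_cytoplasmic -- Fst for maternally inherited (chloroplast) markers
    fis             -- within-population inbreeding coefficient
    """
    a = 1.0 / fst_nuclear - 1.0
    c = 1.0 / fst_cytoplasmic - 1.0
    return (a * (1.0 + fis) - 2.0 * c) / c

# Rounded values from the Results section; Fis = 0 as assumed by the authors.
r = pollen_to_seed_ratio(0.011, 0.028, fis=0.0)
print(f"pollen/seed flow ratio = {r:.2f}")
```

With these rounded inputs the function returns about 0.59, close to the reported r=0.55; the small gap presumably reflects rounding of the published FST values, although this cannot be confirmed from the text. Either way the ratio is below 1.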

Analysis of the genetic variation within populations

We estimated the expected unbiased heterozygosity (Nei, 1987) as $H_e = 2n\left(1 - \sum_i p_i^2\right)/(2n - 1)$ for microsatellite markers and $H_e = n\left(1 - \sum_i p_i^2\right)/(n - 1)$ for cytoplasmic markers. Homozygote excess was estimated according to Weir and Cockerham (1984) within population and for the whole set of populations. For pairwise linkage disequilibrium among loci, the within-population correlation coefficient R (Weir, 1979) was calculated and tested by permuting genotypes within locus within population using the GENETIX 4.02 software (Belkhir et al, 2001).
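The two heterozygosity estimators above differ only in the number of gene copies sampled (2n for diploid nuclear markers, n for the haploid chloroplast genome). A minimal Python sketch follows; the function name and the allele frequencies are hypothetical, and n is taken to be the number of individuals sampled.

```python
def expected_heterozygosity(freqs, n, diploid=True):
    """Unbiased expected heterozygosity in the form given in the text.

    freqs   -- allele (or haplotype) frequencies at one locus, summing to 1
    n       -- number of individuals sampled (assumed meaning of n)
    diploid -- True for nuclear markers (2n copies), False for cytoplasmic (n copies)
    """
    copies = 2 * n if diploid else n
    h = 1.0 - sum(p * p for p in freqs)
    return copies * h / (copies - 1)

# Hypothetical allele frequencies at one locus, 20 individuals sampled.
freqs = [0.5, 0.3, 0.2]
he_nuclear = expected_heterozygosity(freqs, n=20, diploid=True)
he_cytoplasmic = expected_heterozygosity(freqs, n=20, diploid=False)
print(f"He (microsatellite) = {he_nuclear:.4f}")
print(f"He (cytoplasmic)    = {he_cytoplasmic:.4f}")
```

The small-sample correction is larger in the haploid case because only n gene copies are sampled.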

Within-population genetic variation in relation to temporality in flowering

In order to characterize populations for flowering, 18 open-pollinated families were sampled for each of the 31 populations. Up to 12 progenies per family were evaluated (giving a total of up to 31 × 18 × 12=6696 plants evaluated). Field layout was a two-replicate design with hierarchical structure (population plots randomly assigned and family plots randomly assigned within populations). The experiment was carried out at the CIMMYT experimental station at El Batán, Texcoco, Mexico. Days to silking (DS) and days to anthesis (DA) were assessed. Measurements were carried out under well-watered conditions and were averaged over family. The anthesis-silking interval (ASI) was calculated as ASI=DS-DA. Flowering range (FLOR) was estimated for each population as FLOR=DSmax-DSmin, where DSmax and DSmin are the maximum and minimum DS in a given population. The flowering range within a given population is an estimate of the heterogeneity in flowering within this population.
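The two flowering summaries defined above can be illustrated with a short Python sketch. The data are invented family means for a single population, and the function and key names are hypothetical.

```python
from statistics import mean

def flowering_summary(families):
    """Summarize one population from family means.

    families -- list of (DS, DA) tuples: days to silking and days to anthesis
    Returns the mean anthesis-silking interval and the flowering range.
    """
    asi = [ds - da for ds, da in families]     # ASI = DS - DA per family
    ds_values = [ds for ds, _ in families]
    flor = max(ds_values) - min(ds_values)     # FLOR = DSmax - DSmin
    return {"mean_ASI": mean(asi), "FLOR": flor}

# Hypothetical family means (DS, DA) in days for one population.
population = [(78.0, 75.5), (80.5, 77.0), (76.0, 74.0), (83.0, 79.5)]
summary = flowering_summary(population)
print(summary)
```

For these invented values the mean ASI is 2.875 days and the flowering range is 7 days.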

Genetic structure within populations was assessed by temporal autocorrelation analysis in the same manner as spatial autocorrelation analysis (Hardy and Vekemans, 1999). We used an estimate of Wright's coefficient of relationship, ρij, between pairs of individuals (Hardy and Vekemans, 1999) that corresponds to Moran's I-statistic using individual allele frequencies (Dewey and Heywood, 1988). The genotype of each individual (one individual per family) was used as the variable of interest to calculate ρij. The family mean flowering time was used in the same way as other authors have used spatial coordinates (Hardy and Vekemans, 2001). Therefore, we define as the divergence in flowering time between pairs of families Div=DA1-DA2 where DA1 and DA2 are the DA mean values of families 1 and 2, respectively. Regression analysis of ρij on Div was carried out using the SPAGeDi 1.0 software (Hardy and Vekemans, 2002). The probabilities under the hypothesis that there is no relation between Div and ρij, Pr values for obs=exp, were estimated after 10 000 random permutations of temporal locations. It is equivalent to carrying out a Mantel test.
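The logic of the temporal autocorrelation test can be sketched as below. This is not SPAGeDi's implementation: the function names are hypothetical, the pairwise relatedness values are invented, Div is taken as the absolute difference in family mean DA (an assumption, since the text writes Div=DA1-DA2), and the test is two-sided on the regression slope.

```python
import random
from itertools import combinations

def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx

def permutation_slope_test(flowering, kinship, n_perm=999, seed=1):
    """Regress pairwise relatedness on flowering divergence, permuting flowering times.

    flowering -- list of family mean DA values (days), one per genotyped individual
    kinship   -- dict {(i, j): rho_ij} for every pair with i < j
    """
    rng = random.Random(seed)
    pairs = list(combinations(range(len(flowering)), 2))
    rho = [kinship[p] for p in pairs]

    def slope_for(times):
        div = [abs(times[i] - times[j]) for i, j in pairs]
        return ols_slope(div, rho)

    obs = slope_for(flowering)
    hits = 1  # the observed arrangement counts as one permutation
    shuffled = flowering[:]
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # reassign "temporal locations" at random
        if abs(slope_for(shuffled)) >= abs(obs):
            hits += 1
    return obs, hits / (n_perm + 1)

# Invented example: relatedness declines linearly with flowering divergence.
flowering = [70.0, 72.0, 75.0, 79.0, 80.0]
kinship = {(i, j): 0.1 - 0.02 * abs(flowering[i] - flowering[j])
           for i, j in combinations(range(len(flowering)), 2)}
slope, p_value = permutation_slope_test(flowering, kinship)
print(f"slope = {slope:.4f}, P = {p_value:.3f}")
```

Shuffling the flowering times while keeping the genotypes fixed breaks any link between relatedness and flowering date, which is what makes the procedure equivalent in spirit to a Mantel test.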

Results

Farmer management of maize populations

Of the 60 farmers surveyed, six reported having participated in seed exchange between different villages during the last 10 years. Of these six reported cases, four correspond to the foundation of a new population and two to the mixing of seeds of the pre-existing population with seeds from another village. In all, 12 of the 60 farmers also reported that during the last 10 years they have mixed the seed of the pre-existing population with seed provided by another farmer from the same village, in order to provide enough seed for the next generation.

Principal coordinate analysis for molecular markers

All tested SSR loci were polymorphic in all populations. Principal coordinate analysis provides little evidence for population differentiation. Principal coordinates one and two together explain less than 6% of the total variation. Projection of the populations onto the plane defined by the first two coordinates shows a uniform distribution and no grouping of populations. Furthermore, the distributions of all populations overlap. Microsatellite polymorphism appears to be continuous and not related to the geographic origin of the studied populations.

Among-population genetic structure

Fst values obtained for both nuclear and cytoplasmic (chloroplast) markers are given in Table 1. All populations, even those separated by up to 100 km, were found to share chloroplast DNA haplotypes. Isolation by distance was not statistically significant for either microsatellites or cytoplasmic markers. Furthermore, we observed low among-village Fst values, significantly lower than among-population Fst values. This could be explained by long-distance gene flow (ie, seed exchange between villages). The predominance of seed over pollen flow is supported by the estimated ratio of pollen to seed flow (Ennos, 1994), which is below 1 (r=0.55, with FSTc=0.028 and FSTm=0.011). To estimate r, we did not consider among-village differentiation; we considered the inbreeding coefficient Fis=0, as we will show later in this paper that the observed excess of homozygotes does not correspond to consanguineous mating.

Table 1 Among-population genetic structure

Genetic variation within populations

Genetic diversity estimated over all populations is He=0.71 for microsatellite markers and He=0.49 for cytoplasmic markers. A significant departure from Hardy–Weinberg equilibrium is observed in almost all populations. Estimates of Fis over all populations by locus show significant variation of homozygote excess among loci (Table 2). To test the uniformity of the homozygote excess among loci, we checked the distribution of Fis. One empty region (gap) was found. The three loci that map close to genes or QTLs involved in flowering time or anthesis-silking interval (phi011, phi024, and phi452693) showed significantly higher Fis values than the others (Table 2).

Table 2 Population structure by loci

We evaluated the number of significant linkage disequilibria between pairs of loci. This number is not higher than expected by chance alone at the 5% level (5.1% of tests significant, over 31 populations and 55 possible pairs of loci). After Bonferroni correction, no significant linkage disequilibrium is observed.
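The multiple-testing arithmetic behind this statement can be laid out in a short Python sketch. The numbers of loci and populations come from the text; treating all population-by-pair tests as a single family for the Bonferroni correction is an assumption, since the text does not specify how the correction was applied.

```python
from math import comb

n_loci = 11
n_populations = 31
alpha = 0.05

pairs_per_population = comb(n_loci, 2)           # number of distinct locus pairs
n_tests = pairs_per_population * n_populations   # one test per pair per population
expected_false_positives = alpha * n_tests       # expected by chance alone at 5%
bonferroni_alpha = alpha / n_tests               # assumed family: all tests together

print(f"locus pairs per population: {pairs_per_population}")
print(f"total tests: {n_tests}")
print(f"expected significant by chance: {expected_false_positives:.2f}")
print(f"Bonferroni per-test threshold: {bonferroni_alpha:.2e}")
```

Eleven loci give 55 pairs, hence 1705 tests over 31 populations, of which about 85 would be significant by chance at the 5% level. The observed 5.1% corresponds to roughly 87 tests, in line with that expectation.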

Within-population genetic variation in relation to temporality in flowering

A summary of genetic variation and variation for flowering traits for all populations is presented in Table 3. Two populations show clear evidence of admixture of material of different flowering precocity. The population from farmer 235 (Table 3) shows a bimodal distribution of flowering time, and the population from farmer 115 (Table 3) has a very wide flowering range compared with the distribution of the flowering range among all populations. These two populations were excluded from the regression analysis.

Table 3 Genetic and flowering variation within populations

The within population range in flowering time and anthesis-silking interval differs greatly from one population to the other. The regression of the overall homozygote excess (using all 11 microsatellite markers) on mean anthesis-silking interval and flowering range is highly significant (Figure 2 and Table 4).

Figure 2

Homozygote excess (Fis) as a function of the flowering range (FLOR) and the anthesis-silking interval (ASI). Fis was calculated over all markers.

Table 4 Regression of homozygote excess Fis on flowering range (FLOR) and the mean anthesis-silking interval (ASI)

The population showing the most elevated Fis value (639 in Table 3) shows a significant correlation between the flowering distance and relatedness measured as Moran's I-statistic (temporal autocorrelogram shown in Figure 3). For this population, the pairwise estimates of Moran's I-statistic (ρij) for the class of divergence in flowering (Div) between 0.75 and 1.56 days deviate significantly from the expected value obtained by permuting locations, as shown in Figure 3 (Pr=0.029, star in Figure 3). In addition, the linear regression of ρij on Div is significant (Pr=0.0018). The regression slope is m=−0.0244 and the intercept is b=0.1136. These results suggest a clear case of assortative mating (correlation coefficient, r=−0.243) for this population. This population was tested for assortative mating because it is the only population showing a low anthesis-silking interval value, a high heterogeneity in flowering, and an elevated Fis value. To conduct the regression, we used only the three markers (described earlier) showing significantly higher Fis values than the others.
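To make the reported regression concrete, the fitted line ρij = b + m·Div can be evaluated directly. The sketch below uses only the slope and intercept reported above; the function name is hypothetical, and reading the zero crossing as a scale of temporal structure is an illustrative interpretation rather than a result stated by the authors.

```python
SLOPE = -0.0244      # m, as reported in the text
INTERCEPT = 0.1136   # b, as reported in the text

def predicted_relatedness(div_days):
    """Fitted relatedness for a given divergence in flowering time (days)."""
    return INTERCEPT + SLOPE * div_days

# Divergence at which the fitted line reaches zero relatedness.
zero_crossing = -INTERCEPT / SLOPE

print(f"predicted rho at Div = 1 day: {predicted_relatedness(1.0):.4f}")
print(f"fitted line crosses zero at Div = {zero_crossing:.2f} days")
```

Under the fitted line, families flowering one day apart have a predicted relatedness of about 0.089, and the line reaches zero at a divergence of roughly 4.7 days. Because this is a linear extrapolation of a noisy relationship, it should be read as indicative only.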

Figure 3

Autocorrelogram showing the temporal segregation of the genotypes: relatedness, measured as Moran's I-statistic (ρij), as a function of the divergence in flowering time (Div). Given for the population showing the most elevated overall Fis value and the lowest ASI value (population from farmer 639 in Table 3). Moran's I-values are computed using the individual genotypes as variables, with SSR markers phi011, phi024, and phi452693. These loci showed significantly higher Fis values than the others. The star symbol marks values that deviate significantly from zero (P<0.05).

Discussion

Low level of among-population differentiation

Village and distance do not appear to be determinants of population differentiation. The difference between cytoplasmic and nuclear Fst is not as large as expected when compared to other allogamous plants, and is even smaller than that seen for some autogamous grasses (Ennos, 1994). Data from the survey on farmer management corroborate the genetic data analysis and indicate a large amount of seed flow between maize populations within the Central Valleys of Oaxaca. Considerable seed exchange by farmers seems to be common in Mexico (Louette et al, 1997). Furthermore, while the proportion of farmers reporting seed flow is high, it could be underestimated, as farmers are keen to test new populations (Bellon et al, 2003), which often results in seed mixing as a consequence of storage practices. After harvest, the maize ears in their husks are frequently kept in a single pile, regardless of whether they come from the tested population or from the farmer's pre-existing population. We observed that populations from other villages are often tested, and, although the farmer may not adopt them, seed mixing may occur. Seed flow among farmers allows long-distance gene flow within and among villages. These high levels of gene flow ensure the maintenance of high levels of genetic diversity. Levels of diversity are high when compared to those measured with maize accessions representing the entire maize genetic diversity in the Americas (Matsuoka et al, 2002). Therefore, a maize landrace should not be considered as a separate entity, but rather as an open genetic system. Furthermore, our results underline the importance of farmers' choices in determining gene flow among maize landrace populations, as seed flow results from farmers' decisions.

To the best of our knowledge, this is the first maize population study to be conducted within a small geographic area. In contrast to our research, other maize population studies (Sanou et al, 1996; Gauthier et al, 2002) looked at larger areas (Europe and Burkina-Faso) and showed much higher population differentiation. We believe more studies are needed on maize populations in Mexico, and more generally in the Americas, at both the regional and inter-regional scales, to investigate the patterns of population structure. Although distance does not seem to contribute to village isolation within small geographic areas, the Matsuoka et al (2002) results clearly suggest isolation by distance at a continental scale.

Temporal heterogeneity in allelic frequencies

While there is little among-population differentiation, a large amount of homozygote excess is observed within these populations. Usually, homozygote excess is attributed to consanguineous mating, population substructure, or an artifact due to factors such as null alleles. Enjalbert and David (2000) have inferred the outcrossing rate in wheat using molecular data at various loci. In a more recent study, Overall and Nichols (2001) have shown that it is possible to distinguish consanguinity from population substructure using multilocus genotype data. However, none of the above applies to the situation described in this paper. The biology of the species rules out the simple explanation of inbreeding, and the significant variation of homozygote excess among loci (Fis values differing considerably from one locus to another) does not correspond to what would be expected for population admixture. Nonrandom mating has been previously described in open-pollinated maize populations (Brown and Allard, 1970; Bijlsma et al, 1986). Kahler et al (1989) have shown that selfing did not contribute significantly to the inbreeding that occurred in the studied population. Here we present an unusual case of Wahlund effect and inbreeding, which is in some ways similar to that described in mole crickets (Nevo et al, 2000), and which corresponds to assortative mating. We show that in the population with the most elevated homozygote excess, there is a significant correlation between relatedness and the mean family divergence in flowering time. It appears that the overall Fis value depends on both the flowering range and the anthesis-silking interval of a given population (FLOR and ASI together explain around 39% of the variation for homozygote excess). A large anthesis-silking interval prevents assortative mating because of the long delay between male and female flowering. A low anthesis-silking interval combined with a large flowering range will result in temporal heterogeneity in allelic composition for maternal plants and the pollen pool. Work by Kahler et al (1989) had previously suggested that homozygote excess in an open-pollinated maize population could be the consequence of positive assortative mating resulting from an overlap in the flowering period of plants carrying alleles that are identical by descent from a recent common ancestor.

Marker loci with elevated Fis values map close to flowering genes

The three loci showing the most elevated Fis values are in regions known to be associated with flowering traits. Phi011 is within the interval between dwarf8 and indeterminate1. While dwarf8 has been shown to be associated with variation in flowering time (Thornsberry et al, 2001), Indeterminate1 is a putative transcriptional regulator of floral transition that is thought to be a major player in controlling flowering time. Phi024 is in a region that is syntenic to the dwarf8-indeterminate1 region and maps close to dwarf9, a possible duplication of dwarf8 on chromosome 5 (Gale and Devos, 1998). Phi452693 maps close to a major anthesis-silking interval QTL (Veldboom et al, 1994; Ribaut et al, 1996). A large flowering range within a given population will lower the effective population size of a given class of flowering time and therefore lead to interclass differentiation.

Regarding phenotypic evolution

Understanding population structure and its pattern is crucial to understanding phenotypic evolution. It makes possible association studies with clear assumptions about population structure and its origin. The Central Valleys of Oaxaca offer a unique model for the study of the impact of farmer management and selection on the phenotypic diversification and evolution of maize because of the large amounts of phenotypic variation (Bellon et al, 2003), the unique position of these populations in maize evolution and diversification (Matsuoka et al, 2002), and the patterns of population structure in this region described in our results. The observed variation in flowering range and in anthesis-silking interval between populations suggests that the pattern of population structure for these traits could be very different from that described for molecular markers. In a future paper, we will describe the pattern of population structure for quantitative traits and the impact of farmer management and selection on population differentiation for these traits, in order to understand the basis of phenotypic evolution in maize.