The past decade has seen progress in research about crop diversity linked to cultural diversity and social factors, but there is still much to understand about this complex relationship. Crop evolution and diversity depend on selection mediated both by the environment and by farmers (Harlan, 1975). Previous research has shown that social factors such as ritual use, identification as indigenous or mestizo, or even simple aesthetics, can contribute to the maintenance of particular landraces (Bellon, 1996; Zimmerer, 1996; Brush and Perales, 2007). We might thus expect that culturally determined preferences and perceptions have molded crop populations, but demonstrating the role of cultural diversity in generating crop diversity has been more difficult. Differentiation of maize populations and other crop populations by cultural variation was suggested long ago (Anderson, 1946; Hernández, 1972), but only recently has empirical data been reported (Perales et al., 2005; Labeyrie et al., 2014).

In México, most maize farmers select seed for use in each coming year, and this selection contributes toward maintaining distinctive morphological traits in the face of abundant pollen flow and extensive seed movement (Louette and Smale, 2000; Ortega-Paczka, 2003; Pressoir and Berthaud, 2004a). The diversity resulting from farmers’ management has conventionally been classified as races, and there are 59 maize races accepted based on research using extensive morphological and isozyme data (Sánchez et al., 2000). Cultural differences between groups may be expressed as preferences for colors, textures and uses for particular varieties (Ortega-Paczka, 2003), and also serve to erect barriers to the movement of seed (Hernández, 1972). Consistent with this, many workers have documented correlations between ethnolinguistic and biological diversity (Maffi, 2005), but the relationship between these factors is complex, in part, because ethnolinguistic groups often inhabit different environments and ecological niches (Brush, 2004). In contrast to the popular assumption that there is a direct relationship between ethnolinguistic diversity and maize diversity, there is little research that has formally and systematically addressed that interaction.

Genetic research describes continuous variation among maize, although regional clusters are apparent (Matsuoka et al., 2002; Vigouroux et al., 2008). Clustering is more evident when morphological traits, rather than genetic markers, are used to assess relationships among ecogeographic regions (Doebley et al., 1985; Sánchez and Goodman, 1992; Sánchez et al., 2000). Using morphological characteristics that are under farmer selection, social scientists and plant biologists have shown that farmers maintain morphologically distinct maize populations at much smaller regional scales (Pressoir and Berthaud, 2004a; Perales et al., 2005). In particular, Perales et al. (2005) found that ethnolinguistic diversity in the same environment was linked to maize morphological diversity, but not to genetic differentiation. In contrast to the findings of Pressoir and Berthaud (2004b) and Perales et al. (2005), van Etten et al. (2008) found that maize populations from different villages within a small, culturally homogeneous region in Guatemala are both genetically and phenotypically separated. They confirmed, however, the central finding of the Mexican case studies that social origin has a significant role in determining the patterns of maize in the region. Interestingly, maize diversity in NW Guatemala was more discernible between communities than between regions, a finding that van Etten et al. (2008) attribute to patterns of seed exchange at local and regional levels and to diffusion of innovations at the regional level.

This paper examines maize diversity and population structure at the local scale and their relation to both ethnolinguistic variation and environmental variation (elevation) in southern Mexico. We studied farmers who speak either Chatino or Mixtec, two languages of the Oto-Manguean family that have been separated for ~4700 years (Kaufman, 1990). We collected maize samples from two environments (low and middle elevation) in two neighboring indigenous municipalities with different language affiliations. We hypothesized that maize collections from the same municipality would be more similar than those from different municipalities, even though comparable environmental variation occurs within each. We found that the effect of social origin (municipality) in structuring morphological and genetic diversity was stronger than that of elevation. Application of a metapopulation model suggests that genetic differentiation results from the lack of seed flow between municipalities.

Materials and methods

Study site

Fieldwork was carried out in the Sierra Sur of Oaxaca (Figure 1). This mountainous range extends along the Pacific Coast from southern Jalisco to the Isthmus of Tehuantepec in the state of Oaxaca. We worked in the Mixtec municipality of Santiago Amoltepec and the Chatino municipality of Santa Cruz Zenzontepec. These indigenous communities have an ancient and shared history: their townships are only 12 km apart and they have been affected by the same regional and historical dynamics. The topography is abrupt, with mountains, canyons and hills leading to elevation variation from 105 to 2150 meters above sea level (masl). The climate is hot in the lowlands and temperate in the higher elevations. The mean annual temperature is 26 °C in the lowlands and 18 °C in the upper elevations. The rainy season starts in May and ends in October, with average precipitation of 1500 to 2000 mm per year (INEGI, 2013). Humidity differs strongly between the lowlands and highlands and is affected by the exposure in the hills. Soil diversity is high because of the complex geology; according to the most detailed available information, Litosol and Regosol eutrico are the most important soil types (INEGI, 2013). In general, soils present some level of erosion because of agricultural practices, runoff and wind. Municipalities are integrated internally through local governments and markets. Beyond some seed in backyard gardens, no hybrid or improved seed has been planted in the area. Crop management is similar in both municipalities, the only difference being that fertilizer is used in the Chatino municipality (Zenzontepec) but not in the Mixtec municipality (Amoltepec).

Figure 1

Map of villages and collections.

Maize collections and reciprocal common gardens

To test whether maize diversity and population structure are shaped by social origin and/or environmental factors, maize collections and common garden experiments with morphological characterization were performed. Four villages were selected: one Chatino and one Mixtec at middle elevation (1000–1300 masl), identified as Ch-M and M-M, respectively; and one Chatino and one Mixtec in the lowlands (400–600 masl), identified as Ch-L and M-L. A detailed description of the sampling and common gardens is given elsewhere (Orozco-Ramírez et al., 2014). We collected a total of 135 maize samples from the four villages (33 from M-M, 23 from M-L, 44 from Ch-M and 35 from Ch-L). Each maize sample consisted of 12 seed-quality ears of each farmer-identified type that the household planted in the previous year. Ecological information and management of each sample were recorded by survey, and we considered each sample to represent one maize population. The samples of each village were grouped by local name and race, and organized according to variation in ear morphology. Five maize samples from each village were selected to plant in the common gardens; these samples represented the total variation of maize in that particular village. In this analysis, we used only the data from two common gardens under fertilization treatment, one in the Chatino village at low elevation (Ch-L) and another in the Mixtec village at middle elevation (M-M). These were the fields with the best soil conditions to perform morphological characterization. Each of the common gardens had a randomized complete block design with three replicates, giving 60 experimental units per garden (4 villages × 5 samples × 3 blocks). Each experimental unit had four furrows, each 5 m long and 0.8 m wide.
Morphological data recorded from common gardens included days to anthesis, days to silking, tassel branches, tassel length, stem diameter, leaf length, leaves per plant, leaf width, ear height on the plant, plant height, ear diameter, ear length, grain rows, kernels per row and cob diameter. Twenty plants were measured from the two rows in the center. Flowering time was recorded when 50% of the plants had reached anthesis or were silking. For ear variables, all the plants of one row located in the center of the plot were harvested. We averaged each variable over plots.
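The common garden layout described above can be illustrated with a short sketch. This is only an illustration under stated assumptions, not the randomization actually used in the field: the entry labels (for example "M-M-1") and the random seed are hypothetical, and the plot area simply multiplies the furrow dimensions given in the text.

```python
import random

VILLAGES = ["M-M", "M-L", "Ch-M", "Ch-L"]
SAMPLES_PER_VILLAGE = 5   # maize samples planted per village
BLOCKS = 3                # replicates per common garden


def rcbd_layout(seed=0):
    """Return {block: [entry, ...]}; every entry appears once per block in random order."""
    rng = random.Random(seed)  # hypothetical seed, for reproducibility only
    # Hypothetical entry labels: village code plus a running sample number
    entries = [f"{v}-{i}" for v in VILLAGES for i in range(1, SAMPLES_PER_VILLAGE + 1)]
    layout = {}
    for block in range(1, BLOCKS + 1):
        order = entries[:]
        rng.shuffle(order)     # independent randomization within each block
        layout[block] = order
    return layout


layout = rcbd_layout()
n_units = sum(len(plots) for plots in layout.values())
plot_area_m2 = 4 * 5 * 0.8    # four furrows, each 5 m long and 0.8 m wide
print(n_units, plot_area_m2)
```

Each block contains all 20 entries exactly once, which is what makes the design complete.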

Molecular analysis

Molecular analysis was carried out at the Colegio de Postgraduados, Mexico. We used the same 20 maize samples (but not the same physical individuals) as used in common gardens for microsatellite genotyping. DNA was extracted from 10 individuals randomly selected for each sample, using the standard protocol prescribed by the ChargeSwitch gDNA Plant Kit (Invitrogen, Carlsbad, CA, USA). We used 100–150 mg of seedling tissue. DNA extraction was performed with a KingFisher Flex (Thermo Scientific, Waltham, MA, USA) automatic extractor. The DNA samples selected had a DNA concentration above 50 ng μl−1 and an absorbance ratio from 1.40 to 1.80 at 260/280 nm wavelength. The DNA was evaluated with a Nanodrop 2000 Spectrophotometer (Thermo Scientific). Extracted samples were genotyped for 15 microsatellite loci, listed in Supplementary Appendix Table A1. These markers were chosen from a larger group of SSRs (simple sequence repeats) optimized for multiplexing, and have proved efficient in revealing genetic diversity in maize. For more information about these SSRs, see [...]. Fluorescently labeled primers (ROX, 6-FAM, HEX) were obtained for these loci (Invitrogen). Multiple PCRs were performed in a 25 μl reaction volume, containing 4 pmol μl−1 of R and F primer (Invitrogen), 0.16 mM of dNTP mix (Promega, Madison, WI, USA), 1.2 mM of MgCl2 (Promega), 0.8 × of GoTaq flexi buffer (Promega), 1 U of GoTaq flexi DNA polymerase (Promega) and 25 ng of DNA. The amplification program was: 95 °C for 4 min, followed by 25 cycles of 95 °C for 1 min, 55 °C for 2 min and 72 °C for 2 min, and then a final extension at 72 °C for 60 min. PCR was performed in a GeneAmp PCR System 9700 (Applied Biosystems, Foster City, CA, USA). PCR products were analyzed by capillary electrophoresis in a 3130 Genetic Analyzer (Applied Biosystems). Fragment sizes were scored using GeneMapper v4.0 (Applied Biosystems).

Statistical analysis

Maize population structure based on morphological traits from the common gardens was analyzed by principal component analysis. We excluded days to silking and height of ear in the plant because these variables were highly correlated with days to tasseling and plant height, respectively. Data were standardized by subtracting the mean from each observation and dividing by the standard deviation. The unit of analysis was the maize sample at each plot in the common gardens. To define the effect of elevation and social origin (different municipality) in structuring populations using morphological traits, we performed a permutational multivariate analysis of variance using distance matrices (Anderson, 2001). This is a nonparametric method that partitions a distance matrix among sources of variation. For this analysis, we used the same variables as for the principal component analysis. The distance matrix was calculated using the Euclidean method on the standardized data, and we allowed for 9999 permutations to calculate F statistics and to have an ample margin to reject the null hypothesis at an α-level of 0.05. For this analysis, we applied the function adonis in the package vegan (Oksanen et al., 2015) for R (R Core Team, 2014).
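The standardization, principal component analysis and permutation test described above can be sketched in a few lines. This is a simplified illustration, not the analysis reported here: the study used adonis in vegan for R with municipality, elevation and their interaction, whereas this sketch is a one-way test written in Python with NumPy, and the trait matrix, group shift and function names are all made up for the example.

```python
import numpy as np


def standardize(X):
    """Z-score each trait: subtract the mean, divide by the sample standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def pca_scores(Z, n_components=2):
    """Principal component scores from the SVD of the standardized matrix."""
    U, s, _ = np.linalg.svd(Z - Z.mean(axis=0), full_matrices=False)
    return U[:, :n_components] * s[:n_components]


def pseudo_f(D2, groups):
    """One-way pseudo-F (Anderson, 2001) from a matrix of squared distances."""
    n = len(groups)
    labels = np.unique(groups)
    ss_total = D2[np.triu_indices(n, k=1)].sum() / n
    ss_within = 0.0
    for g in labels:
        idx = np.where(groups == g)[0]
        sub = D2[np.ix_(idx, idx)]
        ss_within += sub[np.triu_indices(len(idx), k=1)].sum() / len(idx)
    a = len(labels)
    return ((ss_total - ss_within) / (a - 1)) / (ss_within / (n - a))


def permanova(Z, groups, n_perm=9999, seed=0):
    """Permutation p-value for the pseudo-F on squared Euclidean distances."""
    rng = np.random.default_rng(seed)
    diff = Z[:, None, :] - Z[None, :, :]
    D2 = (diff ** 2).sum(axis=2)
    f_obs = pseudo_f(D2, groups)
    f_perm = np.array([pseudo_f(D2, rng.permutation(groups)) for _ in range(n_perm)])
    p_value = (np.sum(f_perm >= f_obs) + 1) / (n_perm + 1)
    return f_obs, p_value


# Made-up data: 60 plots x 13 traits, with an artificial shift between municipalities
rng = np.random.default_rng(42)
municipality = np.repeat(np.array(["Mixtec", "Chatino"]), 30)
X = rng.normal(size=(60, 13))
X[municipality == "Chatino"] += 1.0
Z = standardize(X)
scores = pca_scores(Z)
f_obs, p_value = permanova(Z, municipality, n_perm=999)
print(scores.shape, round(f_obs, 2), p_value)
```

Adding one to both the numerator and the denominator of the p-value counts the observed arrangement as one of the permutations, so the estimate can never be exactly zero.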

We used model-based clustering to evaluate population structure as implemented in the software STRUCTURE 2.3.4 (Pritchard et al., 2000), using the admixture model with correlated allele frequencies and allowing the model to use location information for the samples to assist the clustering. The estimated proportion of each cluster forming an individual genome (q) was calculated for K ranging from 1 to 10 populations, with 10 runs for each K-value. We used a burn-in period of 100 000 and 100 000 iterations for estimating the parameters. The criterion suggested by Evanno et al. (2005), based on the second-order rate of change in the log probability of data between successive K-values, was used to determine the most likely number of clusters (K).
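The ΔK criterion can be illustrated with a minimal sketch. It assumes the commonly used form ΔK = |L(K+1) − 2L(K) + L(K−1)| / s[L(K)], computed from the mean and standard deviation of the log probability of the data over replicate runs; the log-probability values below are invented for the example and are not STRUCTURE output from this study.

```python
import numpy as np


def evanno_delta_k(lnp_by_k):
    """Return {K: delta K} for every K that has both neighbours K-1 and K+1.

    lnp_by_k maps each K to the list of ln P(D) values from its replicate runs.
    """
    ks = sorted(lnp_by_k)
    mean = {k: float(np.mean(lnp_by_k[k])) for k in ks}
    sd = {k: float(np.std(lnp_by_k[k], ddof=1)) for k in ks}
    delta = {}
    for k in ks:
        if k - 1 in mean and k + 1 in mean:
            second_diff = abs(mean[k + 1] - 2.0 * mean[k] + mean[k - 1])
            delta[k] = second_diff / sd[k]
    return delta


# Hypothetical ln P(D) values, three replicate runs per K (made up for illustration)
lnp_by_k = {
    1: [-9000.0, -9001.0, -8999.0],
    2: [-8500.0, -8502.0, -8498.0],
    3: [-8450.0, -8460.0, -8440.0],
    4: [-8430.0, -8450.0, -8410.0],
    5: [-8420.0, -8460.0, -8380.0],
}
delta_k = evanno_delta_k(lnp_by_k)
best_k = max(delta_k, key=delta_k.get)
print(delta_k, best_k)
```

Because the statistic needs both neighbours, it is undefined for the smallest and largest K tested, so it cannot by itself support K=1.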

Because STRUCTURE showed low genetic differentiation among maize samples within villages, we treated each village as a single population and then performed a locus-by-locus analysis of molecular variance (AMOVA) (Excoffier et al., 1992), grouping villages according to municipality. We preferred locus-by-locus AMOVA because there were some missing data and we included the individual level in the calculations. Significance was calculated using 16 000 permutations, and estimates of the proportion of variation at different levels were calculated as a weighted average across loci. Because the villages within a municipality are located at different elevations, testing within municipality amounts to testing for the effect of environment in structuring the population. We also calculated a matrix of genetic distance (FST) (Wright, 1951) among villages based on the number of different alleles using Arlequin 3.5 (Excoffier et al., 2005), with significance (α=0.05) calculated after 1000 permutations.
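To make the meaning of FST concrete, the following sketch computes it for a single locus as the proportional reduction in expected heterozygosity, FST = (HT − HS)/HT. This is a textbook simplification that assumes equal population sizes and known allele frequencies; it is not the AMOVA-based estimator implemented in Arlequin, and the allele frequencies are invented.

```python
import numpy as np


def fst_single_locus(freqs):
    """FST = (H_T - H_S) / H_T for one locus.

    freqs: array of shape (populations, alleles); each row sums to 1.
    Populations are weighted equally (an assumption of this sketch).
    """
    freqs = np.asarray(freqs, dtype=float)
    h_s = np.mean(1.0 - (freqs ** 2).sum(axis=1))  # mean within-population heterozygosity
    p_bar = freqs.mean(axis=0)                     # pooled allele frequencies
    h_t = 1.0 - (p_bar ** 2).sum()                 # total expected heterozygosity
    return (h_t - h_s) / h_t


# Invented frequencies for two populations at a two-allele locus
fst_example = fst_single_locus([[0.6, 0.4], [0.4, 0.6]])
fst_identical = fst_single_locus([[0.5, 0.5], [0.5, 0.5]])
print(round(fst_example, 4), fst_identical)
```

In this toy case a 20-point difference in allele frequency gives FST = 0.04, which shows how modest frequency shifts translate into values of the magnitude typically seen among neighboring crop populations.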

Finally, we used the model proposed by van Heerwaarden et al. (2010) to estimate genetic differentiation because of seed management and seed flow. This approach models a collection of maize fields as a metapopulation and uses parameters from maize farmers’ practices in traditional agricultural systems to estimate FST following Slatkin (1991). We compared model-based estimates of FST to FST calculated from our SSR genotyping. FST was calculated for each village and globally using Arlequin 3.5 (Excoffier et al., 2005). FST confidence intervals were calculated over 2000 bootstraps. The model uses these parameters: number of demes (n), number of ears planted per deme (Nf), total number of plants per deme (N), number of migrating ears (Nfm), replacement probability (e), migration proportion (m=Nfm/Nf), proportion of seed mixture (pm) and proportion of pollen migration (mg). Values used for each parameter are in Table 1; unless otherwise stated, the data came from our field surveys (Orozco-Ramírez et al., 2014). Values for each parameter were obtained by village (Table 1) and were averaged or summed accordingly to obtain FST for several hierarchical levels: by municipality, by elevation and for all villages together, as shown in Table 2. Number of demes (n) was calculated by multiplying the number of households by the mean number of samples per household for each village. Number of ears planted per deme (Nf) was calculated from average planted area, seed sown per hectare, average kernel weight in the region (Aragón-Cuevas et al., 2012) and the average number of kernels used as seed from each ear. The total number of plants per deme (N) was estimated by multiplying the average kernels per ear by Nf. The number of migrating ears (Nfm) was calculated from seed exchange averaged over farmers in each village (kg). First, we obtained the proportion of seed exchanged with respect to total seed planted and then multiplied that proportion by Nf.
Replacement probability (e) was calculated as the proportion of new seed lots with respect to the total reported for the previous season. Initial values of the proportion of pollen migration (mg) were taken from the literature (Messeguer et al., 2006), but these were later fitted to the observed FST data.
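The parameter derivations described above can be written out as a short sketch. The arithmetic follows the verbal description in the text, but the exact formula for Nf (kernels planted divided by kernels taken as seed from each ear) is our reading of that description and should be treated as an assumption; all field names and survey values are hypothetical and are not the values in Table 1.

```python
from dataclasses import dataclass


@dataclass
class VillageSurvey:
    """Hypothetical per-village survey averages (illustrative values only)."""
    households: int
    samples_per_household: float
    planted_area_ha: float
    seed_kg_per_ha: float
    kernel_weight_g: float
    seed_kernels_per_ear: float   # kernels taken as seed from each ear
    kernels_per_ear: float        # average kernels per ear
    seed_exchanged_kg: float
    new_seed_lots: int
    total_seed_lots: int


def seed_parameters(s):
    """Derive n, Nf, N, Nfm, m and e as described in the text."""
    n = s.households * s.samples_per_household               # number of demes
    seed_planted_kg = s.planted_area_ha * s.seed_kg_per_ha
    kernels_planted = seed_planted_kg * 1000.0 / s.kernel_weight_g
    nf = kernels_planted / s.seed_kernels_per_ear             # ears planted per deme (assumed formula)
    big_n = s.kernels_per_ear * nf                             # plants per deme
    nfm = (s.seed_exchanged_kg / seed_planted_kg) * nf         # migrating ears
    m = nfm / nf                                               # migration proportion
    e = s.new_seed_lots / s.total_seed_lots                    # replacement probability
    return {"n": n, "Nf": nf, "N": big_n, "Nfm": nfm, "m": m, "e": e}


survey = VillageSurvey(
    households=100, samples_per_household=1.5, planted_area_ha=2.0,
    seed_kg_per_ha=15.0, kernel_weight_g=0.3, seed_kernels_per_ear=250.0,
    kernels_per_ear=300.0, seed_exchanged_kg=3.0, new_seed_lots=5,
    total_seed_lots=50,
)
params = seed_parameters(survey)
print(params)
```

Note that m reduces to the proportion of seed exchanged, because Nf cancels in Nfm/Nf; keeping Nfm explicit mirrors the parameter list of the model.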

Table 1 Seed management parameters for FST estimation
Table 2 Results of van Heerwaarden et al. (2010) metapopulation model to estimate genetic structure based on seed management


Results

Maize diversity in the area of study

On average, the number of landraces cultivated by a single farmer ranges from 1.33 in M-L to 1.89 in M-M. In M-M, the majority of farmers (67%) have two landraces. In the other three villages, most farmers had only one landrace. In M-L, fewer farmers had two landraces than in the other three villages. Only in Ch-M did farmers have four landraces, but the percentage of farmers with more than three landraces was very low, and most in that village have only one landrace. In total, we found seven racial groups in the four villages (Tuxpeño, Olotillo, Conejo, Tepecintle, Pepitilla, Elotes occidentales and Zapalote grande). A test for independence of race frequency by village finds little support for a dependence on elevation (P-value=0.06793, 1000 Monte Carlo simulations), but strong support for dependence on municipality (P-value=0.0001, 1000 Monte Carlo simulations). Mixtec communities have more of the Conejo and Tepecintle races, and Chatino villages have more of the Olotillo and Tuxpeño races (Figure 1; see also Supplementary Appendix Table A2).
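A Monte Carlo test of independence of the kind reported above can be sketched as follows. This is one common way to run such a test (a chi-square statistic with its null distribution obtained by shuffling one set of labels, which keeps both margins fixed); it is an assumption that this matches the procedure used for the reported P-values, and the race counts below are invented rather than taken from the collections.

```python
import numpy as np


def contingency(rows, cols):
    """Cross-tabulate two label arrays into a table of counts."""
    _, r = np.unique(rows, return_inverse=True)
    _, c = np.unique(cols, return_inverse=True)
    table = np.zeros((r.max() + 1, c.max() + 1))
    np.add.at(table, (r, c), 1)
    return table


def chi_square(table):
    """Pearson chi-square statistic for a table with positive margins."""
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()
    return ((table - expected) ** 2 / expected).sum()


def monte_carlo_independence(rows, cols, n_sim=1000, seed=0):
    """Simulated p-value: shuffle column labels and recompute the statistic."""
    rng = np.random.default_rng(seed)
    rows, cols = np.asarray(rows), np.asarray(cols)
    observed = chi_square(contingency(rows, cols))
    hits = sum(
        chi_square(contingency(rows, rng.permutation(cols))) >= observed
        for _ in range(n_sim)
    )
    return observed, (hits + 1) / (n_sim + 1)


# Invented example: 20 samples per municipality with contrasting race frequencies
municipality = ["Mixtec"] * 20 + ["Chatino"] * 20
race = ["Conejo"] * 16 + ["Tuxpeño"] * 4 + ["Tuxpeño"] * 17 + ["Conejo"] * 3
table = contingency(municipality, race)
stat, p_sim = monte_carlo_independence(municipality, race)
print(table, round(stat, 3), p_sim)
```

Simulation avoids relying on the chi-square approximation, which is unreliable when some race-by-village cells have very small expected counts.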

Maize population structure based on morphological traits

Principal component analysis reveals a continuum of maize morphological diversity across the region. The plot of the first two principal components does not show clearly separate groups in either common garden (Figure 2). However, in both common gardens it is possible to see greater clustering of samples when labeled by municipality (Figures 2a and c) than by elevation (Figures 2b and d). The first component (PC1) for the M-M garden shows the most important variables are leaf number, leaf width, plant height, stem diameter and leaf length, meaning that PC1 accounts overall for plant size, separating large and small plants (Supplementary Appendix Table A3). For the second component (PC2), tasseling, ear diameter, row number and cob diameter have a large loading. PC2 differentiates earlier maturing plants and fatter ears. Principal component analysis for the Ch-L common garden shows the first component (PC1) accounting in general for plant size and the second for fatter ears (PC2).

Figure 2

Principal component analysis plot (PC1 vs PC2) for morphological traits, with samples labeled by municipality and by elevation; data from the middle-elevation and lowland common gardens.

According to permutational multivariate analyses of variance, the main effect of municipality was significant in structuring morphological variation, but neither elevation nor the interaction between elevation and municipality was significant (Table 3). Comparison of the mean sum of squares (Anderson, 2001) suggests that municipality has a stronger effect than elevation in structuring maize populations (Table 3). Nonetheless, there are some differences between common gardens. The effect of municipality was weaker in the low elevation garden, perhaps because of the lower overall morphological variation observed.

Table 3 Permutational multivariate analysis of variance on Euclidean distance matrices for plant morphology traits for each common garden (M-M, Ch-L)

Maize population structure based on molecular markers

Results from software STRUCTURE suggested the existence of two clusters defined by municipality (Figure 3). The highest value of ΔK was found at K=2, but at higher values of K the Mixtec area shows separation between villages. Chatino villages are not similarly separated. Most individuals show evidence of admixture, and there is no correspondence between maize race name and STRUCTURE results. The only exception is a sample from M-M of the Conejo race, which at higher values of K (4) forms a stable cluster with low admixture. Samples never cluster by elevation.

Figure 3

STRUCTURE graphical results assuming two (a), three (b) and four (c) groups, after a burn-in period of 30 000 iterations and 1 000 000 replications for estimations. Each individual plant is represented by a vertical line. Each color represents membership in one cluster (K). Labels on the x axis show the village of origin.

Our AMOVA finds relatively strong population structure (Table 4). Most of the genetic variation (73%) was found within populations, with less variation assigned to municipality (4%) and elevation (1.75%). The results of the AMOVA are confirmed by the matrix of pairwise genetic distance (FST) among villages, although all values are low. FST values between villages of the same municipality (0.019 Mixtec, 0.021 Chatino) are lower than those between populations of the same elevation but different municipalities (0.041 middle, 0.066 low). FST values between villages from different municipalities and different elevations are also large (M-M vs Ch-L is 0.059, M-L vs Ch-M 0.045), but the largest difference is between Mixtec Lowlands and Chatino Lowlands (0.066). All FST values are significant (α=0.05), calculated after 1000 permutations.

Table 4 Genetic structure as revealed by AMOVA and FST

We tested population structure due to race by AMOVA. We confirmed our STRUCTURE results, finding no evidence of differentiation between racial groups (FCT=0.004, P-value=0.295), but there was important variation among populations within racial groups (FSC=0.108, P-value=0.000). According to the SSRs, racial grouping seems artificial (see Supplementary Appendix Table A4).

Metapopulation model

We used the metapopulation model of van Heerwaarden et al. (2010) to estimate FST and theoretical pollen migration based on seed management and exchange from our surveys. Within villages, to obtain modeled FST values similar to those calculated from SSRs, we had to assume pollen migration rates similar to those in the literature (Messeguer et al., 2006), with the exception of the M-M village, where a very low pollen migration proportion (0.0089) was needed; the range in the other three villages was from 0.010 to 0.018 (Table 2). We next asked whether the model can be extended to villages within municipalities, and were able to find a reasonable fit of the model using pollen flow values within the range used for within-village comparisons (0.0112), supporting the idea that gene flow between villages of the same ethnolinguistic group is similar to that within villages. In contrast, fitting the model to elevation groupings or the entire study region required much lower pollen flow estimates: 0.0087 for lowlands, 0.0085 for middle elevation and 0.0083 for the region (Table 2).


Discussion

Results from our comparison of genetic and morphological variation among maize landraces cultivated by Mixtec and Chatino farmers support the hypothesis that ethnolinguistic differences could shape diversity as much as or more than the environment in this particular region. Plant morphological characteristics measured in the common gardens display considerable variation without strong clustering. Nevertheless, it is possible to see the effect of municipality but not elevation in structuring the population (Table 3). All statistical methods used to analyze molecular markers show greater support for structure because of social origin than environment (elevation).

In a previous paper, we showed that adaptation of local landraces is not a primary reason for landrace distribution in the region. We found that landraces from Ch-L yield better in all four villages (Orozco-Ramírez et al., 2014). Other factors, besides environment and ethnicity, help determine landrace distribution, such as infrastructure (roads) and markets. From our ethnographic work, we know roads are relatively recent to the region. Previously, the region’s villages were connected by foot traffic (Orozco-Ramírez, 2014). From our surveys, we know of seed exchange among villages with no road access, and the average distance between those villages is 8.5 km. Seed exchange involving movement by both foot and vehicle occurred over an average distance of 70 km. Most external seed exchange happened among villages of the same ethnicity (65%, averaged over all villages) or with Spanish-speaking towns (31%). The distance by foot between villages of different ethnicities is actually smaller in some cases than the distance between villages of the same ethnicity that exchange seed; consequently, we cannot argue that maize population differentiation is a consequence of ease of transport or access. In relatively rare instances (4%), seed exchange occurred among villages of different ethnicities, suggesting that ethnicity matters in defining seed acquisition (Orozco-Ramírez, 2014).

Our results contrast with previous findings in the Oaxaca Valley (Pressoir and Berthaud, 2004a) and Chiapas (Perales et al., 2005) that suggest social origin only impacts morphological variation directly selected by farmers. Pressoir and Berthaud (2004a) argued that cultivation in different villages and farmers’ selection contribute to morphological differentiation, but that pollen migration among populations reduces genetic separation. Similarly, Perales et al. (2005) found morphological but no genetic differentiation between neighboring ethnolinguistic groups. Their surveys found that a large majority (>70%) of farmers were interested in receiving seed from villages of a different ethnic group, suggesting that seed movement may explain the extremely low FST values they observed. Comparing highland and lowland maize samples from four states in east-central Mexico, van Heerwaarden (2007) reports genetic differentiation according to altitude but not according to social origin within altitudes. Most research, therefore, suggests gene flow is important among maize populations from different villages and that farmers’ selection is important in maintaining morphological differentiation. In contrast, we find modest morphological and genetic differentiation between ethnolinguistic areas (FCT=0.040) (Table 4) that are geographically quite close, which is larger than the among-village differentiation reported by Pressoir and Berthaud (2004b) (0.003) and by van Heerwaarden (2007) (0.026 for highlands and 0.027 for lowlands). Among these studies, ours focused on the smallest region, followed by those of Pressoir and Berthaud (2004b) and van Heerwaarden (2007). Also, we found greater global differentiation over all markers (FST=0.111 (0.085–0.138, 95% confidence interval by bootstrapping)) (Table 2) than the values reported by Pressoir and Berthaud (2004b) (FST=0.011±0.002, 95% confidence intervals by jackknifing).
Our study was carried out in a region with no roads crossing it and with no common local or regional markets. This contrasts with the situation in the Central Valleys of Oaxaca, where roads and markets unify the region and exchange commonly occurs between villages. Our results suggest that ethnolinguistic differences could effectively isolate maize populations in this region that are otherwise under similar natural and artificial selection pressures. We conclude that, at least in some cases, ethnolinguistic affiliation can reduce gene flow more than the environmental obstacles posed by altitude differences. We posit that ethnically based seed networks foster both morphological and genetic separation, an idea similar to that of Hernández (1972), who suggested that indigenous groups isolate maize populations in a way similar to geographic barriers. An important next step is to expand the research to a contiguous Zapotec municipality to improve the test of the effect of ethnicity in structuring maize populations.

Previous work has found that both genetic and morphological variation are strongly structured by elevation (Doebley et al., 1985; Benz, 1986; Bretting and Goodman, 1989; Vigouroux et al., 2008; van Heerwaarden et al., 2011). Research on the distribution of maize races in central Mexico (Perales et al., 2003) and Chiapas (Brush and Perales, 2007) found that maize races are distributed according to elevation, and common garden experiments suggest local adaptation to elevation (Mercer et al., 2008). van Heerwaarden (2007) showed close association between maize genetic structure and elevation at a regional scale in east-central Mexico, and genetic analyses find a significant impact of elevation on genome-wide diversity in both maize and its wild relative teosinte (Bradburd et al., 2013; Pyhäjärvi et al., 2013; Takuno et al., 2015). Contrary to these findings, we found no differentiation of maize populations by elevation (races, morphological traits and molecular markers), likely because of the much smaller geographic scale of our population sampling.

Models of metapopulation structure based on our survey data support a possible role for ethnicity in patterning genetic diversity in our study area. The metapopulation model of van Heerwaarden et al. (2010) is able to fit FST values within most villages using pollen migration values similar to direct estimates reported in the literature (Messeguer et al., 2006) (Table 2). The strikingly high FST and correspondingly low pollen migration required to fit the model in M-M were due to the presence of a highly distinct early maturing landrace in part of the village (Figure 3); because of differences in flowering time, we hypothesize there is likely very little pollen migration between this landrace and others. The model also fits observed FST between ethnolinguistic groups well, using pollen migration parameters within the range seen in individual villages (Table 2). Within elevation regimens (between ethnolinguistic groups) or among all villages, however, the model was only able to fit observed FST values with a substantial reduction in the pollen migration parameter, or with a smaller reduction in pollen migration and seed migration frequency set to zero. Because of the good fit of the model within villages and between villages within an ethnic group, the decrease in migration (pollen or seed) required to fit the model to elevation groups or the entire data set is consistent with the idea that ethnolinguistic group is a limitation to maize gene flow in this region.

Previous studies that have found morphological differences among maize from different villages have not found much differentiation at the genetic level, suggesting that selection for a particular maize ideotype cannot explain the genetic differentiation observed in our villages. Instead, we suggest that a reduction of gene flow by limited seed and pollen migration among villages of different ethnolinguistic groups has effected genetic structure both in morphological traits and in genome-wide markers. We propose that detailed investigation of seed networks is an important next step to understanding the processes that pattern genetic diversity in maize.

Data archiving

Data available from the Dryad Digital Repository: