Abstract
The domestication of all major crop plants occurred during a brief period in human history about 10,000 years ago1. During this time, ancient agriculturalists selected seed of preferred forms and culled out seed of undesirable types to produce each subsequent generation. Consequently, favoured alleles at genes controlling traits of interest increased in frequency, ultimately reaching fixation. When selection is strong, domestication has the potential to drastically reduce genetic diversity in a crop. To understand the impact of selection during maize domestication, we examined nucleotide polymorphism in teosinte branched1, a gene involved in maize evolution2. Here we show that the effects of selection were limited to the gene's regulatory region and cannot be detected in the protein-coding region. Although selection was apparently strong, high rates of recombination and a prolonged domestication period probably limited its effects. Our results help to explain why maize is such a variable crop. They also suggest that maize domestication required hundreds of years, and confirm previous evidence that maize was domesticated from Balsas teosinte of southwestern Mexico.
Several lines of evidence indicate that maize is a domesticated form of the wild Mexican grass teosinte (Zea mays ssp. parviglumis or ssp. mexicana)3, 4, 5. Archaeological evidence places the time of maize domestication between 5,000 and 10,000 BP6. Despite the recent derivation of maize from teosinte, these plants differ profoundly in morphology5. One major difference is that teosinte typically has long branches with tassels at their tips whereas maize possesses short branches tipped by ears. Genetic analyses have identified teosinte branched1 (tb1) as the gene that largely controls this difference2. Recent cloning of tb1 (ref. 7) provides the first opportunity to examine the effects of selection on a 'domestication gene' and to infer from these effects the nature of the domestication process.
During development, tb1 acts as a repressor of organ growth in those organs in which its messenger RNA accumulates. Consistent with this interpretation, plants carrying the maize allele accumulate more tb1 mRNA in lateral-branch primordia and have shorter branches (that is, greater repression of branch elongation) than plants carrying the teosinte allele, which accumulate less tb1 mRNA and have longer branches7. This difference in message accumulation between the maize and teosinte alleles suggests that the evolutionary switch from teosinte to maize involved changes in the regulatory regions of tb1.
Domestication should strongly reduce sequence diversity at genes controlling traits of human interest. To test this expectation for tb1, we sampled a 2.9-kilobase (kb) region (Fig. 1) including most of the predicted transcriptional unit (TU) and 1.1 kb of the 5' non-transcribed region (NTR) from a diverse sample of maize and teosinte (Table 1). Two measures of genetic diversity were calculated: π, the expected heterozygosity per nucleotide site, and θ, an estimate of $4N_e\mu$, where $N_e$ is the effective population size and μ the mutation rate per nucleotide8. Within the TU, maize possesses 39% of the diversity found in teosinte, which is not significantly lower than that (71%) seen for the neutral gene, Adh1 (Table 2). However, within the NTR, maize possesses only 3% of the diversity found in teosinte. Thus, selection during domestication is associated with strongly reduced diversity in the NTR where regulatory sequences are typically found, but more modestly reduced diversity in the TU.
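The two diversity measures can be illustrated with a minimal Python sketch. It assumes equal-length, gap-free aligned sequences and uses Watterson's estimator for θ; the text does not state which estimator was used, so that choice, the function names and the toy alignment are all assumptions made for illustration.

```python
from itertools import combinations

def nucleotide_diversity(seqs):
    """pi: mean number of pairwise differences per site (aligned, equal length)."""
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / (len(pairs) * len(seqs[0]))

def watterson_theta(seqs):
    """theta per site from the number of segregating sites.

    Assumption: Watterson's estimator, theta = S / (a_n * L),
    with a_n = sum_{i=1}^{n-1} 1/i for a sample of n sequences.
    """
    n, length = len(seqs), len(seqs[0])
    segregating = sum(len(set(column)) > 1 for column in zip(*seqs))
    a_n = sum(1.0 / i for i in range(1, n))
    return segregating / (a_n * length)

# Hypothetical toy alignment, not data from the study.
toy = ["AAAA", "AAAT", "AATT"]
print(nucleotide_diversity(toy), watterson_theta(toy))
```

A ratio such as the 39% reported for the TU would then correspond to dividing the maize value by the teosinte value for the same region.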
Figure 1: Predicted structure of teosinte branched1 (ref. 25) and sliding-window analysis of polymorphism (π) in maize and teosinte.
![Figure 1: Predicted structure of teosinte branched1 (ref. 25) and sliding-window analysis of polymorphism (π) in maize and teosinte.](/nature/journal/v398/n6724/images/398236aa.eps.0.gif)
For the sliding-window analysis, π was calculated for segments of 300 bp at 50-bp intervals. Sequences used in the analysis were the subset of the λ-cloned sequences for which we isolated the 3' end by PCR (Table 1). The positions of the HindIII (H) restriction endonuclease sites used in λ cloning are shown, as are the predicted exons (rectangles) and coding region (stippled).
If tb1 contributed to the morphological evolution of maize, then in the tb1 phylogeny for maize and teosinte, maize sequences should form a single clade with only minor differentiation among them. Moreover, the type of teosinte most closely related to the direct ancestor of maize should be associated with the maize clade. In contrast, previous research with neutral genes not involved in maize evolution has shown that maize sequences for such genes are dispersed among multiple clades owing to the effects of lineage sorting9, 10, 11, 12, 13. A phylogeny for the tb1 TU fits the expectation of a neutral gene, with maize sequences falling into multiple clades (Fig.2a). However, the phylogeny for the NTR shows all maize on a single, well-supported clade (I) ( Fig. 2b), as predicted for a gene involved in maize evolution. There are five teosinte sequences tightly associated with the maize sequences and all belong to ssp. parviglumis. The teosinte (sample 16L) basal to clade I also belongs to ssp. parviglumis. This phylogeny and previous research14 provide compelling evidence that ssp. parviglumis (Balsas teosinte) is the progenitor of maize and suggest that maize arose in the Balsas river valley of southwestern Mexico, where this subspecies is native.
Figure 2: Neighbour-joining trees for tb1 based on the 1,729-bp transcribed region (a) and the 1,143-bp 5' non-transcribed region (b).
![Figure 2: Neighbour-joining trees for tb1 based on the 1,729-bp transcribed region (a) and the 1,143-bp 5' non-transcribed region (b).](/nature/journal/v398/n6724/images/398236ab.eps.0.gif)
Taxa include maize, ssp. parviglumis (PAR), ssp. mexicana (MEX) and the outgroup Z. diploperennis (DIP). Sample numbers follow the taxon names. Scale bars indicate the number of substitutions per site using Kimura's 2-parameter distances. Clade I was supported in 200 of 200 bootstrap resamplings of the original data. An initial analysis of the 5' non-transcribed region using the 22 maize and teosinte λ-cloned sequences indicated that all maize alleles were derived from ssp. parviglumis. To confirm whether this result would be sustained with a larger sample, we isolated 16 additional sequences for a more comprehensive analysis (see Methods).
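The distance measure named in the caption can be sketched as follows. This is an illustrative implementation of the standard Kimura 2-parameter formula, d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitions and transversions; the function name and the decision to skip non-ACGT characters are assumptions, not a description of the software used for Fig. 2.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned sequences.

    Assumption: sites where either sequence has a gap or ambiguity
    code are skipped rather than modelled.
    """
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))

# Hypothetical example: one transition in ten sites.
print(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"))
```

A matrix of such pairwise distances is the usual input to a neighbour-joining algorithm.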
Although the phylogenies and the relative amount of nucleotide variation in maize suggest that selection has acted on tb1, we collected data on nucleotide polymorphism specifically to determine whether tb1 has experienced a recent selective sweep. This determination was made using the HKA test15, in which the ratio of polymorphism within a species (maize) to divergence from an outgroup (Z. diploperennis) for tb1 was compared with this ratio for neutral genes. A recent selective sweep in maize would be expected to reduce this ratio for tb1 relative to neutral genes. The HKA test was not significant for the TU, indicating that there is no evidence for selection on the coding region ( Table 3). However, the test was highly significant for the NTR, indicating that selection has strongly reduced variation here. Remarkably, the HKA test for the NTR was significant even if the TU was used as the control. This shows that the 'hitchhiking' effect was so small that it did not even affect the entire gene.
The relative impact of selection on the NTR and TU can be readily seen in a plot of polymorphism (π) for maize and teosinte along the length of tb1 (Fig. 1). Throughout the NTR, π is substantially lower in maize than in teosinte, reflecting the impact of selection. At the boundary between the NTR and the TU, π for teosinte drops precipitously, as expected, because there is greater constraint on coding regions; however, π for maize rises until it is nearly equal to π for teosinte. Finally, approaching the stop codon at the 3' end of the gene, π rises steeply in both maize and teosinte, reflecting reduced constraint in this non-translated region. Figure 1 shows graphically how the impact of selection on polymorphism was narrowly focused on the NTR.
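A sliding-window profile of this kind can be sketched in a few lines of Python. The window length (300 bp) and step (50 bp) follow the Figure 1 legend; the function names, the gap-free alignment and the reporting of each window by its midpoint are assumptions made for illustration.

```python
from itertools import combinations

def pi_per_site(seqs):
    """Mean pairwise differences per site for aligned, equal-length sequences."""
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / (len(pairs) * len(seqs[0]))

def sliding_window_pi(seqs, window=300, step=50):
    """Return (window midpoint, pi) for each full window along the alignment."""
    length = len(seqs[0])
    profile = []
    for start in range(0, length - window + 1, step):
        chunk = [s[start:start + window] for s in seqs]
        profile.append((start + window // 2, pi_per_site(chunk)))
    return profile

# Hypothetical 400-bp alignment with all differences in the last 50 bp.
toy = ["A" * 400, "A" * 350 + "T" * 50]
for midpoint, pi in sliding_window_pi(toy):
    print(midpoint, round(pi, 4))
```

Computing one profile for the maize sample and one for the teosinte sample gives the two curves that are compared along the gene.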
As further evidence that regulatory changes in the NTR rather than changes in protein function were involved in maize evolution, we examined the predicted amino-acid sequence of our maize and teosinte sequences. Because our maize and teosinte λ clones did not include the 3' end of the TU, we isolated and sequenced an additional segment spanning the end of the λ clones to 170 base pairs (bp) downstream of the stop codon (Table 1). Over the entire coding region, there are no fixed differences in the predicted amino-acid sequences between maize and teosinte.
Our analysis of nucleotide polymorphism in tb1 provides compelling evidence that selection during maize domestication was aimed at the NTR, where regulatory elements are typically found. We had previously observed that the tb1 mRNA for the maize allele accumulates at twice the level of that for the teosinte allele and proposed that changes in tb1 regulation underlie maize evolution7. The evidence from polymorphism analysis and from previous work on tb1 mRNA levels is thus congruent, providing strong evidence that the short, ear-tipped lateral branches of maize evolved from the long, tassel-tipped branches of teosinte by human selection for novel regulatory elements in the NTR.
Although our data implicate selection on regulatory sequences during maize evolution, we found no fixed differences between maize and teosinte within the 1.1 kb of NTR that we analysed. In fact, some maize and teosinte sequences (for example, maize 1P and parviglumis 18L) are nearly identical within this region, differing by only 1 bp in the length of a poly(A) track. This may indicate either that the selected site lies further upstream or that the differences between maize and teosinte are complex and depend on a group of polymorphisms rather than a single site16. Recombination between an upstream selected site and the region we sequenced could explain why maize is not fully separated from teosinte in the phylogeny ( Fig.2b).
Selection intensities during domestication are expected to be high because crop evolution involves dramatic changes in morphology within a short time. Under directional selection, the selection coefficient (s) is measured as the difference in relative fitness of the most fit and least fit genotypes, where fitness is the contribution of a genotype to the next generation. For example, s = 0.01 would indicate that 100 maize alleles would be passed to the next generation for every 99 teosinte alleles. A rough estimate of s requires a knowledge of the recombination rate (c, crossovers per bp per generation) and the distance (d) in bp from the selected site over which there has been a substantial reduction in nucleotide variation17:

$$d = \frac{0.01\,s}{c}$$

For maize, recombination rates have been empirically measured for several genes, giving a mean value for c of 4 × 10⁻⁷ (refs 18–20). Two observations allow a preliminary estimate of d: first, the substantial reduction in nucleotide variation is restricted to the NTR or promoter and does not extend into the TU, and, second, plant promoters are normally 2 kb or less in size (see Methods). Thus, the selected site is likely to be less than 2 kb from the TU and must be at least 1.1 kb from the TU since it does not appear to lie in the 1.1 kb of NTR that we sequenced. Using these values, s is estimated to be between 0.04 and 0.08. This estimate can be refined in the future by obtaining direct estimates of c in tb1 and identifying the precise position of the selected site.
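The arithmetic behind this range can be checked with a short sketch. It assumes the relation d = 0.01 s / c, which is an assumption here in the sense that it is the form that reproduces the reported range from the stated values of c and d; the function and parameter names are hypothetical.

```python
def selection_coefficient(d_bp, c_per_bp, factor=0.01):
    """Solve d = factor * s / c for s.

    Assumption: factor = 0.01, chosen because it reproduces the
    reported range of s from c = 4e-7 and d between 1.1 and 2 kb.
    """
    return d_bp * c_per_bp / factor

c = 4e-7  # crossovers per bp per generation, as given in the text
for d in (1100, 2000):  # lower and upper bounds on d, in bp
    print(d, round(selection_coefficient(d, c), 3))
```

With d = 1,100 bp this gives s of about 0.044, and with d = 2,000 bp it gives s = 0.08, matching the stated range of 0.04 to 0.08.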
When s is known, one can estimate the time ($T_f$) required to bring the maize allele to fixation22. We assume that the initial frequency of the maize allele was 1/2N, where N is the population size, and that gene action was additive2. We considered two population sizes during the time of selection: 1,000, which assumes teosinte was grown like a horticultural crop in gardens, and 100,000, which assumes it was grown like an agricultural crop but still over a limited geographical area. We assume values for s of 0.04 and 0.08 (see above). For these values, $T_f$ ranges from 315 to 1,023 years. Thus, the morphological evolution of maize as controlled by tb1 could have been rapid, over just several hundred years.
We were surprised that maize remained polymorphic for tb1, even within the NTR. To assess whether the observed level of variation at tb1 in maize is consistent with previous estimates of mutation and recombination rates, population sizes and the time of maize domestication, we carried out coalescent simulations23. The simulations included a selective sweep modelled on a range of estimates for $T_s$ (time since the selective sweep) and s (Table 4). To measure the effect of a selective sweep so that it reflects the present context of not knowing the actual site of selection, we measured the longest segment with zero, one or two polymorphic sites. The simulated mean values of these lengths are remarkably close to the observed data. Thus, even for genes under strong selection, domestication need not remove all variation. The ability of maize to remain polymorphic at tb1 probably reflects high recombination rates over the hundreds of years required to bring the maize allele to fixation such that there was substantial recombination between the allele that initially harboured the selected site and other alleles in the population. By this means, considerable polymorphism was maintained in the coding region in the face of strong selection on the NTR.
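The summary statistic described above can be illustrated with a short Python sketch. The exact definition used in the simulations is not given in the text, so this is one plausible reading: the longest contiguous stretch of sites that contains at most k polymorphic sites. The function name and the toy positions are hypothetical.

```python
def longest_segment(length, poly_sites, max_sites):
    """Length (bp) of the longest stretch containing at most max_sites polymorphic sites.

    length     -- total number of sites in the sampled region
    poly_sites -- 0-based positions of polymorphic sites
    max_sites  -- 0, 1 or 2 in the analysis described in the text
    """
    # Sentinels mark the positions just outside the sampled region.
    bounds = [-1] + sorted(poly_sites) + [length]
    if len(poly_sites) <= max_sites:
        return length
    gap = max_sites + 1
    # A stretch with at most max_sites polymorphisms lies strictly
    # between the i-th and (i + gap)-th boundary positions.
    return max(bounds[i + gap] - bounds[i] - 1
               for i in range(len(bounds) - gap))

# Hypothetical region of 100 bp with polymorphic sites at 10, 50 and 60.
for k in (0, 1, 2):
    print(k, longest_segment(100, [10, 50, 60], k))
```

Applied to both observed and simulated samples, a statistic of this kind captures the footprint of a sweep without requiring the position of the selected site.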
Table 4: Results from coalescent simulations of the effect of the selective sweep during maize domestication on polymorphism in tb1
Population-genetic analysis of domestication genes can provide a new view of the processes that sculpted the formation of crop species. For tb1, such analysis indicates that ancient agriculturalists exerted a strong selective force on tb1 that has drastically reduced polymorphism in its regulatory region but not in its coding region. This observation is consistent with previous evidence that alterations in the regulation of tb1 brought about the change from teosinte to maize plant architecture. We also infer that it took at least several hundred years to bring the maize allele of tb1 to fixation. Finally, these analyses indicate that Balsas teosinte is the ancestor of maize, since all maize alleles sampled show a close and statistically robust phylogenetic association with this teosinte.
Methods
Gene isolation. All sequences for population-genetic analysis were cloned into λ-ZAP (Stratagene) as HindIII fragments (Fig. 1). Isolation of the 3' end of the gene for the sliding-window analysis (Fig. 1) and determination of the complete amino-acid sequence were accomplished by PCR (primers: TAGTTCATCGTCACACAGCC and CAATAACGCACACCAGGTCC). PCR was performed using PCR Supermix (Life Technologies) with one step of 4 min at 95 °C followed by 30 cycles of 1 min at 95 °C, 1 min at 60 °C and 3 min at 72 °C, followed by 10 min at 72 °C. PCR products were cloned using the TOPO TA cloning kit (Invitrogen). Additional sequences of the NTR for phylogenetic analysis (Fig. 2) were isolated by PCR (primers: GCTATTGGCTACAAGTGACC and GGATAATGTGCACCAGGTGT). All sequences were deposited in GenBank (accession nos AF131649 to AF131705).
Statistics. Calculation of the range of reasonable values for s requires an estimate of the position of the selected site relative to the TU. The selected site is expected to lie within the part of the NTR in which regulatory sequences occur. Such regions typically extend 2 kb or less upstream of the TU in plants. For example, average gene density in Arabidopsis is one gene for every 4.8 kb24. With an average gene being about 2.5 kb long, this leaves about 2 kb for flanking regulatory sequences. Moreover, many reports in the literature reveal that 5' regulatory sequences are usually within 1 kb of the transcription start site. For the coalescent simulations, the middle of the 4,000-bp sampled chromosomes was set as the point of a selective sweep. For the moderate values of s modelled, the use of deterministic allele frequency change will closely follow a stochastic selective sweep17. The population mutation rate (θ) was set to 0.0262 per bp, which is the estimated value from teosinte in the tb1 NTR. The effective population size was set to the value (700,000) estimated for maize at Adh1, based on estimates of θ and μ for Adh1 (ref. 9). The recombination rate (c) was set to 4 × 10⁻⁷ as described in the text. Sample size was 12, the same as the number of maize clones, and 200 runs were performed.

