The origin of maize (Zea mays mays) in the US Southwest remains contentious, with conflicting archaeological data supporting either coastal1,
Documenting ancient diffusion routes of domesticates and how they were modified when introduced into new regions has long been a challenge. For example, hybridization and gene flow have long confounded attempts to understand the origins of either indica rice8 in the Indian subcontinent or maize in southern Mexico9. The origin and adaptation of maize in the US Southwest is a similarly difficult case. Following its initial domestication from the wild grass teosinte in southern Mexico10,11, maize diffused throughout the Americas, spreading through much of the continental United States after its introduction to the Southwest around 4,100 calendar years before present (BP)7. There has been considerable debate about the arrival of maize into the Southwest, however, as early archaeological samples suggested a highland route5,6, whereas more recent samples1,2 and morphological similarity to extant Mexican maize support a lowland, Pacific coast route3,4. And while temporal variation in Southwest maize cob morphology has been described2, the genetic changes responsible for adaptation to the Southwest environment during the last 4,000 years are still uncharacterized.
In order to resolve questions about the diffusion of maize into the Southwest as well as to track genetic changes in Southwest maize through time, we sampled DNA from archaeological specimens dating to around 4,000–3,000, 2,000 and 750 BP (SW3K, SW2K and SW750 hereafter), as well as four ancient Mexican samples dating to around 5,910 BP, 5,280 BP and 1,410 BP (Table 1) and a single modern open-pollinated highland Mexican maize accession (Supplementary Table 5). We generated sequence data from ancient samples using a hybridization target capture approach that was enriched for the exons of 348 genes (depth of covered sites ∼10× on target and ∼2× elsewhere; selection criteria are in Supplementary Tables 8, 9 and 11); our modern highland sample was sequenced using a whole-genome shotgun approach. To these data we added published sequence data from an additional ancient sample from Mexico12 and modern samples of teosinte subspecies, Zea mays parviglumis and Zea mays mexicana, as well as Southwest and Mexican maize13.
Comparison of shared derived alleles between ancient Southwest samples and the Mexican highland landrace Palomero de Jalisco or the Mexican lowland landrace Chapalote using D statistics14 argues for a highland origin of the earliest Southwest maize (SW3K; Fig. 1a), consistent with low-density single nucleotide polymorphism data15 from a sample of more than 2,000 modern maize landraces and teosinte (Supplementary Fig. 6). In contrast, values of D in SW2K support gene flow from Chapalote (Fig. 1a). TreeMix16 also identifies introgression from lowland maize to the SW2K population (Fig. 1b) and agrees with previous evidence for introgression from the teosinte Z. mays mexicana into Mexican highland landraces17. Finally, admixture analysis (Fig. 1c and Supplementary Fig. 5) reveals evidence of teosinte admixture in all ancient Southwest maize. As there is no history of teosinte in the Southwest, this is consistent with a highland origin. Assignment to the group that includes the lowland samples Chapalote and Reventador, however, increases in the SW2K and SW750 samples; we interpret the lack of observed admixture with teosinte or Mexican maize in the extant Southwest Santo Domingo landrace (USA17) to be a result of recent extensive genetic exchange with other American landraces (Supplementary Fig. 5). Together, these results argue for a complex origin of Southwest maize, originally entering the United States via a highland route by 4,000 BP and subsequently receiving gene flow from lowland maize via the Pacific coastal corridor starting around 2,000 BP.
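The logic of the D statistic used above can be illustrated with a minimal sketch. It assumes four aligned haploid sequences (P1, P2, P3 and an outgroup that defines the ancestral allele) and counts ABBA and BABA site patterns. The function name, the toy sequences and the assignment of taxa to P1 to P3 are hypothetical and are not taken from the analysis pipeline of this study, which worked from low-depth ancient sequence data.

```python
def d_statistic(p1, p2, p3, outgroup):
    """Patterson's D = (ABBA - BABA) / (ABBA + BABA).

    Assumes four aligned haploid sequences of equal length.
    The outgroup allele is treated as ancestral (A); any other
    allele shared with P3 is treated as derived (B).
    """
    abba = 0  # P2 and P3 share the derived allele
    baba = 0  # P1 and P3 share the derived allele
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        # Skip sites with missing or ambiguous bases.
        if any(x not in "ACGT" for x in (a, b, c, o)):
            continue
        # Only sites where P3 carries a derived allele are informative.
        if c == o:
            continue
        if a == o and b == c:
            abba += 1
        elif b == o and a == c:
            baba += 1
    total = abba + baba
    return (abba - baba) / total if total else float("nan")


# Toy example (hypothetical data): three ABBA sites and one BABA site.
print(d_statistic("AAAAG", "GGGAA", "GGGAG", "AAAAA"))
```

Under this convention a positive D indicates an excess of derived alleles shared between P2 and P3, and a negative D an excess shared between P1 and P3. In practice significance would be assessed with a block jackknife across the genome, which this sketch omits.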
Maize was faced with a number of environmental challenges upon arrival in the Southwest, from extreme aridity to new dietary preferences7. Our population-level samples corresponding to temporally distinct occupations of the same cave site (Tularosa cave: SW2K, n = 10; SW750, n = 12), combined with published genomic data of the maize progenitor Z. m. parviglumis (Supplementary Table 4), allow us to distinguish evidence for these more recent adaptations from selection that occurred during maize domestication. We first used the population branch statistic PBS18 to identify genes with the highest dissimilarity between teosinte and our ancient Southwest landraces (Fig. 2a). These genes were likely to be early targets of maize domestication that preceded arrival in the Southwest. Many of these genes also show a very negative Tajima's D, consistent with the effects of strong selection (Fig. 2a), and seven of the top ten genes (Supplementary Table 1) are located in previously identified selected regions19. The top gene, zagl1, corresponds to a MADS-box transcription factor associated with shattering, a key domestication feature strongly selected for by human harvesting20. Several other genes are also well known for their roles in domestication: ba1 has a major role in the architecture of maize21, zcn1 and gi are associated with the regulation of flowering20,22 and tga1 controls the change from encased to exposed kernels23.
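As an illustration of how the population branch statistic is computed, the sketch below follows its standard definition: pairwise FST values among three populations are transformed to branch lengths T = −log(1 − FST), and the length of the branch leading to the focal population A is PBS = (T_AB + T_AC − T_BC)/2. The function names and the example FST values are hypothetical, and which populations serve as A, B and C is an assumption of the example rather than a description of the configuration used here.

```python
import math


def branch_length(fst):
    """Transform a pairwise FST value (0 <= fst < 1) into a divergence time T."""
    return -math.log(1.0 - fst)


def pbs(fst_ab, fst_ac, fst_bc):
    """Population branch statistic for focal population A.

    fst_ab, fst_ac: FST between A and each comparison population.
    fst_bc: FST between the two comparison populations.
    """
    t_ab = branch_length(fst_ab)
    t_ac = branch_length(fst_ac)
    t_bc = branch_length(fst_bc)
    return (t_ab + t_ac - t_bc) / 2.0


# Hypothetical gene: A is strongly differentiated from both B and C,
# while B and C are similar to each other, so the A branch is long.
print(round(pbs(0.5, 0.5, 0.0), 4))
```

Genes in the upper tail of the PBS distribution are those whose allele frequencies changed most along the branch leading to the focal population.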
Comparison of the ancient maize population samples from Tularosa cave then let us assess changes between 2,000 and 750 years BP, a period of ongoing adaptation to the Southwest. Median values of Tajima's D in the SW750 population are higher than in the SW2K (Supplementary Fig. 8 and Supplementary Table 2), consistent with model-based estimates suggesting a smaller effective population (Supplementary Fig. 9). Nonetheless, we find several genes showing evidence of selection. The top PBS outlier in the SW750 population is a dehydration-responsive element-binding protein shown to be upregulated as much as 50-fold in maize roots under drought conditions24, perhaps a signature of adaptation to arid Southwest conditions (Supplementary Fig. 10). Analysis of genes in the starch biosynthesis pathway provides perhaps the best example of the power of our population-sampling approach. While the reduction of diversity at ae1 is seen in all Southwest maize, consistent with selection during domestication, diversity at sugary1 (su1) is reduced more than 60% between the SW2K and SW750 populations (Fig. 3). su1 also shows an elevated PBS and a negative Tajima's D (Fig. 2) consistent with strong selection. The timing of selection on su1 appears to correlate with a shift towards larger cobs and floury kernel endosperm in archaeological maize around 800–1000 AD2. Both ae1 and su1 affect the structure of amylopectin25, which is involved in the pasting properties of maize tortillas and porridge26. Furthermore, it has been shown that storing non-structural carbohydrates can be beneficial in a drought scenario, consistent with adaptation to the Southwest climate27. The su1 mutation with the highest allele frequency difference between SW2K and modern individuals (Supplementary Fig. 3) is known to cause the partial replacement of starch by sugar in sweetcorn28. 
Several Native American tribes grew sweetcorn before the arrival of Europeans and the high frequency of a su1 mutation in Southwest maize could help explain the early appearance and maintenance of sweetcorn varieties by Native Americans.
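The diversity and Tajima's D comparisons between SW2K and SW750 can be illustrated with a short sketch. It assumes called haploid sequences of equal length for one gene, which is a simplification: estimates from low-depth ancient data would normally account for genotype uncertainty. All function names and the toy sequences are hypothetical.

```python
import math
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean number of pairwise differences (pi) across all sequence pairs."""
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / len(pairs)


def tajimas_d(seqs):
    """Tajima's D for aligned haploid sequences.

    Returns NaN when there are no segregating sites or fewer than
    four sequences (the variance term is zero for n < 4).
    """
    n = len(seqs)
    seg = sum(1 for col in zip(*seqs) if len(set(col)) > 1)
    if n < 4 or seg == 0:
        return float("nan")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    # Difference between pi and Watterson's estimator, scaled by its s.d.
    d = nucleotide_diversity(seqs) - seg / a1
    return d / math.sqrt(e1 * seg + e2 * seg * (seg - 1))


def diversity_reduction(pi_early, pi_late):
    """Percentage reduction in diversity from the early to the late sample."""
    return 100.0 * (1.0 - pi_late / pi_early)


# Hypothetical toy samples: rare variants only versus balanced variants.
rare = ["AAAA", "TAAA", "ATAA", "AAAA"]
balanced = ["AAAA", "AAAA", "TTAA", "TTAA"]
print(tajimas_d(rare) < 0, tajimas_d(balanced) > 0)
```

An excess of rare variants gives a negative D, as expected after a selective sweep, whereas an excess of intermediate-frequency variants gives a positive D, as expected after a reduction in population size.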
The study of domestication and early crop evolution has largely been limited to the identification of key phenotypic, morphological and genetic changes between extant crops and their wild relatives. As demonstrated here, the application of new paleogenomic approaches to well-documented temporal sequences of archaeological assemblages opens a new chapter in the study of domestication: it is now possible to move beyond a simple distinction of ‘wild’ versus ‘domesticated’29,30 and track sequence changes in a wide range of genes over the course of thousands of years.
Twenty-five archaeological maize cob samples from the Southwest United States dating from 4,300 to 740 years BP, three from Mexico dating from 5,910 to 1,410 BP, and four ancient Arica samples were obtained from the repositories and individuals listed in Supplementary Table 7 following established policies and procedures for destructive sampling. In addition, previously published sequence data12 from an ancient sample from Mexico were used (Supplementary Table 7).
With the exception of the Turkey House Ruin sample, all of the archaeological cob samples from the Southwest United States and Mexico were recovered from dry cave contexts, and the Chilean (Arica) samples came from the dry desert coast of South America. All of the archaeological samples were desiccated, uncarbonized and in an excellent state of preservation. The cobs recovered from sites in the Southwest United States fall into two temporally separated and morphologically distinct categories, which correlate closely with the structural analysis groupings based on aDNA. The early southwestern maize, including samples from McEuen and Bat Caves, and from the early occupation at Tularosa Cave (1,850–1,750 BP), variously labelled as ‘Chapalote’ or ‘small cob maize’4, is a small cob, small kernel form having a thick midsection (1.9–2.5 cm diameter), tapered ends (pineapple shape) and 10–12 rows of kernels. The maize from the later occupation at Tularosa Cave (700–900 BP), as well as the Turkey House Ruin sample (670 BP), is a larger cob, larger kernel form, having parallel sides (cylinder shape), eight to ten rows of kernels, and a much smaller diameter than the earlier form (1.3–1.6 cm) (Table 1).
Data for modern samples (maize landraces, Z. m. parviglumis and Tripsacum) were obtained from the HapMap2 set and downloaded from Panzea's website (www.panzea.org). Additionally, we generated shotgun data from an individual from the highlands of northern Mexico. Information about modern samples can be found in Supplementary Tables 4 and 5.
Reads mapping to the target regions were extracted from HapMap2 bam files and remapped and filtered in the same way as the ancient maize samples (Supplementary Table 4).
Target selection and bait design
A total of 348 genes were targeted: 318 genes were chosen because their similarity to sorghum was between 70% and 95% (a conservation level that is indicative of high functional relevance, while avoiding genes that are potentially invariable in maize) and because they had some functional annotation (Supplementary Table 9). The other 30 genes have been suggested to have an important role in traits selected during maize domestication20,22,31,32 (Supplementary Table 8). Maize gene sequences were downloaded from ENSEMBL (annotation version ZmB73_5b). An extra 120 base pair (bp) flanking region was added to each bait; 120 bp probes were designed with 20 bp tiling, resulting in a final number of 53,063 probes.
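The tiling scheme can be sketched as follows. The example assumes that "20 bp tiling" means consecutive 120 bp probes start 20 bp apart and that the 120 bp flank is added on both sides of each target region; both readings are assumptions, as are the function name and the coordinates, which are not the actual design parameters beyond the probe length, flank and tiling values stated above.

```python
def tile_probes(region_start, region_end, probe_len=120, step=20, flank=120):
    """Return half-open (start, end) coordinates of probes tiled over a region.

    Assumption: the flank is added on both sides of the target, and
    `step` is the offset between consecutive probe start positions.
    """
    start = max(0, region_start - flank)
    end = region_end + flank
    probes = []
    pos = start
    # Keep only probes that fit entirely inside the padded region.
    while pos + probe_len <= end:
        probes.append((pos, pos + probe_len))
        pos += step
    return probes


# Hypothetical 240 bp exon at positions 1000-1240.
probes = tile_probes(1000, 1240)
print(len(probes), probes[0], probes[-1])
```

With a 20 bp step and 120 bp probes, each interior base of the target is covered by six overlapping probes under these assumptions.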
Archaeological maize remains were processed at a dedicated clean laboratory facility at the Centre for GeoGenetics, University of Copenhagen. All steps prior to library amplification were conducted in an isolated laboratory that utilizes nightly UV radiation and air filtration systems to avoid contamination, thereby conforming to the requirements of aDNA research33.
To minimize modern DNA contamination, maize kernels were washed in 5% commercial bleach solution (NaClO) and rinsed in molecular grade water before extraction. Maize cobs could not be washed with bleach because they would absorb the solution, potentially leading to degradation of endogenous DNA. Instead, sterile scalpels were used to remove the external surface of cobs to expose material with presumably lower levels of contamination. Maize kernels were pulverized using a sterilized hammer and maize cob samples were sliced into fine slivers using a sterile scalpel. Either one kernel or ∼0.1 g of cob shavings was used for each extraction.
DNA extractions were conducted according to an established protocol originally designed for extracting DNA from ancient hair samples34, but which has also been applied to ancient grape pips and maize12,35. Recent testing has demonstrated the method generally outperforms other extraction techniques for a broad range of archaeobotanical remains, including maize cobs and kernels36. Pulverized samples were placed in 750 µl of extraction buffer (850 µl for cobs), as described previously12, and incubated overnight at 55 °C. The following day, a phenol and chloroform extraction was conducted, followed by purification in Qiagen MinElute silica spin columns.
Library construction and amplification
DNA extracts were converted to Illumina-compatible DNA libraries using NEBNext library building kits for second-generation sequencing (New England Biolabs, Ipswich, MA; catalogue numbers: E6070L, E6090S). Libraries were prepared according to manufacturer's directions, except that no DNA size selection or fragmentation steps were undertaken.
Libraries were amplified with either Phusion High-Fidelity PCR Master Mix (Thermo Fisher Scientific, Waltham, MA) or AmpliTaq Gold (Life Technologies, Carlsbad, CA). Libraries constructed in the later phases of the project were always first amplified using AmpliTaq Gold to incorporate molecules with damaged nucleotides. Apparent C to T transitions at the 5′ and 3′ ends of aDNA molecules resulting from the pairing of adenine with deaminated cytosine (uracil) can thereby be used to test for characteristic aDNA damage patterns and help authenticate the presence of endogenous aDNA37. Nonetheless, libraries amplified during the earlier phases of the project were overall similar to those amplified with AmpliTaq Gold, and therefore should not lead to biases in analyses. Libraries were amplified for 12–18 initial cycles, depending on the sample.
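The damage signal described above can be summarized as the fraction of reference cytosines read as thymine at each position from the 5′ end of a read. The sketch below is a simplified illustration that assumes gap-free (reference, read) string pairs already oriented 5′ to 3′; the function name and toy alignments are hypothetical, and dedicated tools would normally be used on real BAM files.

```python
def c_to_t_by_position(alignments, max_pos=5):
    """Per-position C-to-T mismatch frequency from the 5' end of reads.

    alignments: iterable of (reference, read) string pairs of equal
    length with no indels (an assumption of this sketch).
    Returns a list of length max_pos; positions with no reference C
    are reported as NaN.
    """
    c_total = [0] * max_pos
    c_to_t = [0] * max_pos
    for ref, read in alignments:
        for i in range(min(max_pos, len(ref), len(read))):
            if ref[i] == "C":
                c_total[i] += 1
                if read[i] == "T":
                    c_to_t[i] += 1
    return [ct / tot if tot else float("nan")
            for ct, tot in zip(c_to_t, c_total)]


# Hypothetical toy alignments: misincorporations concentrated at position 0.
toy = [("CACGT", "TACGT"), ("CCAGT", "CCAGT"),
       ("CAGGT", "TAGGT"), ("GCAAT", "GTAAT")]
print(c_to_t_by_position(toy, max_pos=3))
```

In authentic aDNA the frequency is expected to be highest at the first position and to decline towards the interior of the read; a flat profile would suggest modern contamination or a polymerase that does not read through uracil.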
To reach DNA concentrations required for in-solution hybridization captures, libraries were amplified again, using a subset of the first amplification. These second amplifications were exclusively done with Phusion High-Fidelity PCR Master Mix because the polymerase replicates DNA with higher fidelity than AmpliTaq Gold, thereby reducing erroneous sequence polymorphisms. The second amplifications were conducted using 10–18 cycles. When necessary, libraries were size selected on a 2% agarose gel to remove adapter dimers. Libraries were characterized on a Qubit 2.0 fluorometer (Life Technologies) and Agilent 2100 Bioanalyzer (Santa Clara, CA).
Enrichment of relevant genetic loci38 was conducted using a custom-designed MYBait-3 target enrichment kit (MYcroarray, Ann Arbor, MI; 120 bp length RNA baits). The manufacturer of the kit recommends 100–500 ng of amplified library to be used for a capture, and all captures were performed at the higher end of this range, generally 300–500 ng of DNA. Libraries were hybridized for 24 hours at 65 °C in an Applied Biosystems Veriti thermal cycler (Life Technologies) using a heated lid to prevent condensation. Following hybridization with RNA probes, the samples were processed according to the manufacturer's protocol. Post-capture amplification was done with Phusion High-Fidelity PCR Master Mix, using 12–18 cycles. Samples were sequenced on a HiSeq 2000 in the single read 100 bp mode, three samples per lane.
This procedure resulted in a depth within the target regions of around 10×, a fivefold increase relative to other sites in the genome (Table 1).
See the Supplementary Information for more Methods.
The authors acknowledge the following grants: Marie Curie Actions IEF 272927 and COFUND DFF-1325-00136, Danish National Research Foundation DNRF94, Danish Council for Independent Research 10-081390 and 1325-00136, Lundbeck Foundation grant R52-A5062, Vand Fondecyt Grant 1130261, a grant from the UC Davis Genome Center for the highland maize sequence and NSF IOS-1238014. R.F. is supported by a Young Investigator grant (VKR023446) from Villum Fonden. P.S. was funded by the Wenner-Gren foundation. The authors thank Ângela Ribeiro, Shohei Takuno and Philip Johnson for comments and discussion and staff at the Danish National High-Throughput DNA Sequencing for technical support.