Genomic mechanisms of climate adaptation in polyploid bioenergy switchgrass

Lovell, John T.; MacQueen, Alice H.; Mamidi, Sujan; Bonnette, Jason; Jenkins, Jerry; Napier, Joseph D.; Sreedasyam, Avinash; Healey, Adam; Session, Adam; Shu, Shengqiang; Barry, Kerrie; Bonos, Stacy; Boston, LoriBeth; Daum, Christopher; Deshpande, Shweta; Ewing, Aren; Grabowski, Paul P.; Haque, Taslima; Harrison, Melanie; Jiang, Jiming; Kudrna, Dave; Lipzen, Anna; Pendergast, Thomas H.; Plott, Chris; Qi, Peng; Saski, Christopher A.; Shakirov, Eugene V.; Sims, David; Sharma, Manoj; Sharma, Rita; Stewart, Ada; Singan, Vasanth R.; Tang, Yuhong; Thibivillier, Sandra; Webber, Jenell; Weng, Xiaoyu; Williams, Melissa; Wu, Guohong Albert; Yoshinaga, Yuko; Zane, Matthew; Zhang, Li; Zhang, Jiyi; Behrman, Kathrine D.; Boe, Arvid R.; Fay, Philip A.; Fritschi, Felix B.; Jastrow, Julie D.; Lloyd-Reilley, John; Martínez-Reyna, Juan Manuel; Matamala, Roser; Mitchell, Robert B.; Rouquette, Francis M.; Ronald, Pamela; Saha, Malay; Tobias, Christian M.; Udvardi, Michael; Wing, Rod A.; Wu, Yanqi; Bartley, Laura E.; Casler, Michael; Devos, Katrien M.; Lowry, David B.; Rokhsar, Daniel S.; Grimwood, Jane; Juenger, Thomas E.; Schmutz, Jeremy

doi:10.1038/s41586-020-03127-1


Article
Open access
Published: 27 January 2021

Nature volume 590, pages 438–444 (2021)

Abstract

Long-term climate change and periodic environmental extremes threaten food and fuel security¹ and global crop productivity^2,3,4. Although molecular and adaptive breeding strategies can buffer the effects of climatic stress and improve crop resilience⁵, these approaches require sufficient knowledge of the genes that underlie productivity and adaptation⁶—knowledge that has been limited to a small number of well-studied model systems. Here we present the assembly and annotation of the large and complex genome of the polyploid bioenergy crop switchgrass (Panicum virgatum). Analysis of biomass and survival among 732 resequenced genotypes, which were grown across 10 common gardens that span 1,800 km of latitude, jointly revealed extensive genomic evidence of climate adaptation. Climate–gene–biomass associations were abundant but varied considerably among deeply diverged gene pools. Furthermore, we found that gene flow accelerated climate adaptation during the postglacial colonization of northern habitats through introgression of alleles from a pre-adapted northern gene pool. The polyploid nature of switchgrass also enhanced adaptive potential through the fractionation of gene function, as there was an increased level of heritable genetic diversity on the nondominant subgenome. In addition to investigating patterns of climate adaptation, the genome resources and gene–trait associations developed here provide breeders with the necessary tools to increase switchgrass yield for the sustainable production of bioenergy.

Main

Switchgrass (P. virgatum) is both a promising biofuel crop and an important component of the North American tallgrass prairie. Historically, tallgrass prairies were one of the largest temperate biomes on Earth, and they remain important sinks for atmospheric carbon^7,8. However, most extant natural switchgrass populations are restricted to ‘relic’ sites, which represent crucial but dwindling genetic resources for the future conservation and breeding of tallgrass prairie.

Biomass production is the principal breeding target for switchgrass as a forage and bioenergy crop⁹ and is a strong proxy for seed production and evolutionary fitness¹⁰. Since the US Department of Energy named switchgrass a model herbaceous biofuel feedstock, biomass yield trials have demonstrated the economic viability of switchgrass bioenergy production, and cultivars have been bred that substantially out-produce maize and other cellulosic feedstocks¹¹. However, individual cultivars tend to be productive across only a narrow climatic niche. Therefore, to maximize gains, switchgrass breeding and biotechnology should focus on developing climate–genotype matches^12,13 through the identification of the genomic basis of biomass accumulation and climate adaptation in breeding panels. This will bolster future yields¹⁴ and cement switchgrass as an economically and environmentally sustainable bioenergy product.

The tetraploid switchgrass genome

Although abundant quantitative genetic variation underlies climate-associated stress tolerance and biomass production^15,16, the fragmented and incomplete nature of previous switchgrass genome sequences has impeded the discovery of candidate genes and other molecular breeding efforts. The genome of the AP13 switchgrass genotype is large (haploid genome size = 1,129.9 megabases (Mb)), repetitive (56.9% repeats) (Fig. 1a, Extended Data Fig. 1) and polyploid. Unlike the reference genomes of some other outcrossing species, such as maize (represented by the inbred line B73), AP13 is outbred; its genome retains a level of heterozygosity within the range observed in naturally outcrossing populations (Extended Data Fig. 1). Despite this complexity, our deep PacBio long-read sequencing coupled with deep short-read polishing and bacterial artificial chromosome (BAC) clone validation produced a highly contiguous ‘v5’ AP13 genome assembly (Extended Data Fig. 1; data are available from Phytozome at https://phytozome-next.jgi.doe.gov). We pruned the resulting large contigs (N₅₀ = 5.5 Mb) to a single representative haplotype, and then oriented and ordered them into chromosome pseudomolecules using the consensus of two high-density genetic maps (Supplementary Data 1). Chromosomes were assigned to subgenomes via genetic distance to Panicum rudgei¹⁷ (the sister taxon to the K subgenome of P. virgatum) and via de novo repeat clustering. The final assembly contains only 0.4% gaps, a 75-fold decrease relative to a previous v4 release from 2016 (https://phytozome-next.jgi.doe.gov/info/Pvirgatum_v4_1). Importantly, the genome assembly was co-linear with three sources of genetic information, despite being assembled independently from all three: the assembly of a close diploid relative (Panicum hallii), the marker order of a pseudo-F₂ genetic map and the gene order of the alternative subgenome (Fig. 1a, Extended Data Fig. 1, Supplementary Data 2). These co-linearities demonstrate that we have developed a single haploid assembly and annotation for each subgenome.

**Fig. 1: The structure and evolution of the subgenomes of tetraploid switchgrass.**

Crucially, we were able to distinguish gene and repeat sequences between the two subgenomes. The gene annotation, which is derived from Illumina RNA sequencing (n_libraries = 88, n_conditions = 18, >3 billion reads) and PacBio Iso-Seq (n_conditions = 9, >4.5 million reads, Supplementary Data 3), encompasses 80,278 primary and 49,664 alternative transcripts and is as complete as the genome assembly (BUSCO = 99.4%) (Extended Data Fig. 1). We leveraged these annotations to build multiple sequence alignments and time-scaled phylogenetic trees, which date subgenome–progenitor species divergence to about 6.7 million years ago (Ma). Analysis of long-terminal repeat (LTR) retrotransposons that proliferated within a single subgenome sets an upper bound of 4.6 Ma on the age of the polyploidy event that formed switchgrass (Fig. 1b), because such subgenome-specific proliferation must have occurred before the two progenitor genomes merged. This indicates that tetraploid switchgrass arose during the Pliocene or during the glacial–interglacial cycles of the early Pleistocene epoch.
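LTR-based dating relies on the fact that the two long-terminal repeats of a retrotransposon are identical at insertion and then diverge independently. The sketch below shows that molecular clock. It assumes a Jukes–Cantor correction and an illustrative substitution rate of 1.3 × 10⁻⁸ per site per year. That rate is commonly used for grass LTR retrotransposons, but this section does not state it. The function names and example sequences are hypothetical.

```python
import math


def p_distance(ltr_5: str, ltr_3: str) -> float:
    """Proportion of differing sites between two aligned LTRs, skipping gap columns."""
    if len(ltr_5) != len(ltr_3):
        raise ValueError("LTRs must be aligned to the same length")
    compared = diffs = 0
    for a, b in zip(ltr_5.upper(), ltr_3.upper()):
        if a == "-" or b == "-":
            continue  # ignore alignment gaps
        compared += 1
        diffs += a != b
    if compared == 0:
        raise ValueError("no ungapped sites to compare")
    return diffs / compared


def jukes_cantor(p: float) -> float:
    """Jukes–Cantor corrected distance, d = -(3/4) ln(1 - 4p/3)."""
    if p >= 0.75:
        raise ValueError("p-distance is saturated for the Jukes–Cantor model")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def ltr_insertion_age(ltr_5: str, ltr_3: str, rate: float = 1.3e-8) -> float:
    """Years since insertion, T = d / (2 * rate); the default rate is an assumption."""
    d = jukes_cantor(p_distance(ltr_5, ltr_3))
    # Both LTRs accumulate substitutions independently, hence the factor of 2.
    return d / (2.0 * rate)
```

With one difference in 100 aligned sites, the sketch returns an age of roughly 3.9 × 10⁵ years. The estimate scales inversely with the assumed rate, so the choice of rate dominates any absolute date.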

Climate adaptation drives biomass yield

Although there are two reproductively isolated¹⁸ switchgrass cytotypes (tetraploid (4×) and octoploid (8×)), tetraploids represent the majority of cultivars¹⁹ and span a broader geographical range than octoploids²⁰. To investigate the genetic basis of climate adaptation, stress tolerance and biomass production, we therefore developed a diversity panel of 732 exclusively tetraploid genotypes (Supplementary Data 4). We clonally propagated and transplanted this panel in up to 10 common gardens that spanned 1,862 km of latitude, from southern Texas to South Dakota (USA) (n_plants = 5,521) (Fig. 2a) and resequenced each genotype via deep (median = 59×) coverage 2 × 150-bp paired-end PCR-free Illumina libraries. Importantly, resequencing coverage was not biased towards either subgenome (likelihood ratio test χ² = 1.32, degrees of freedom = 1, P = 0.25). Our resequencing yielded 33.8 million single-nucleotide polymorphisms (SNPs) (minor allele frequency ≥ 0.5%) mapped against the genome. We also de novo-assembled a 252-genotype subset of these deeply resequenced libraries and called presence–absence and structural variants (for example, 100–1,500-bp insertions and deletions) on the resulting contigs. To connect trait and molecular variation with climate, we extracted 46 climate variables^21,22 from the georeferenced collection location of each genotype and clustered these data into seven groups that explained the majority of climatic variation across the diversity panel (Extended Data Fig. 2).
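Two statistics in this paragraph can be illustrated with short helpers:

- the minor-allele-frequency threshold used to retain SNPs;
- the conversion of a χ² statistic with 1 degree of freedom into a P value.

The sketch assumes genotypes are coded as alternate-allele dosages (0, 1 or 2) at each subgenome-specific site, with each subgenome treated as disomic. That coding and the helper names are illustrative assumptions, not the authors' pipeline.

```python
import math
from typing import Dict, List, Optional, Sequence


def minor_allele_frequency(dosages: Sequence[Optional[int]]) -> float:
    """MAF from alternate-allele dosages (0, 1 or 2 per genotype); None marks a missing call."""
    called = [d for d in dosages if d is not None]
    if not called:
        return 0.0
    alt = sum(called) / (2 * len(called))
    return min(alt, 1.0 - alt)


def filter_snps(snps: Dict[str, List[Optional[int]]],
                min_maf: float = 0.005) -> Dict[str, List[Optional[int]]]:
    """Keep SNPs whose minor allele frequency is at least `min_maf` (0.5% by default)."""
    return {snp_id: g for snp_id, g in snps.items() if minor_allele_frequency(g) >= min_maf}


def chi2_df1_pvalue(statistic: float) -> float:
    """Upper-tail P for a chi-square statistic with 1 degree of freedom: erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(statistic / 2.0))
```

For χ² = 1.32 with 1 degree of freedom, `chi2_df1_pvalue` returns approximately 0.25, matching the reported P value.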

**Fig. 2: Climatic adaptation within and among switchgrass ecotypes.**

Climate-associated adaptation in switchgrass has previously been hypothesized to underlie divergence between northern upland and southern lowland ecotypes, and is exemplified by divergent leaf and whole-plant morphologies^13,23,24,25,26. In silico classification from morphological data, coupled with expert ecotype assignments across our diversity panel (Supplementary Data 5), identified upland (n = 268) and lowland (n = 99) ecotypes, as well as a third, coastal ecotype (n = 184). The coastal ecotype was broadly sympatric with the lowland ecotype but displayed upland leaf characters and lowland plant architecture (Fig. 2a, Extended Data Fig. 2).

We observed strong evidence that adaptive evolution has contributed to ecotype divergence. Whereas winter-kill mortality was rare among northern upland plants (2.4%), nearly half of all coastal (42.1%) and lowland (42.8%) genotypes perished during the winter of 2018–2019 across the 4 northernmost gardens (Fig. 2b). Winter kill was especially severe in the three northwestern plains sites, probably owing to a period of severe cold from late January to early March 2019 (Extended Data Fig. 2). Overall, genotypes from the northern 30% of the panel had 218-fold higher odds of surviving the winter of 2018–2019 at the 4 northernmost sites than the southernmost 30% of genotypes (Fisher’s test odds ratio = 218.17, P < 1 × 10⁻¹⁵).
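The survival comparison is a 2 × 2 contingency test: northern versus southern origin, crossed with survived versus died. Below is a minimal pure-Python sketch of a two-sided Fisher's exact test. It returns the sample (cross-product) odds ratio. Some software, such as R's `fisher.test`, reports a conditional maximum-likelihood estimate instead, so the two values can differ. The function name and the counts in the usage example are hypothetical, not the study's data.

```python
from math import comb


def fisher_exact(a: int, b: int, c: int, d: int):
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]].

    Returns (sample odds ratio a*d / (b*c), two-sided P). The P value sums the
    hypergeometric probabilities of every table with the same margins that is
    no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # P(top-left cell = x) with all row and column totals held fixed
    probs = {x: comb(row1, x) * comb(n - row1, col1 - x) / total for x in range(lo, hi + 1)}
    p_obs = probs[a]
    p_value = sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7))  # tolerance for ties
    if b * c == 0:
        odds = float("inf") if a * d else float("nan")
    else:
        odds = (a * d) / (b * c)
    return odds, min(1.0, p_value)
```

For the hypothetical table [[3, 1], [1, 3]], `fisher_exact` returns an odds ratio of 9.0 and a two-sided P of 34/70 (about 0.486).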

The latitude gradient across our common gardens also served as the major axis of biomass variation. Among the seven groups of correlated climatic variables, the strongest predictors of biomass variation were always related to temperature (Extended Data Fig. 2). We observed particularly strong signals of extreme 30-year-minimum temperature as a predictor of biomass in the winter-kill-susceptible lowland and coastal ecotypes (Fig. 2c). For both ecotypes, genotypes collected from sites with colder historical extreme minimum temperatures out-performed genotypes from sites with a milder climate in the northern gardens. However, no climate-of-origin-dependent trade-off was observed in the winter-kill-tolerant upland ecotype. It is possible that a colder winter than that of 2018–2019 could cause differential survival among upland genotypes and produce a trade-off similar to that observed within the two more southern ecotypes. These results support our interpretation that susceptibility to cold temperatures acts both as an agent of natural selection and as a limiter of northern range expansion.

Furthermore, biomass yield for each genotype was generally maximized in the gardens with climates that were most similar to their collection locations (Fig. 2d). As such, local adaptation is manifest not only through survival and stress tolerance, but also through higher biomass accumulation in climates similar to those in which each genotype evolved.
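For a single genotype, the climate-matching pattern can be expressed by comparing two things: the standardized distance between its collection-site climate and each garden's climate, and its biomass in each garden. The sketch assumes Euclidean distance on z-scored climate variables and a Pearson correlation. Both choices, the helper names and the example values are hypothetical illustrations, not the analysis behind Fig. 2d.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def standardize(rows: List[Sequence[float]]) -> List[List[float]]:
    """Z-score each column (one climate variable per column) across the given rows."""
    params = [(mean(col), pstdev(col)) for col in zip(*rows)]
    return [[(v - m) / s if s > 0 else 0.0 for v, (m, s) in zip(row, params)] for row in rows]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def climate_match(origin: Sequence[float], gardens: List[Sequence[float]],
                  biomass: Sequence[float]) -> Tuple[int, int, float]:
    """(closest garden, highest-yielding garden, r between climate distance and biomass)."""
    z = standardize([origin] + list(gardens))
    distances = [math.dist(z[0], g) for g in z[1:]]
    closest = min(range(len(distances)), key=distances.__getitem__)
    best = max(range(len(biomass)), key=biomass.__getitem__)
    return closest, best, pearson(distances, biomass)
```

A negative correlation means that yield falls as the garden climate departs from home. In a panel-wide analysis, the standardization would use all genotypes and gardens rather than one genotype's rows.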

Ecotype convergence among gene pools

Knowledge of the structure and diversity of gene pools within switchgrass is critical to projecting future gains from molecular breeding and understanding the genetic basis of climate adaptation^12,13. Several previous population genetic studies of switchgrass assumed that there should be strong correspondence between population genetic structure and the morphological clustering that is used to define ecotypes^20,27,28. Analysis of our database of 33.8 million genome-wide SNPs revealed that our diversity panel is strongly subdivided into three major genetic subpopulations that are, in general, geographically distinct (which we refer to as Midwest, Atlantic and Gulf) (total F_ST = 0.27) (Fig. 3a). The clustering of presence–absence and structural variants largely recapitulates SNP-based subpopulation structure (Extended Data Fig. 3), providing consistent evidence of subpopulation differentiation that may include large-effect mutations at several molecular scales.
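F_ST summarizes how much allele-frequency variance lies between, rather than within, subpopulations. This section does not name an estimator. The sketch below therefore uses Hudson's pairwise estimator, combined across SNPs as a ratio of averages. The function name is hypothetical. A multi-population ("total") F_ST like the one reported would need a multi-population estimator instead.

```python
from typing import Sequence


def hudson_fst(p1: Sequence[float], p2: Sequence[float], n1: int, n2: int) -> float:
    """Hudson's F_ST between two populations, as a ratio of averages over SNPs.

    p1, p2: sample alternate-allele frequencies per SNP.
    n1, n2: numbers of sampled chromosomes per population (assumed constant across SNPs).
    """
    num = den = 0.0
    for a, b in zip(p1, p2):
        # Squared frequency difference, corrected for within-population sampling variance
        num += (a - b) ** 2 - a * (1 - a) / (n1 - 1) - b * (1 - b) / (n2 - 1)
        # Probability that two alleles drawn from different populations differ
        den += a * (1 - b) + b * (1 - a)
    return num / den
```

Averaging the numerator and denominator separately, rather than averaging per-SNP ratios, keeps low-diversity SNPs from dominating the estimate.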

**Fig. 3: Population and quantitative genomics of climate-associated adaptation.**

Population genetic structure was discordant with variation in morphological ecotype, which segregated strongly within genetic subpopulations. Plants with upland ecotype traits were present in both the Atlantic (37%) and Midwest (63%) gene pools. Similarly, 54% and 46% of coastal ecotype accessions were assigned to Atlantic and Gulf subpopulations, respectively (Fig. 3a). All plants with lowland morphology were clustered within the Gulf subpopulation. However, these Gulf lowland plants had approximately equal proportions of individuals that survived and perished during the northern winter (Fig. 2c). Thus, important genetic diversity for breeding was present within genetic subpopulations—a pattern that was validated through realized genetic gains of biomass and winter survival within several switchgrass breeding populations^29,30.

Despite ecotypic convergence among subpopulations, coalescent simulations dated the divergence of the subpopulations to the mid-Pleistocene epoch (>358,000 generations, or 0.7–1.4 Ma assuming a generation time of 2–4 years) (Extended Data Fig. 3). Thus, extant switchgrass gene pools have been diverging for nearly half of the evolutionary history of polyploid switchgrass. In contrast to the deep sequence divergence among subpopulations, we observed very little molecular genetic differentiation between upland and coastal ecotypes within the Atlantic subpopulation (F_ST = 0.03), or between lowland and coastal ecotypes within the Gulf subpopulation (F_ST = 0.03) (Extended Data Fig. 3).
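The conversion from coalescent generations to years is simple arithmetic with the stated generation time g of 2–4 years:

```latex
\begin{aligned}
t_{\mathrm{years}} &= t_{\mathrm{generations}} \times g, \qquad g \in [2, 4]\ \text{years} \\
358{,}000 \times 2 &= 716{,}000\ \text{years} \approx 0.7\ \text{Ma} \\
358{,}000 \times 4 &= 1{,}432{,}000\ \text{years} \approx 1.4\ \text{Ma}
\end{aligned}
```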

Admixture appears to be common between the Gulf and Atlantic subpopulations: plants with coastal ecotype traits from these two subpopulations were molecularly more similar to each other (F_ST = 0.19) than were noncoastal Gulf and Atlantic plants (F_ST = 0.24). By contrast, plants with upland morphologies in the Midwest and Atlantic subpopulations were no more similar to each other than were other plants from those subpopulations (F_ST = 0.30 for both comparisons). This convergence of upland morphologies in two highly differentiated genetic subpopulations could result from independent genetic origins of the upland ecotype or from rare but evolutionarily important³¹ admixture events. We evaluate these hypotheses below.

Genetic targets for yield improvement

To detect the genetic basis of climate adaptation and fitness within the diversity panel, we applied multivariate adaptive shrinkage³² to genome-wide association mapping (GWAS) results within and across genetic subpopulations. Multivariate adaptive shrinkage borrows information on effect size and direction across univariate tests, which improves power to detect significant effects that are shared among tests. Multivariate adaptive shrinkage results were determined for both fitness GWAS (which mapped winter survival and biomass in the three largest common gardens: MI, MO and TX₂) and climate GWAS (which detected associations between SNP variation and the climate of origin, using seven representative climate variables). To make direct comparisons among subpopulations (which have different segregating SNPs), we summarized the 12,239 significant linkage-disequilibrium block ‘peaks’ of multivariate adaptive shrinkage (log₁₀-transformed Bayes factor > 2)³³ into 10,090 20-kb regions (20 kb represents the inflection point at which linkage disequilibrium decay flattens) (Extended Data Fig. 3) for climate (n_regions = 9,856) and fitness (n_regions = 332) GWAS (Supplementary Data 6). A weighted list of candidate genes, incorporating putative SNP effects, the existence of presence–absence or structural variants, gene co-expression and physical proximity to the GWAS peaks, can be found in Supplementary Data 7.
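Summarizing LD-block peaks into shared genomic windows is what makes results comparable across subpopulations that segregate different SNPs. Below is a minimal sketch. It assumes fixed, non-overlapping 20-kb windows anchored at position 0, because the exact windowing rule is not stated here. The chromosome names, peak positions and helper names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

REGION_SIZE = 20_000  # 20 kb: where LD decay flattens, per the text


def peaks_to_regions(peaks: Iterable[Tuple[str, int, str]],
                     size: int = REGION_SIZE) -> Dict[Tuple[str, int], Set[str]]:
    """Assign (chromosome, position, analysis) peaks to fixed windows.

    Returns {(chromosome, window_start): {analyses with a peak in that window}}.
    """
    regions: Dict[Tuple[str, int], Set[str]] = defaultdict(set)
    for chrom, pos, analysis in peaks:
        window_start = (pos // size) * size
        regions[(chrom, window_start)].add(analysis)
    return dict(regions)


def count_regions(regions: Dict[Tuple[str, int], Set[str]], analysis: str) -> int:
    """Number of windows containing at least one peak from `analysis`."""
    return sum(analysis in found for found in regions.values())
```

One window can hold peaks from both analyses, so the union of climate and fitness regions can be smaller than the sum of their counts. This is the case for the 9,856 climate and 332 fitness regions, which together make up 10,090 regions.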

GWAS peaks explained the majority of heritable phenotypic and climatic variation (SNP–heritability) both across and within gene pools (Fig. 3b). SNP–heritability of fitness-associated peaks (h² = 51.5 ± 15.4% (mean ± s.e.m.)) and climate-associated peaks (h² = 70.5 ± 14.0%) collectively explained more than three times as much variation as the polygenic background (fitness = 19.5 ± 9.1%, climate = 18.2 ± 9.5%) (Extended Data Table 1). The high heritability of these climate and biomass associations indicates that relatedness at a small subset of all variants predicts these traits better than genome-wide relatedness, and provides breeders with genetic diversity to target for switchgrass improvement in local environments.

Loci that are associated with both fitness and climate of origin are probably involved in local adaptation³⁴ and are strong targets for the breeding of locally adapted cultivars. Overall, 20-kb regions associated with both climate and fitness were nearly twofold enriched relative to chance (Fisher’s test odds ratio = 1.92, P < 1 × 10⁻⁶). This enrichment was especially strong within the two northern subpopulations (Midwest, odds ratio = 11.5, P < 1 × 10⁻¹⁵; Atlantic, odds ratio = 17.8, P < 1 × 10⁻¹⁵) (Fig. 3c), where we expected to see the strongest effect of selection on survival during cold winters.
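The overlap enrichment comes from a 2 × 2 table that cross-classifies every tested 20-kb region by two criteria: whether it is a climate hit and whether it is a fitness hit. The sketch below builds that table from sets of region identifiers. The universe of tested regions, the example sets and the helper names are hypothetical. A P value would come from a test such as Fisher's exact test on the four counts.

```python
from typing import Hashable, Iterable, Tuple


def overlap_table(universe: Iterable[Hashable], climate: Iterable[Hashable],
                  fitness: Iterable[Hashable]) -> Tuple[int, int, int, int]:
    """Counts (both, climate only, fitness only, neither) over the tested regions."""
    universe = set(universe)
    climate, fitness = set(climate) & universe, set(fitness) & universe
    both = len(climate & fitness)
    climate_only = len(climate - fitness)
    fitness_only = len(fitness - climate)
    neither = len(universe) - both - climate_only - fitness_only
    return both, climate_only, fitness_only, neither


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Cross-product odds ratio for the table [[a, b], [c, d]]."""
    if b * c == 0:
        return float("inf") if a * d else float("nan")
    return (a * d) / (b * c)
```

An odds ratio above 1 means that climate-associated regions are more likely than other tested regions to also be fitness-associated.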

Many regions of climate and fitness overlap were polymorphic only within a single genetic subpopulation, which highlights several, possibly independent, genetic paths to climate adaptation in switchgrass. However, 9.5% (940) of the 20-kb climate intervals were polymorphic in several genetic subpopulations. Given the substantial evidence of admixture between the Gulf and Atlantic subpopulations (Fig. 3a), we expected that contemporary gene flow would be the major contributor to shared polymorphisms. Contrary to this hypothesis, the majority (511 regions) of all multi-subpopulation GWAS intervals were shared between the two most genetically distinct gene pools (Atlantic and Midwest). Given the deep divergence time between these subpopulations, rare or ancient gene flow³⁵ may have created these shared adaptive polymorphic regions.

Evolutionary convergence via introgression

To explicitly address how introgressions may have shaped the distribution of climate–SNP associations, we used a hidden Markov model³⁶ to identify physically contiguous regions of admixture across the genome. Introgressions between subpopulations represented 2.98% of the content of our resequenced genomes (Fig. 4a) but were enriched for GWAS intervals shared across subpopulations (Fisher’s test odds ratio = 1.55, P < 1 × 10⁻⁸), which indicates that adaptive introgressions underlie at least a portion of the heritable variants shared among subpopulations.
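Local-ancestry HMMs treat ancestry along a chromosome as a hidden state that switches rarely. Each observed allele is emitted according to the allele frequencies of the current source population. The authors' model (ref. 36) is not specified in this section. The sketch below is therefore a deliberately simplified two-source, haploid Viterbi decoder with an assumed constant switch probability between adjacent SNPs. All names and parameters are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def viterbi_ancestry(alleles: Sequence[int], freq_a: Sequence[float], freq_b: Sequence[float],
                     switch_prob: float = 1e-3, eps: float = 1e-3) -> List[int]:
    """Most likely ancestry path (0 = source A, 1 = source B) along one haploid sequence.

    alleles: observed 0/1 alleles at SNPs in physical order.
    freq_a, freq_b: alternate-allele frequencies of each SNP in the two source panels.
    switch_prob: assumed constant probability of an ancestry switch between adjacent SNPs.
    eps: floor that keeps emission probabilities away from 0 and 1.
    """
    if not alleles:
        return []
    freqs = (freq_a, freq_b)

    def log_emit(state: int, i: int) -> float:
        f = min(max(freqs[state][i], eps), 1 - eps)
        return math.log(f if alleles[i] == 1 else 1 - f)

    log_stay, log_switch = math.log(1 - switch_prob), math.log(switch_prob)
    score = [math.log(0.5) + log_emit(s, 0) for s in (0, 1)]  # equal prior on both sources
    back: List[Tuple[int, int]] = []
    for i in range(1, len(alleles)):
        new_score, pointers = [], []
        for s in (0, 1):
            stay, switch = score[s] + log_stay, score[1 - s] + log_switch
            pointers.append(s if stay >= switch else 1 - s)
            new_score.append(max(stay, switch) + log_emit(s, i))
        score = new_score
        back.append((pointers[0], pointers[1]))
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(back):  # trace the best predecessor states backwards
        state = pointers[state]
        path.append(state)
    return path[::-1]


def tracts(path: List[int], state: int = 1) -> List[Tuple[int, int]]:
    """(start, end) SNP index ranges, end exclusive, where `path` is in `state`."""
    out, start = [], None
    for i, s in enumerate(path + [-1]):  # -1 sentinel closes a final tract
        if s == state and start is None:
            start = i
        elif s != state and start is not None:
            out.append((start, i))
            start = None
    return out
```

An isolated Midwest-like allele is not called as an introgression, because the penalty for two ancestry switches outweighs the emission gain at a single site. A run of ten such alleles is called.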

**Fig. 4: Mapping the location and effect of Midwest introgressions in the Atlantic subpopulation.**

Of particular interest was a suite of introgressions from the Midwest to the Atlantic subpopulation that dated to about 8,700 generations before present, or 17–34 thousand years ago (ka), which coincides with a northern range expansion after the Last Glacial Maximum (about 22 ka). Atlantic genotypes with higher levels of Midwest introgression exhibited a more upland suite of traits (Fig. 4b) and were overrepresented along the northern margin of the otherwise subtropical and temperate range of the Atlantic subpopulation (Fig. 4c). Consistent with adaptive roles for genomic introgressions in other systems^31,37, these findings suggest that introgression of putatively northern-adapted alleles from the Midwest into the Atlantic subpopulation could have facilitated the postglacial colonization of colder habitats in the northeastern coastal region of the USA by switchgrass. To test this hypothesis, we conducted redundancy analyses to relate the presence of introgression blocks to climatic, geographical and phenotypic factors. Overall, Midwest introgressions in the Atlantic subpopulation were over four times more strongly associated with climate (percentage of variance explained = 46.5%) than with geography (11.5%). Of the 532 and 651 introgressions from the Midwest to the Atlantic subpopulation that were associated with climate of origin and biomass, respectively, 254 were outliers in both analyses, a nearly 7-fold enrichment over the overlap expected if the two sets were independent (odds ratio = 6.99, P < 1 × 10⁻¹⁵). These results reinforce the hypothesis that Midwest introgressions have shaped the climatic niche and phenotypic distribution of the northern Atlantic genotypes, and support a growing body of evidence that adaptive introgressions can facilitate both range expansion and ecotype evolution^38,39.
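In a redundancy analysis, the explained fraction is the share of total variance in the response matrix (here, introgression presence across genotypes) that a multivariate linear regression on the predictors captures. Below is a minimal NumPy sketch of that unadjusted, non-partial quantity. The published percentages may come from a partial or adjusted-R² model, which this sketch does not reproduce. The function name and inputs are hypothetical.

```python
import numpy as np


def rda_variance_explained(Y, X) -> float:
    """Unadjusted fraction of total variance in Y explained by a linear model on X.

    Y: (n_samples, n_responses) matrix, e.g. 0/1 presence of introgression blocks per genotype.
    X: (n_samples, n_predictors) matrix, e.g. climate or geographical variables.
    With centred (unscaled) responses this equals constrained inertia / total inertia.
    """
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X, dtype=float)
    Yc = Y - Y.mean(axis=0)                         # centre each response column
    design = np.column_stack([np.ones(len(X)), X])  # intercept + predictors
    coef, *_ = np.linalg.lstsq(design, Yc, rcond=None)
    fitted = design @ coef
    return float((fitted ** 2).sum() / (Yc ** 2).sum())
```

Comparing this fraction for a climate predictor set and for a geographical predictor set gives the kind of contrast reported above. Partialling out one set before fitting the other would separate their shared and unique contributions.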

Reduced heritability of dominant subgenomes

Polyploidy is common among lineages of flowering plants and can increase the genetic diversity available to selection^40,41, which can lead to adaptive evolution or sorting that alters ecological niche characteristics⁴². This process may explain the generally greater prevalence of polyploids at higher latitudes and elevations that were once covered by ice sheets during glacial cycles⁴³.

Genes duplicated during the formation of a polyploid can subfunctionalize (divide ancestral gene functions among paralogous genes), neofunctionalize (evolve new gene function for paralogues) or simply be lost⁴⁴. Following polyploid speciation, one subgenome commonly retains more genes and exhibits, on average, higher expression levels than the other subgenome, a phenomenon known as subgenome dominance⁴⁵. As in other polyploids^46,47,48, subgenome dominance and subfunctionalization were clear in switchgrass. Relative to the N subgenome, the K subgenome had higher gene density (77.4 versus 68.0 genes per Mb, binomial P < 1 × 10⁻¹⁵), more upregulated genes (5,445 versus 4,477, binomial P < 1 × 10⁻¹⁵) and lower rates of mutation accumulation (5,255 genes in the K subgenome with a synonymous substitution rate (K_s) greater than that in the N subgenome, versus 6,751 genes in the N subgenome with K_s greater than that in the K subgenome, binomial P < 1 × 10⁻¹⁵). Combined, all 11 of our subgenome statistics (Extended Data Fig. 4) point to stronger evolutionary constraint on, and bias towards, the K subgenome, which suggests that the potential for adaptive evolution may be differentially partitioned between subgenomes.
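The subgenome comparisons are exact binomial tests on paired counts, such as the 5,255 versus 6,751 genes with the higher K_s in each subgenome. Below is a pure-Python sketch that works with probabilities in log space. The null proportion `p0` defaults to 0.5. The function name and the tie tolerance are illustrative choices.

```python
import math


def binom_test(k: int, n: int, p0: float = 0.5) -> float:
    """Exact two-sided binomial test: total probability of outcomes no more likely than k."""
    if not 0.0 < p0 < 1.0:
        raise ValueError("p0 must lie strictly between 0 and 1")
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")

    def log_pmf(x: int) -> float:
        # log of C(n, x) * p0^x * (1 - p0)^(n - x), computed via log-gamma
        return (math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
                + x * math.log(p0) + (n - x) * math.log1p(-p0))

    cutoff = log_pmf(k) + math.log1p(1e-7)  # small relative tolerance for ties
    total = sum(math.exp(lp) for lp in map(log_pmf, range(n + 1)) if lp <= cutoff)
    return min(1.0, total)
```

To account for a larger N subgenome, as in the introgression test in the next paragraph, one could set `p0` to the N subgenome's share of sequence. Exactly how the authors applied that correction is an assumption here.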

Given the evolutionary biases towards retention of the K subgenome, we expected to see stronger signals of climate adaptation⁴⁴, biomass and survival among SNPs on the K subgenome. Instead, across the 10 common gardens, 75.9% of biomass SNP–heritability was attributable to the N subgenome and only 24.1% to the K subgenome (Extended Data Fig. 4). Furthermore, 54.3% of Midwest introgressions into the Atlantic subpopulation were found on the N subgenome, a significant enrichment (binomial test P < 1 × 10⁻⁷) that remained significant after correcting for the 7.5% expansion of the N subgenome (binomial test P = 0.0012). The abundance of introgressions and heritable biomass variation on the N subgenome may appear to be at odds with the evolutionary biases towards the K subgenome. One potential explanation for this counterintuitive finding is that relaxed evolutionary constraint (reduced purifying selection) on the N subgenome allowed adaptive genetic variation to accumulate through directional or diversifying selection. As such, the N subgenome has accumulated heritable variation⁴⁹ that future breeding regimes can target to shape switchgrass populations and improve biofuel yield.

Discussion

As the climate and the natural environment change, it is increasingly critical to qualify expectations of genetic improvements in domesticated species and the adaptive potential of wild populations⁵⁰. Indeed, plant genomes offer glimpses into the past and future of crop and wild plant populations. Adaptation to glacial–interglacial cycles offers an instructive analogue for current and future environmental change, one that we explore here to investigate the past, present and future genomic mechanisms of climate adaptation and yield improvement in switchgrass.

However, the complexity of plant genomes has also presented a major barrier to the development of genetic resources that facilitate fast and effective molecular breeding. Our methodology and success in sequencing the complex genome of switchgrass will facilitate ecological and agricultural genomics in nearly any system. For example, our results demonstrate that adaptation to northern climates has been facilitated by introgressions between anciently diverged subpopulations, which provides further support for the hypothesis that admixture between divergent genomes can enhance adaptation to novel environments³⁷. Such adaptive introgressions and heritable subgenome-specific genetic variation⁴⁹ may provide the genetic paths of least resistance that permit colonization of novel habitats during periods of environmental variability. Combined, obligate outcrossing and polyploidy—traits that are often consciously avoided when selecting genomic study systems—are the primary drivers of switchgrass adaptation in nature and the sources of genetic variation available for selection to improve biofuel yield through a changing future.

Methods

No statistical methods were used to predetermine sample size. The experiments were completely randomized, and investigators were not aware of genotype identifiers while conducting experiments or sequencing.

Plant collections, propagation, cultivation and phenotyping

To form the diversity panel, seeds, rhizomes and clonal propagules from natural and common garden sources were collected from 2010 to 2018. Plants grown from seed followed a standard growth procedure¹⁶. In brief, 10–15 seeds were sown in 9-cm square pots containing a mixture of ProMix BX potting soil (Premier Tech Horticulture) and Turface MVP calcined clay (Turface Athletics) and vernalized for 7 days at 4 °C. Pots were then placed in a lit greenhouse with a 14-h day length and 30 °C/22 °C day/night temperatures. Seedlings were thinned at the 3-leaf stage to 1 plant per pot and allowed to grow until the 5-tiller stage. Rhizome propagules and 5-tiller seedlings were transferred to 5-gallon pots containing finely ground pine bark mulch (Lone Star Mulch) and time-release fertilizer (Osmocote 14-14-14, ScottsMiracle-Gro). All individual plants were propagated in Austin by clonal division from 2016 to 2018, targeting >10 clones per unique accession. Cleary 3336F systemic fungicide (Cleary Chemicals) was applied to the plants as necessary to control fungal pathogens. Plants were placed in 1-gallon pots for the final propagation.

Planting in the field sites occurred from 15 May to 10 July 2018 and followed previously published methods¹⁶. In brief, plants were transported to each site by truck, where each field was covered with one layer of DeWitt weed cloth. Plants were placed in holes cut into the weed cloth in a honeycomb design in which each plant had four nearest neighbours, all located 1.56 m from one another. To prevent edge effects, the lowland Blackwell cultivar was planted at every edge position. Plants were hand-watered following transplantation. Aboveground portions of all plants were left to stand over the winter of 2018–2019 and removed in the spring of 2019 before spring tiller emergence. At the end of the 2019 season, plants were tied upright as a bunch and harvested with sickle bar mowers.

We generated two measures of fitness for the 2019 growing season: log-transformed biomass (kg) and proportion of winter survival (Supplementary Data 8). Biomass data were obtained from all living individuals during harvest in October and November 2019. Plants with an estimated mass <750 g were placed in paper bags and dried whole at 60 °C until no additional moisture loss occurred, then weighed for total dry biomass. Plants with an estimated mass >750 g were weighed in the field for wet biomass on a hanging scale with a ±5-g resolution. To determine the biomass of these plants, approximately 500 g of whole tillers were subsampled from each plant, weighed, dried as above and reweighed. The wet biomass of the whole plant was then multiplied by the dry-matter fraction of the subsample (dry mass divided by wet mass) to approximate total dry biomass. Plants were considered to have experienced winter mortality during the 2018–2019 winter season when no new growth was seen from plant crowns by 1 June 2019. The dead plant crowns were excised from the experiment and replaced with plants of the Blackwell cultivar in July or September 2019.
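For plants weighed wet in the field, total dry mass is the wet mass scaled by the dry-matter fraction of the oven-dried subsample. Below is a minimal sketch with a hypothetical function name and hypothetical masses.

```python
def dry_mass_from_subsample(whole_wet_g: float, sub_wet_g: float, sub_dry_g: float) -> float:
    """Whole-plant dry mass (g) = field wet mass x (subsample dry mass / subsample wet mass)."""
    if sub_wet_g <= 0 or not 0 < sub_dry_g <= sub_wet_g:
        raise ValueError("subsample masses must satisfy 0 < dry <= wet")
    dry_matter_fraction = sub_dry_g / sub_wet_g  # equals 1 - moisture fraction
    return whole_wet_g * dry_matter_fraction
```

For a 2,000-g plant whose 500-g subsample dries to 200 g, the estimate is 800 g of dry biomass.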

Genome assembly and polishing

We sequenced the Alamo switchgrass genotype AP13 using a whole genome shotgun sequencing strategy and standard sequencing protocols at the Department of Energy Joint Genome Institute and the HudsonAlpha Institute for Biotechnology. The genome was assembled and polished from 4,520,785 PacBio reads (121.66× raw sequence coverage from a total of 59 P6C4 2.0 and 2.1 chemistry cells with 10-h movie times and a p-read yield of 91.76 Gb) (Extended Data Fig. 1) using the MECAT assembler⁵² and the Arrow polisher⁵³. Final genome polishing and error correction were conducted with one 400-bp-insert, 2 × 150-bp Illumina HiSeq fragment library (177.1×). Reads with >95% simple sequence repeats and reads <50 bp after trimming for adaptor and quality (q < 20, 5-bp window average) were removed. The final read set consisted of 1,259,053,614 reads for a total of 168× coverage of high-quality Illumina bases. This produced an initial diploid assembly of 6,600 scaffolds (6,600 contigs), with a contig N₅₀ of 1.1 Mb, 3,489 scaffolds larger than 100 kb and a total 2C (diploid) genome size of 2,013.4 Mb.
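Contig N₅₀ is quoted for both the initial (1.1 Mb) and final (5.5 Mb) assemblies. It is the length L such that contigs of length L or longer contain at least half of all assembled bases. Below is a minimal sketch with a hypothetical function name and hypothetical lengths.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Largest contig length L such that contigs of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input
```

For contig lengths 2, 3, 4, 5, 6, 8 and 10 (38 bases in total), `n50` returns 6. The cumulative sum 10 + 8 + 6 = 24 is the first to reach half of 38.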

Assembling a haploid genome in an outbred individual such as AP13 will generally yield both haploid copies in heterozygous regions, necessitating computational steps to represent each chromosome as a single-copy haplotype without redundant copies. Our initial assembly was approximately double the expected haploid (1C) genome size of 1.2 Gb. Therefore, to detect putative meiotically homologous haplotypes, we identified and counted shared 24-mers that occurred exactly twice in the assembly and binned contigs accordingly. A total of 3,152 shorter and redundant alternative haplotypes and 2,387 overlapping contig ends were identified, comprising a total sequence of 871.2 Mb. The remaining 1,142.2 Mb of sequence was ordered and oriented into 18 chromosomes by aligning genetic markers from 2 available maps (Supplementary Data 1) to the MECAT assembly; 563 joins and 57 breaks were made, with 10,000 Ns representing each unsized gap. Overall, 97.2% of the assembled sequence was contained in the chromosomes. Telomeric sequence was identified using the (TTTAGGG)ₙ repeat and properly oriented. The remaining scaffolds were screened against GenBank bacterial proteins and organelle sequences and removed if found to match these sequences. To resolve minor overlapping regions on contig ends, adjacent contig ends were aligned to one another using BLAT⁵⁴; a total of 47 adjacent duplicate contig pairs were collapsed.
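The haplotype-pruning step exploits a simple signal. In an assembly that contains both haplotypes, a k-mer seen exactly twice often marks the same locus assembled once on each haplotype. The sketch below links contig pairs by such k-mers, assuming canonical (strand-independent) k-mers. The function names and the small k in the usage test are hypothetical. The real step also handled overlapping contig ends, which this sketch ignores.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterator, List

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq: str, k: int) -> Iterator[str]:
    """Yield the lexicographically smaller of each k-mer and its reverse complement."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if "N" in kmer:
            continue  # skip gap or ambiguous sequence
        yield min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def twice_kmer_links(contigs: Dict[str, str], k: int = 24) -> Counter:
    """Count, per contig pair, k-mers that occur exactly twice in the assembly, once in each contig.

    Pairs joined by many such k-mers are candidate alternative haplotypes of one locus.
    """
    counts: Counter = Counter()
    where: Dict[str, List[str]] = defaultdict(list)
    for name, seq in contigs.items():
        for kmer in canonical_kmers(seq, k):
            counts[kmer] += 1
            where[kmer].append(name)
    links: Counter = Counter()
    for kmer, n in counts.items():
        if n == 2:
            first, second = where[kmer]
            if first != second:
                links[tuple(sorted((first, second)))] += 1
    return links
```

Contigs linked to a longer partner by many such k-mers would be the candidates for removal as redundant alternative haplotypes.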

We conducted two rounds of error correction. First, we corrected homozygous SNPs and insertions and/or deletions (indels) by aligning the Illumina 2 × 150-bp library to the release consensus sequence using bwa mem⁵⁵ and identifying homozygous SNPs and indels with the UnifiedGenotyper tool of GATK⁵⁶. In total, 690 homozygous SNPs and 80,199 homozygous indels were corrected in the release. Second, we used 11,343 computationally finished contigs assembled from BAC clones, sequenced with a combination of ABI 3730XL capillary sequencers⁵⁷ and single-index Illumina clone pools, and aligned this set of switchgrass clones to the SNP-corrected genome to find heterozygous SNPs that were out of phase with their neighbours. To resolve these phase-switched alleles, the full set of raw PacBio reads was aligned to the assembly; the phase of each heterozygous site was determined from the reads, and 62,732 out-of-phase heterozygous sites were corrected.

To distinguish the N and K subgenomes, we used a de novo repeat-clustering method and validated it with phylogenetic distances to a related species. Using Jellyfish⁵⁸, we searched for ‘diagnostic’ 15-mers in the LTR regions of Gypsy, Copia and Pao insertions (identified by RepeatMasker⁵⁹ and LTRHarvest⁶⁰) that distinguished each pair of homologous chromosomes (≤1 hit in one homologue and ≥100 hits in the other). LTR sequences that shared common 15-mers were grouped into superfamilies and aligned within each superfamily with BLAST. Superfamily members with significant BLAST hits (E < 0.01, covering ≥90% of the length) were assigned to families and aligned with MAFFT⁶¹. Jukes–Cantor distances between LTR families were computed with the R package ape⁶², and the families clustered into two distinct sets corresponding to the two subgenomes. This LTR-based clustering was identical to that obtained from alignments to P. rudgei (K.M.D. and E. Kellogg, unpublished data), an ancient relative of the K subgenome¹⁷, giving high confidence that all chromosomes were assigned to the correct subgenome. Finally, we assigned chromosome identifiers and oriented each chromosome pseudomolecule via synteny with Setaria italica⁶³. The final haploid version 5.0 release contained 1,125.2 Mb of sequence in 626 contigs, with a contig N₅₀ of 5.5 Mb and 97.2% of assembled bases in chromosomes.
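
Two of the quantities above can be made concrete. The sketch below is a minimal illustration, not the code used: `jukes_cantor` applies the standard JC69 correction, d = −(3/4) ln(1 − 4p/3), to the proportion p of differing sites between two aligned sequences (the quantity ape's `dist.dna(model = "JC69")` reports, although gap handling there depends on its options), and `is_diagnostic` encodes the stated 15-mer rule. Skipping gapped or ambiguous columns is our assumption.

```python
import math

VALID = set("ACGT")


def jukes_cantor(seq1: str, seq2: str) -> float:
    """JC69 distance between two aligned sequences, skipping columns with a gap
    or an ambiguous base in either sequence."""
    compared = diffs = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a in VALID and b in VALID:
            compared += 1
            diffs += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    p = diffs / compared
    if p >= 0.75:
        return math.inf  # saturated: the correction is undefined
    return -0.75 * math.log(1 - 4 * p / 3)


def is_diagnostic(hits_a: int, hits_b: int) -> bool:
    """15-mer rule from the text: <=1 hit in one homologue and >=100 in the other."""
    return (hits_a <= 1 and hits_b >= 100) or (hits_b <= 1 and hits_a >= 100)
```

For example, one mismatch in ten comparable sites gives p = 0.1 and a corrected distance of about 0.107.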

Gene annotation

Transcript assemblies were made from about 2 billion pairs of 2 × 150-bp stranded paired-end Illumina RNA-seq reads, about 1 billion pairs of 2 × 100-bp paired-end Illumina RNA-seq reads and 454 reads (Supplementary Data 3) using PERTRAN (details of which have been published previously⁶⁴). In brief, PERTRAN conducts genome-guided short-read transcriptome assembly via GSNAP⁶⁵ and builds splice alignment graphs after alignment validation, realignment and correction. Separately, about 4.5 million PacBio Iso-Seq circular consensus sequences⁶⁶ were corrected and collapsed, yielding approximately 677,000 putative full-length transcript assemblies. Subsequently, 668,176 transcript assemblies were constructed using PASA⁶⁷ from RNA-seq reads, full-length cDNAs, Sanger expressed sequence tags and the corrected and collapsed PacBio circular consensus sequence reads. Loci were determined by EXONERATE⁶⁸ alignments of switchgrass transcript assemblies and of proteins from the Arabidopsis thaliana⁶⁹, soybean⁷⁰, Kitaake rice⁷¹, Setaria viridis⁷², P. hallii var. hallii⁶⁴, Sorghum bicolor⁷³, Brachypodium distachyon⁷⁴, grape and Swiss-Prot⁷⁵ proteomes. These alignments were made against the switchgrass genome soft-masked with RepeatMasker⁵⁹ (using a repeat library from RepeatModeler⁷⁶ and RepBase⁷⁷), with loci extended by up to 2,000 bp on both ends unless the extension ran into another locus on the same strand. Incomplete gene models with low homology support and without full transcriptome support, as well as short single-exon genes (<300-bp coding sequence (CDS)) lacking both a protein domain and good expression support, were removed.
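
The two removal rules can be written as a single predicate. This is a hypothetical sketch: the `GeneModel` fields, and especially the homology-support score and its 0.5 cut-off, are placeholders introduced here because the text does not define "low homology support" numerically; we also read "without protein domain or good expression" as lacking both.

```python
from dataclasses import dataclass


@dataclass
class GeneModel:
    complete: bool                 # has both start and stop codons
    homology_support: float        # hypothetical 0-1 score of protein-homology coverage
    full_transcript_support: bool  # fully supported by a transcript assembly
    n_exons: int
    cds_len: int                   # coding sequence length in bp
    has_protein_domain: bool
    well_expressed: bool


def keep_model(g: GeneModel, min_homology: float = 0.5) -> bool:
    """Return False for models matching either removal rule described in the text."""
    # Rule 1: incomplete model with weak homology and no full transcript support.
    if not g.complete and g.homology_support < min_homology and not g.full_transcript_support:
        return False
    # Rule 2: short single-exon gene with neither a protein domain nor good expression.
    if g.n_exons == 1 and g.cds_len < 300 and not (g.has_protein_domain or g.well_expressed):
        return False
    return True
```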

Comparative genomics

Syntenic orthologues and paralogues were inferred for the two switchgrass subgenomes with the GENESPACE pipeline⁶⁴, using default parameters and two outgroups: P. hallii var. hallii⁶⁴ and S. bicolor⁷³. In brief, GENESPACE parses protein similarity scores into syntenic blocks and runs OrthoFinder⁷⁸ on synteny-constrained BLAST results. The resulting block coordinates and syntenic orthology networks provide high-confidence anchors for evolutionary inference.

To infer the ancestral states of CDS regions, we first identified sequences sharing common ancestry using genomes from Phytozome⁷⁹. The final numbers of hits to the switchgrass genome were 38,960 for P. hallii and 33,772 for S. bicolor. For each orthology network, we built two multiple sequence alignments in MAFFT⁶¹: one excluding the focal switchgrass sequence (msa₀), and one forcing msa₀ to align to the coordinate system of the focal sequence via the --keeplength parameter. We then extracted marginal character states with the maximum-likelihood algorithm in phangorn⁸⁰. For each reconstruction, only the internal node closest to the switchgrass branch was used as the ancestral state. Overall, we analysed 40,943 switchgrass gene models (216,157 exons) covering 54.95 Mb (Supplementary Data 9).

Subgenome evolution and dating

To infer the ages of the subgenomes and of tetraploid switchgrass, we took a conservative set of orthologues with simple 2:1:1 networks among P. virgatum, P. hallii and S. italica. This yielded 45,045 switchgrass proteins aligning to 24,549 P. hallii proteins, resulting in 20,496 homologue pairs and 4,053 singletons (2,396 for the K subgenome and 1,660 for the N subgenome) from the cross-species analysis. We aligned the translated CDS of these sequences using Dialign-TX⁸¹. The aligned CDS sequences were concatenated and passed to Gblocks⁸² with default parameters, which filtered the 18,044,244-nucleotide CDS alignment to 16,321,302 positions in 50,334 blocks. The resulting alignment was then used in PhyML⁸³ to build a maximum-likelihood tree under the general time-reversible (GTR) model. This tree was used as input to r8s⁸⁴ to compute a time tree, with the Panicum–Setaria node calibrated to 13.1 million years ago (Ma)⁶³. To date subgenome divergence, and therefore the timing of polyploid switchgrass speciation, we used burst distances, which are all distances among members of an LTR family (whereas pairwise distances refer to the distance between the 5′ and 3′ LTRs of the same insertion). The pairwise (5′ versus 3′) distances of N- or K-subgenome-specific retrotransposons were used to date the insertion times of those elements. This method cannot be used for P. virgatum-specific or Panicum-specific families because more recent expansions of those elements dominate the distributions. Instead, we compared the best cross-species alignments to estimate the LTR distances at the P. virgatum–P. hallii and Panicum–Setaria nodes. These nodes provided calibration points for comparing LTR distances with the more reliable protein-coding gene divergences between species.
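
The arithmetic behind dating with a calibration node can be sketched as follows. This is an illustration only, under assumptions: distances are already corrected for multiple hits, a single substitution rate applies to all branches, and the example numbers are invented. Because both lineages descending from a node (or both LTRs of one insertion, which are identical when the element inserts) accumulate substitutions independently, a distance K corresponds to an age of K/(2r) for a rate r; here the rate is estimated from a node of known age, such as the 13.1-Ma Panicum–Setaria split.

```python
def rate_from_calibration(k_calib: float, age_myr: float) -> float:
    """Substitutions per site per Myr implied by a node of known age.
    The distance between two descendants accrues on two branches, hence the factor 2."""
    return k_calib / (2 * age_myr)


def age_from_distance(k: float, rate: float) -> float:
    """Age in Myr for a distance k, e.g. between the 5' and 3' LTRs of one insertion."""
    return k / (2 * rate)


# Hypothetical example: a calibration distance of 0.262 at the 13.1-Ma node
rate = rate_from_calibration(0.262, 13.1)  # about 0.01 substitutions per site per Myr
age = age_from_distance(0.1, rate)         # approximately 5.0 Myr
```

LTR sequences and coding sequences need not evolve at the same rate, which is why the text compares LTR distances against gene-based divergences at shared nodes rather than assuming one rate for both.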

Subfunctionalization and gene expression analyses

To assess whether the subgenome evolution biases observed at the protein-coding sequence level were also manifest in gene expression, we explored expression biases between homologues in biologically replicated AP13 leaf tissue (n ≥ 5) collected at two sites (TX₂ and MI). Illumina 150-bp paired-end RNA-seq reads were quality trimmed (Q ≥ 25), and reads shorter than 50 bp after trimming were discarded. High-quality reads were aligned to the P. virgatum v5.1 reference genome using GSNAP⁶⁵, and counts of reads mapping uniquely to annotated genes were obtained with HTSeq v.0.11.2⁸⁵. Differential expression was tested with a likelihood ratio test in DESeq2⁸⁶. Library sizes were calculated before the reads were split by subgenome, and these sizes were used as the size factors in the differential expression analysis. Subfunctionalization was defined as a significant subgenome-by-environment interaction in the likelihood ratio test. Subgenome expression bias was tested for both the field-garden and annotation libraries using post hoc Wald-test contrasts between subgenomes within conditions, with significant bias defined as a false-discovery-rate-adjusted P < 0.05. Weighted gene coexpression clustering of the AP13 gene annotation RNA-seq libraries was conducted with WGCNA⁸⁷ using a soft-thresholding power of 6. Raw counts are provided in Supplementary Data 10.
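
The soft-thresholding step in WGCNA raises the absolute correlation between each pair of genes to a power β (here 6), shrinking weak correlations towards zero while preserving strong ones. The sketch below assumes an unsigned network (WGCNA's default; the network type is not stated above) and plain Pearson correlation across libraries; it is not the WGCNA implementation itself.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; returns NaN for a constant profile."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)


def soft_threshold_adjacency(expr: dict[str, list[float]], beta: int = 6) -> dict[tuple[str, str], float]:
    """Unsigned adjacency a_ij = |cor(x_i, x_j)| ** beta for every gene pair (i < j)."""
    genes = sorted(expr)
    adj = {}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            adj[(g, h)] = abs(pearson(expr[g], expr[h])) ** beta
    return adj
```

With β = 6, a correlation of 0.8 becomes an adjacency of about 0.26, whereas a correlation of 0.3 becomes about 0.0007.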

Ploidy assessment

We used an LSRFortessa SORP flow cytometer (BD Biosciences) to determine the ploidy levels of the resequenced accessions. For each plant, 200–300 mg of young leaf tissue was macerated with a razor blade in a Petri dish and treated for 15 min with 1 ml of Cystain PI Absolute P nuclei extraction buffer (Sysmex Flow Cytometry) mixed with 1 μl of 2-mercaptoethanol. Samples were filtered through a CellTrics 30-μm filter (Sysmex) to isolate free nuclei and treated for 20 min on wet ice with 2 ml of Cystain PI Absolute P staining buffer (Sysmex), 12 μl of propidium iodide and 6 μl of RNase A. Samples were then run on the flow cytometer to measure nuclear propidium iodide fluorescence (a proxy for DNA content), with a minimum of 10,000 nuclei analysed per sample. Flow cytometer output was analysed with FlowJo software (BD Biosciences), and samples were binned into three categories on the basis of the average fluorescence units per nucleus (Supplementary Fig. 1). A sample was classified as 4× if its nuclei population had 40,000–80,000 fluorescence units, 6× for 80,000–100,000 units and 8× for 100,000–140,000 units. These bins were established with flow cytometry data from several P. virgatum accessions of known ploidy.
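
The binning rule is a simple lookup. In this sketch the bin edges come from the text, but because adjacent bins share their endpoints (80,000 and 100,000), treating each lower bound as inclusive and each upper bound as exclusive is our assumption; values outside all bins are left unassigned.

```python
from typing import Optional

# Fluorescence bins from the text. Inclusive lower / exclusive upper bounds are an assumption.
PLOIDY_BINS = [(40_000, 80_000, "4x"), (80_000, 100_000, "6x"), (100_000, 140_000, "8x")]


def call_ploidy(mean_fluorescence: float) -> Optional[str]:
    """Map the mean fluorescence per nucleus of a sample to a ploidy label."""
    for low, high, label in PLOIDY_BINS:
        if low <= mean_fluorescence < high:
            return label
    return None  # outside every bin: leave unassigned for manual review
```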

We also assessed ploidy from the distribution of variant allele frequencies at biallelic SNPs (see ‘Variant calling’). This method assumes that tetraploids and octoploids have different allele-frequency distributions: heterozygous sites in tetraploids have reference/variant read depths near 0.5/0.5, whereas octoploids show a mixture of 0.75/0.25 and 0.5/0.5. If the proportion of sites with variant allele frequency x in the range 0.48 ≤ x ≤ 0.52 was <0.035, the library was classified as octoploid; if it was ≥0.035, the library was classified as tetraploid. For 837 of the 870 samples with flow cytometry data (96.2%), this classification matched the flow cytometry result.
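
A minimal sketch of this classification for one library follows. The text does not say exactly which sites form the denominator of the proportion, so using every biallelic SNP with at least one read in that library is an assumption, as is the function name.

```python
def classify_ploidy_from_af(ref_depths: list[int], alt_depths: list[int],
                            low: float = 0.48, high: float = 0.52,
                            cutoff: float = 0.035) -> str:
    """Classify one library from read depths at biallelic SNPs: a small share of
    sites near 0.5 variant allele frequency suggests octoploidy."""
    freqs = [alt / (ref + alt) for ref, alt in zip(ref_depths, alt_depths) if ref + alt > 0]
    if not freqs:
        raise ValueError("no SNPs with read support")
    balanced = sum(low <= f <= high for f in freqs) / len(freqs)
    return "octoploid" if balanced < cutoff else "tetraploid"
```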

Variant calling

A total of 789 tetraploid diversity samples were resequenced at a median depth of 59× (range 20×–140×). Of these, 732 were retained for further analysis after filtering for missing data, outlying (elevated) heterozygosity and collection-site discrepancies. Samples were sequenced with Illumina HiSeq X10 and Illumina NovaSeq 6000 paired-end sequencing (2 × 150 bp) at the HudsonAlpha Institute for Biotechnology and the Joint Genome Institute. To account for different library sizes, reads were subsampled to ≤50× coverage and then mapped to the v5 assembly using bwa-mem⁵⁵.

SNPs were called by aligning Illumina reads to the AP13 reference with BWA-mem. Duplicate reads in the resulting .bam files were filtered with Picard (http://broadinstitute.github.io/picard), and reads were realigned around indels with GATK 3.0⁵⁶. Multi-sample SNP calling was performed with SAMtools mpileup⁸⁸ and VarScan v2.4.0⁸⁹ with a minimum coverage of eight and a minimum alternative allele count of four, and genotypes were called via a binomial test. SNPs within 25 bp of a 24-mer repeat were removed from further analyses. Only SNPs with ≤20% missing data and a minor allele frequency >0.005 were retained, resulting in 33,905,042 SNPs across 75% of the genome at coverage depths between 8× and 500×. Phasing was performed with SHAPEIT3⁹⁰, and F_ST was calculated with vcftools⁹¹. We tested for subgenome read-mapping bias by computing the mean coverage per Mb for each of the 732 libraries and 18 chromosomes. We then fit a linear mixed-effects model to these data in lme4⁹², with chromosome number (1–9, shared by the homologous K and N chromosomes) as a random effect, to test the main effect of subgenome. Models with and without the subgenome main effect were compared with a likelihood ratio test.
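
The two site-level filters can be sketched for one SNP as below. The sketch assumes genotypes are stored as alternative-allele dosages 0/1/2 (the coding used later in this section), with `None` for missing calls; it is an illustration, not the filtering code that was run.

```python
from typing import Optional, Sequence


def passes_site_filters(genotypes: Sequence[Optional[int]],
                        max_missing: float = 0.20, min_maf: float = 0.005) -> bool:
    """Apply the missingness (<=20%) and minor allele frequency (>0.005) filters to one SNP.
    Genotypes are alternative-allele dosages 0/1/2; None marks a missing call."""
    n = len(genotypes)
    called = [g for g in genotypes if g is not None]
    if n == 0 or not called:
        return False
    if (n - len(called)) / n > max_missing:
        return False
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq) > min_maf
```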

Individual de novo assemblies of the 732 short-read libraries were constructed using HipMer⁹³ with a k-mer size of 101 to maximize haplotype splitting among contigs. Because the assemblies varied in quality and contiguity, the sample set used for gene presence–absence and structural variant detection was narrowed to 251 samples (the pan-genome set) on the basis of total assembly size, contig N₅₀ and total gene alignments per library.
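
Contig N₅₀, one of the selection criteria above, can be computed as follows. The text does not give the cut-offs used to choose the 251 libraries, so this sketch shows only the statistic itself.

```python
def n50(lengths: list[int]) -> int:
    """Contig N50: the largest length L such that contigs of length >= L together
    contain at least half of the total assembly length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("empty assembly")
```

For contigs of 2, 3, 4, 5 and 6 kb (20 kb in total), the two longest contigs already cover 11 kb, so the N₅₀ is 5 kb.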

To assess presence–absence variation of genes across the pan-genome, we aligned all AP13 proteins and a unique set of 6,161 proteins from Oropetium thomaeum (n_proteins = 1,476)⁹⁴, S. italica (n = 1,085)⁶³, Setaria viridis (n = 891)⁷², P. hallii var. filipes (n = 1,048)⁶⁴, S. bicolor (n = 878)⁹⁵ and P. hallii var. hallii (n = 772)⁶⁴. These unique genes were drawn from single-copy orthology networks inferred with OrthoFinder⁷⁸ and selected because they lacked orthologues in switchgrass. All proteins (≥100 amino acids) were aligned to all de novo assemblies using BLAT⁵⁴. AP13 proteins were considered present if they aligned with ≥80% identity and 75% coverage, whereas proteins from other grasses were considered present if they aligned with >70% identity and 75% coverage (to allow for greater divergence among species). Variable (pan-genome shell) genes, defined as those present in 40–60% of the population (n = 5,432), were extracted from the presence–absence variation matrix and used to visualize differences among non-admixed individuals from the Atlantic, Gulf and Midwest subpopulations. Genes significantly over- or under-represented within each subpopulation were identified with χ² tests and a Benjamini–Hochberg multiple-testing correction (adjusted P ≤ 0.05).
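
The presence rule and the shell definition can be sketched as two small functions. The `Hit` fields are hypothetical names for the identity and coverage of a protein's best BLAT alignment, and reading the coverage requirement for non-AP13 proteins as ≥75% (mirroring the AP13 rule) is our assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hit:
    identity: float  # percent identity of the best alignment
    coverage: float  # percent of the query protein covered by that alignment


def is_present(hit: Optional[Hit], from_ap13: bool) -> bool:
    """Presence rule from the text; AP13 proteins use a stricter identity cut-off."""
    if hit is None:
        return False
    if from_ap13:
        return hit.identity >= 80 and hit.coverage >= 75
    return hit.identity > 70 and hit.coverage >= 75


def shell_genes(presence: dict[str, list[bool]], low: float = 0.40, high: float = 0.60) -> list[str]:
    """Genes present in 40-60% of assemblies (the 'shell' of the pan-genome)."""
    return sorted(g for g, calls in presence.items()
                  if calls and low <= sum(calls) / len(calls) <= high)
```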

To detect structural variants across the pan-genome, contigs (≥2 kb) from each library were aligned to the AP13 reference genome using ngmlr⁹⁶ with its default settings for PacBio reads. The resulting .bam files were sorted with samtools⁸⁸ and used to call structural variants with sniffles⁹⁶. Structural variant calls were merged across samples using SURVIVOR⁹⁷, with a maximum allowed breakpoint distance of 1 kb. The merged .vcf file was filtered with bcftools⁸⁸ to retain only insertions and deletions 100–1,500 bp in length with a minor allele frequency of at least 0.1.
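
The final filter is written below as a Python predicate rather than as a bcftools expression. The `MergedSV` field names are hypothetical; in a merged VCF, the type and length would typically come from the standard `SVTYPE` and `SVLEN` INFO fields, and the allele counts from the per-sample genotypes.

```python
from dataclasses import dataclass


@dataclass
class MergedSV:
    svtype: str          # structural variant class, e.g. "INS" or "DEL"
    length: int          # absolute event length in bp
    alt_alleles: int     # alternative alleles observed across the pan-genome set
    called_alleles: int  # all called alleles across the pan-genome set


def keep_sv(sv: MergedSV, min_maf: float = 0.1, min_len: int = 100, max_len: int = 1_500) -> bool:
    """Keep insertions and deletions of 100-1,500 bp with minor allele frequency >= 0.1."""
    if sv.svtype not in {"INS", "DEL"}:
        return False
    if not (min_len <= sv.length <= max_len):
        return False
    freq = sv.alt_alleles / sv.called_alleles
    return min(freq, 1 - freq) >= min_maf
```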

Population genomics

To assess the genetic population structure of the 732 tetraploid libraries (Supplementary Data 4), we extracted all fourfold degenerate sites (putatively neutral) with ancestral state calls (Supplementary Data 9) from the ancestral state alignments. This list of sites, which represents our highest confidence neutral loci, was then linkage-disequilibrium-pruned using a threshold of |r| ≤ 0.6, resulting in 59,789 sites for downstream analyses in the R package SNPRelate⁹⁸.
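
The pruning step can be illustrated with a greedy, window-based sketch: SNPs are visited in positional order, and a SNP is kept only if its absolute correlation with every already-kept SNP within the window is at most 0.6. SNPRelate's `snpgdsLDpruning` works in a broadly similar sliding-window fashion, but its LD measure and window defaults may differ; the 500-kb window and single-chromosome input here are hypothetical.

```python
import math


def corr(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two dosage vectors; monomorphic sites count as uncorrelated."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def ld_prune(positions: list[int], dosages: list[list[float]],
             threshold: float = 0.6, window_bp: int = 500_000) -> list[int]:
    """Greedy pruning on one chromosome: keep a SNP only if |r| <= threshold with every
    already-kept SNP within window_bp upstream. Returns indices of kept SNPs."""
    kept: list[int] = []
    for i in sorted(range(len(positions)), key=lambda idx: positions[idx]):
        if all(abs(corr(dosages[i], dosages[j])) <= threshold
               for j in kept if positions[i] - positions[j] <= window_bp):
            kept.append(i)
    return kept
```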

The extent of linkage disequilibrium in the population was determined from SNPs⁹⁹ using PLINK¹⁰⁰, with r² calculated using the options --ld-window 500 --ld-window-kb 2000. The r² values were averaged in 500-bp distance bins. A nonlinear model was fit to these data in R with the nls function, and the extent of linkage disequilibrium was defined as the distance at which the fitted r² decay curve plateaued.
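
The 500-bp averaging step can be sketched as below. Each input pair holds the distance between two SNPs and their r² (for PLINK's pairwise output, the distance would be the difference between the two base-pair position columns); the nonlinear decay fit that follows is not reproduced here.

```python
from collections import defaultdict


def bin_r2(pairs: list[tuple[int, float]], bin_bp: int = 500) -> dict[int, float]:
    """Average r^2 of SNP pairs in fixed-width distance bins.
    pairs: (distance between the two SNPs in bp, r^2). Keys are bin start positions."""
    sums: dict[int, float] = defaultdict(float)
    counts: dict[int, int] = defaultdict(int)
    for distance, r2 in pairs:
        start = (distance // bin_bp) * bin_bp
        sums[start] += r2
        counts[start] += 1
    return {start: sums[start] / counts[start] for start in sorted(sums)}
```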

Population genetic structure was assessed hierarchically. Given the presence of highly divergent ecotypes across the study range, we first analysed the broadest level of genetic population structure using discriminant analysis of principal components (DAPC)¹⁰¹ in adegenet v.2.0.1¹⁰². This method does not rely on assumptions (for example, Hardy–Weinberg equilibrium and linkage equilibrium) that underlie many population clustering approaches and therefore provides a valuable tool for examining broad structural divisions. DAPC revealed strongly differentiated gene pools and separated Midwest genotypes from all others. We then evaluated the genetic population structure and potential admixture of the remaining non-Midwest individuals using the Bayesian clustering algorithm implemented in STRUCTURE v.2.3.4¹⁰³, with the admixture model and correlated allele frequencies. Each analysis consisted of 20,000 burn-in steps followed by 30,000 MCMC iterations for 1–6 genotypic groups, with 10 independent runs per number of groups. Ancestry coefficients across all subpopulations were assigned post hoc through eigenvector decomposition in SNPRelate.

We inferred the demographic history of the switchgrass samples using the Multiple Sequentially Markovian Coalescent (MSMC v.2.0¹⁰⁴), a population genetic method that infers demographic history and population structure through time from sequence data. MSMC models an approximation of the coalescent with recombination and produces estimates of both population size and divergence time. MSMC was run using four haplotypes for each subpopulation, skipping ambiguous sites, with an estimated rhoOverMu of 0.25 and a time segment pattern of 10 × 2 + 20 × 5 + 10 × 2. We estimated rhoOverMu (0.25) by running MSMC for 100 iterations without fixing the recombination parameter on five sets of four haplotypes in each subpopulation and averaging the resulting estimates. To convert scaled times to divergence times in generations, we assumed a mutation rate of 6.5 × 10⁻⁸ per site per generation. To estimate the initial divergence time, we compared adjacent relative cross-coalescence rate (RCCR) values from past to present (Supplementary Data 11). If there was a decline (>0.01; observed range 0.01–0.28) at a single time segment, within contiguous segments or within two interleaved time segments, and the following neighbouring comparisons were nearly zero (≤0.009; observed range −0.1 to 0.009), we considered that point the start of population separation. However, if another decline occurred within five time segments, we considered the later decline the start of population separation. We replicated the analyses with 16 sets of different individuals for each subpopulation contrast.
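
Two calculations underlie the divergence estimates: the relative cross-coalescence rate for each time segment and the conversion of MSMC's mutation-scaled times into generations. In the sketch below, the `lambda_00`, `lambda_01` and `lambda_11` names follow the within- and cross-population coalescence-rate columns of MSMC2's combined output as we understand it. Converting generations to years would also require a generation time, which this section does not state.

```python
MU = 6.5e-8  # mutation rate per site per generation, as stated in the text


def rccr(lambda_00: float, lambda_01: float, lambda_11: float) -> float:
    """Relative cross-coalescence rate for one time segment: near 1 while the two
    subpopulations form a single population, falling towards 0 after they separate."""
    return 2 * lambda_01 / (lambda_00 + lambda_11)


def scaled_time_to_generations(scaled_time: float, mu: float = MU) -> float:
    """MSMC reports times scaled by the mutation rate; dividing by mu gives generations."""
    return scaled_time / mu
```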

Population structure was visualized across SNPs, structural variants and presence–absence variants via eigenvector decomposition of a distance matrix. First, a Euclidean distance matrix was calculated among 0/1/2 (reference homozygote, heterozygous, alternative homozygote) library × marker matrices for each of the three variant call types. The Euclidean matrix was then scaled and centred to remove among-library coverage variance via Gower’s centred similarity matrix, implemented in the R package MDMR¹⁰⁵.
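
For reference, the standard construction of Gower's centred matrix from a distance matrix is given below. We assume, without the text confirming it, that the MDMR implementation follows this textbook form. Here $x_{im}$ is the 0/1/2 genotype of library $i$ at marker $m$, $n$ is the number of libraries and $\mathbf{1}$ is a vector of ones; the leading eigenvectors of $G$ give the principal coordinates used for visualization.

$$d_{ij}=\sqrt{\sum_{m}\left(x_{im}-x_{jm}\right)^{2}},\qquad A=\left[-\tfrac{1}{2}d_{ij}^{2}\right]_{n\times n}$$

$$G=\left(I-\tfrac{1}{n}\mathbf{1}\mathbf{1}^{\top}\right)A\left(I-\tfrac{1}{n}\mathbf{1}\mathbf{1}^{\top}\right)$$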

Ecotype classification

To determine ecotype identity, mature switchgrass accessions at or near anthesis were surveyed during summer 2019 for 16 plant traits (leaf: length, width, length/width ratio, area, lamina thickness and lamina/midrib thickness ratio; whole plant: number of tillers, tiller height, product of tiller height × number, tiller height/count ratio, panicle height, panicle height/count ratio, leaf canopy height and tiller/leaf height ratio; phenology: date of green-up and date of panicle emergence) at the University of Texas J. J. Pickle Research Campus (PKLE, or TX₂; Austin, Texas, USA) and Michigan State University Kellogg Biological Station (KBSM, or MI; Hickory Corners, Michigan, USA) common gardens (see Supplementary Data 5 for detailed descriptions of these variables). The phenology measurements, green-up (when the first green vegetative structures emerge from the rhizome crown) and panicle emergence (when the first reproductive structures emerge from the tiller), were assayed daily. Detailed leaf morphology was assessed on a representative leaf of each plant by measuring length and width (in mm), midrib and lamina thickness (in μm; Mitutoyo 547-500S caliper) and leaf area (in mm²; Licor 3100C leaf area meter). In addition to these quantitative traits, we generated a qualitative upland–lowland index for both leaf and whole-plant appearance, scored at the end of summer 2019 in Austin (TX₂ site). Each plant characteristic was scored on a 1–5 scale from most lowland-like to most upland-like, with the established cultivars Alamo and Dacotah used as baselines for lowland and upland characters, respectively.
Plant characters assessed included: tiller appearance, from thickest and most lowland-like to thinnest and most upland-like; leaf appearance, from widest, longest and most lowland-like to shortest, thinnest and most upland-like; canopy colour from bluest and most typically lowland to darkest green and most typically upland. This visual approach is akin to basic selection criteria often used by switchgrass breeders.

To assess phenotypic structure in these data, we used DAPC¹⁰¹. Prior groups were determined by first transforming the phenotypic data with principal component analysis (PCA) and then using the first 10 principal components in a k-means algorithm to classify individuals into 3 groups, maximizing the variation between groups. Next, DAPC was run on the 10 retained principal components to describe the ecotypic clusters efficiently with two synthetic variables (the discriminant functions), which are the linear combinations of the original phenotypic variables with the largest between-group variance and the smallest within-group variance.

We classified each of the 651 tetraploid genotypes surveyed for the 16 traits at the MI and TX₂ gardens (34 total features: 32 quantitative and 2 qualitative ordinal traits) into 1 of the 3 ecotypes with a low-capacity neural network with 1 hidden layer of 5 units (Supplementary Data 5). The neural network was implemented in caret¹⁰⁶ and trained on seven cultivars with known ecotypes (lowland: Kanlow and Alamo; coastal: High Tide and Stuart; upland: Summer, Dacotah and Sunburst) and 78 additional genotypes that were in the same SNP-based genetic cluster (Extended Data Fig. 3), were collected in the same states and clustered most closely with the exemplar cultivars in phenotypic PCA space. These high-affinity exemplar genotypes are listed in Supplementary Data 5. Ecotypes of the remaining 582 genotypes phenotyped for the ecotype classification traits were then predicted with caret¹⁰⁶. By using traits collected at gardens representing both the northern and southern parts of the switchgrass range, we aimed to minimize local climate bias in plant phenotypes and the resulting ecotype classifications. Furthermore, the neural network classification approach offers one notable advantage over both DAPC and expert classification: because the network is anchored to known, published genotypes, studies that include these common cultivars should be able to recapitulate our assignments more effectively.

Admixture and introgression block calculation and dating

We built a database of admixture-informative SNPs through a two-step pipeline. First, ancestry coefficients were calculated as in ‘Population genomics’ from fourfold degenerate sites with associated ancestral-state calls. For each subpopulation, the 30 samples with the least missing data and a genome-wide admixture proportion ≤0.001 were used to define subpopulation-specific allele frequencies. These libraries were used to find SNPs with at least one pairwise F_ST value >0.4, calculated with the ‘W&C84’ method of the SNPRelate function snpgdsFst. Second, these global ancestry-informative sites were filtered within each subpopulation to those with minor allele frequency >0.05 and missingness <0.05. These sites were further pruned within subpopulations in SNPRelate, first to sites with |r| < 0.9 (10-SNP or 1,000-bp windows) and then to |r| < 0.95 (1,000-SNP or 10,000-bp windows). This process resulted in the following SNP and library counts for each subpopulation: Atlantic, 579,468 SNPs and 284 libraries; Gulf, 641,975 SNPs and 215 libraries; and Midwest, 481,563 SNPs and 196 libraries.

To identify the physical locations of admixture blocks between each pair of subpopulations, we used Ancestry_HMM³⁶,¹⁰⁷. This approach uses allele frequencies in putative parental populations to identify regions of likely introgression in a test population. For each of the three subpopulations, we sought to determine the timing, extent and current positions of introgressed admixture blocks. In each case, we permitted two pulses from each of the other two subpopulations. Ancestry_HMM can optimize the number of generations before present at which an ancestry pulse occurred and the proportion of individuals involved in the pulse. However, optimizing these 8 parameters (2 per pulse × 4 pulses) with >480,000 sites and >150 libraries was not computationally feasible. We therefore optimized parameters using 40 randomly sampled libraries with admixture coefficients within the 0.2–0.8 quantiles of the admixture proportion distribution, and SNPs only on chromosome 4 of the N subgenome. We chose this chromosome as representative because it lacked obvious large, high-frequency introgressions. The parameter optimizations assumed an initially unadmixed population 10,000 generations before present and two subsequent admixture pulses from each of the other two subpopulations; the optimized pulses are as follows (source–reference): Midwest–Atlantic (n_generations = 8,658 and P_admixed = 0.001%; 67 and 0.7%), Gulf–Atlantic (85 and 1.1%; 17 and 0.25%), Atlantic–Gulf (79 and 1.9%; 11 and 0.38%), Midwest–Gulf (79 and 0.86%; 11 and 0.14%), Atlantic–Midwest (66 and 0.27%; 14 and 0.036%), and Gulf–Midwest (71 and 0.15%; 14 and 0.033%). These pulses were supplied to the full model with all individuals and chromosomes, along with an error probability of 0.001, a maximum of 10,000 generations before present and an effective population size of 100,000.
Posterior ancestry probabilities were decoded into haplotype blocks and blocks were binned into clusters of similarly positioned blocks.
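
Decoding posteriors into blocks can be sketched as a run-length pass along one chromosome: each SNP takes its most probable ancestry state, and consecutive SNPs with the same state form a block. The state labels, the dictionary input format and the 0.9 minimum posterior are hypothetical choices for illustration; they are not Ancestry_HMM's output format or the study's cut-off.

```python
from typing import Optional


def decode_blocks(positions: list[int], posteriors: list[dict[str, float]],
                  min_prob: float = 0.9) -> list[tuple[int, int, str]]:
    """Turn per-SNP posterior ancestry probabilities (one chromosome, sorted positions)
    into (start, end, state) blocks of consecutive SNPs sharing the most probable
    state. SNPs whose best posterior is below min_prob are unassigned and break blocks."""
    runs: list[tuple[int, int, Optional[str]]] = []
    for pos, post in zip(positions, posteriors):
        label, prob = max(post.items(), key=lambda kv: kv[1])
        state = label if prob >= min_prob else None
        if runs and runs[-1][2] == state:
            runs[-1] = (runs[-1][0], pos, state)  # extend the current block
        else:
            runs.append((pos, pos, state))       # start a new block
    return [(start, end, lab) for start, end, lab in runs if lab is not None]
```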

Landscape genomics

Geographical maps were made with publicly available layers downloaded from Natural Earth (https://www.naturalearthdata.com/). Plotting routines relied on the sf¹⁰⁸ and raster¹⁰⁹ packages in the R environment for statistical computing¹¹⁰. Climate data were downloaded from WorldClim²² (19 bioclimatic variables at 0.5-arcmin resolution, 1960–2000) and ClimateNA²¹. The distribution of climate variables across collection sites was explored via dynamic clustering¹¹¹ followed by partitioning around medoids clustering¹¹² with k = 7. Within each cluster, the most representative climate variable was defined as the one most correlated with the first eigenvector of variation in that cluster. Six of the seven clusters included WorldClim variables.

Weather data were downloaded from the NOAA portal for the weather station nearest to each garden site that had complete daily temperature (minimum and maximum) and precipitation data from 1 September 2018 to 31 October 2019. The NOAA weather station identifiers used for each garden are as follows: IL (USC00110338), MI (USW00014815), MO (USW00003945), NE (USC00255362), OK (USW00053926), SD (USC00391076), TX₁ (USC00414810), TX₂ (USC00410433), TX₃ (USC00418862) and TX₄ (USW00003901).

Climate–phenotype associations across gardens were tested on both raw and imputed data. Latitude–survival associations (Fig. 2b) were tested on raw data with logistic regression (glm with a binomial family) in R. Imputed values, obtained in base R by nearest-neighbour imputation across all available phenotypes (k = 5), were used only for tests of the rank order of gardens (Fig. 2c, d). Climate similarity–biomass associations were tested with linear mixed models in lmer⁹², comparing the full model (fixed effects = climate distance + intercept; random effect = genotype identifier) with a reduced model lacking the climate distance fixed effect using a likelihood ratio test.
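
Because the full and reduced models differ by a single fixed effect (climate distance), the likelihood ratio statistic 2(ℓ_full − ℓ_reduced) is compared with a χ² distribution with 1 degree of freedom. The sketch below computes that P value from two log-likelihoods using the identity P(χ²₁ > s) = erfc(√(s/2)). When fixed effects differ, the log-likelihoods should come from maximum-likelihood rather than REML fits; lme4's `anova()` refits REML models with ML by default before comparing them.

```python
import math


def lrt_pvalue_df1(loglik_full: float, loglik_reduced: float) -> float:
    """P value of a likelihood ratio test between nested models differing by one parameter.
    The statistic 2 * (llf - llr) follows a chi-square with 1 df under the null,
    whose upper tail equals erfc(sqrt(stat / 2))."""
    stat = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    return math.erfc(math.sqrt(stat / 2.0))
```

A log-likelihood gain of about 1.92 gives a statistic of about 3.84 and hence P ≈ 0.05.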

Species distribution modelling (SDM) was used to simulate modern-day potential ranges for all three ecotypes (upland, lowland and coastal) of P. virgatum. The final datasets used to build the SDMs comprised 277 (upland), 199 (coastal) and 121 (lowland) occurrence records. Six environmental predictors were used in the final SDMs (BIO1, annual mean temperature; BIO2, mean diurnal range; BIO4, temperature seasonality; BIO5, maximum temperature of the warmest month; BIO16, precipitation of the wettest quarter; and BIO17, precipitation of the driest quarter). SDMs were generated with BIOMOD2 v.3.3¹¹³ using seven modelling algorithms: generalized linear models, boosted regression trees, artificial neural networks, flexible discriminant analysis, random forest, classification tree analysis and multivariate adaptive regression splines. For each model, the occurrence data were combined with 500 pseudo-absence points generated randomly within the modelled study area, with equal weighting for presences and pseudo-absences¹¹⁴. Models were trained on 80% of the combined occurrence and pseudo-absence data and tested on the remaining 20%. Each modelling algorithm was run 100 times, for a total of 700 models, which were evaluated with the true skill statistic (TSS)¹¹⁵. TSS values from 0.2 to 0.5 were considered poor, from 0.6 to 0.8 useful and >0.8 good to excellent¹¹⁶. For each ecotype, an ensemble SDM was computed from approximately the 50 best of the 700 models, selected by TSS threshold (upland, 0.96; lowland, 0.93; coastal, 0.965). The final ensemble SDMs were projected onto present climate layers to visualize modern-day potential ranges (Supplementary Data 12).
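
TSS is sensitivity plus specificity minus one, computed from the held-out presences and pseudo-absences. BIOMOD2 computes TSS and builds ensembles itself; this sketch only illustrates the statistic and the threshold-based model selection, and the model names in the test are invented.

```python
def true_skill_statistic(tp: int, fn: int, tn: int, fp: int) -> float:
    """TSS = sensitivity + specificity - 1, from a presence/absence confusion matrix."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1


def select_for_ensemble(model_tss: dict[str, float], threshold: float) -> list[str]:
    """Names of models whose evaluation TSS reaches the ecotype-specific threshold."""
    return sorted(name for name, tss in model_tss.items() if tss >= threshold)
```

For example, 45 of 50 presences and 90 of 100 pseudo-absences predicted correctly give TSS = 0.9 + 0.9 − 1 = 0.8.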

We examined how the presence of Midwest introgressions in the Atlantic subpopulation was associated with the independent and joint influences of climate, geography and kinship by implementing redundancy analysis in vegan¹¹⁷,¹¹⁸,¹¹⁹,¹²⁰. To partition the explainable variance in introgression presence among climate, kinship and geography, we ran four models: one full model in which introgression presence (potential introgression blocks coded as 0 for Atlantic inheritance or 1 for Midwest introgression) was explained by climate (the seven representative climate variables), kinship (the first two principal components calculated from the set of putatively neutral markers) and geography (latitude and longitude), and three partial models, each testing one of these factors conditioned on the other two. The inertia (that is, variance) of the constrained matrix of each model was compared to determine the relative importance of climate, kinship, geography and their joint effect. Furthermore, to find introgression regions strongly linked to climate and survival-corrected biomass, we extracted the loadings on the redundancy analysis axes from two additional models: one constrained only by climate and one constrained only by survival-corrected biomass. Both models were significant according to permutation tests (n = 999; P < 0.001 for both), and the loadings on all axes were approximately normally distributed. SNPs loading at the tails of each axis are more likely to indicate selection related to the predictor (climate or survival-corrected biomass), so we identified all markers at least 2.5 s.d. from the mean (two-tailed P = 0.012) as introgressions putatively under selection¹¹⁹.
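
The outlier rule can be sketched for the loadings on one axis. Under approximate normality, a 2.5-s.d. cut-off corresponds to a two-tailed tail probability of about 0.0124, the P = 0.012 quoted above. The dictionary input and marker names are illustrative assumptions.

```python
import math
import statistics


def two_tailed_p(n_sd: float) -> float:
    """Two-tailed normal tail probability beyond +/- n_sd standard deviations."""
    return math.erfc(abs(n_sd) / math.sqrt(2))


def outlier_loadings(loadings: dict[str, float], n_sd: float = 2.5) -> list[str]:
    """Markers whose loading on one RDA axis lies at least n_sd sample s.d. from the mean."""
    values = list(loadings.values())
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return sorted(k for k, v in loadings.items() if abs(v - mean) >= n_sd * sd)
```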

GWAS

Owing to the large sizes of our common garden datasets, we developed a pipeline—the switchgrassGWAS R package (https://github.com/Alice-MacQueen/switchgrassGWAS)—to allow fast, less-memory-intensive GWAS on the diversity panel, and to analyse the extent to which SNP effects were similar or different for phenotypes measured at different sites. This package leverages bigsnpr¹²¹ to perform fast (>300× faster than TASSEL) statistical analysis of massive SNP arrays encoded as matrices. It also incorporates current gold standards in the human genetics literature for SNP quality control, pruning and imputation, as well as population structure correction in GWAS. To test the significance of many effects in many conditions (for example, multiple sites, climate variables and so on), we used mashr³², a flexible, data-driven method that shares information on patterns of effect size and sign in any dataset for which effects can be estimated on a condition-by-condition basis for many conditions and SNPs. We determined which SNPs had evidence of significant phenotypic effects using local false sign rates, which are analogous to false discovery rates but more conservative (in that they also reflect the uncertainty in the estimation of the sign of the effect)¹²². We used these values to find SNPs with log₁₀-transformed Bayes factors > 2. Here, the Bayes factor was the ratio of the likelihood of one or more significant phenotypic effects at a SNP to the likelihood that the SNP had only null effects. Following previous work³³, a Bayes factor of >10² is considered decisive evidence in favour of the hypothesis that a SNP has one or more significant phenotypic effects.
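
mashr reports local false sign rates directly. The sketch below illustrates only the definition, lfsr = min{P(β ≥ 0 | data), P(β ≤ 0 | data)}, under the simplifying assumption of a normal posterior for a single effect (mashr's actual posteriors are mixtures and can place mass at zero). It also shows the log₁₀ Bayes factor cut-off of 2.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2))


def lfsr_normal(post_mean: float, post_sd: float) -> float:
    """Local false sign rate for a normal posterior: probability of the less likely sign."""
    return normal_cdf(-abs(post_mean) / post_sd)


def decisive_snps(log10_bf: dict[str, float], cutoff: float = 2.0) -> list[str]:
    """SNPs whose log10 Bayes factor exceeds the cut-off (BF > 10^2, 'decisive' evidence)."""
    return sorted(snp for snp, bf in log10_bf.items() if bf > cutoff)
```

A posterior mean 1.96 s.d. away from zero gives an lfsr of about 0.025, whereas a posterior centred on zero gives 0.5, the largest possible value.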

To calculate regional heritability for climate- and fitness-associated SNPs, we followed a previously described two-step method¹²³. Variance component analysis was performed with ASReml (VSN International), using genomic relationship matrices calculated with the VanRaden method¹²⁴. Genomic relationship matrices were calculated within each subpopulation and for the full diversity panel. We calculated a kinship matrix from all SNPs used in the univariate GWAS (G), a kinship matrix from SNPs significantly associated with climate in that subpopulation (log₁₀-transformed Bayes factor > 2; Q_climate) and a kinship matrix from SNPs significantly associated with biomass or winter survival in that subpopulation (log₁₀-transformed Bayes factor > 2, or > 1.385 for the Gulf subpopulation; Q_fitness). These kinship matrices were used for regional heritability mapping¹²³ as in a previous publication¹²⁵, using mixed models of the form:

$${\bf{y}}={\bf{1}}\mu +Z{\bf{u}}+Z{\bf{v}}+{\bf{e}}$$

$${\rm{Var}}({\bf{u}})=G{\sigma }_{u}^{2}$$

$${\rm{Var}}({\bf{v}})=Q{\sigma }_{v}^{2}$$

$${\rm{Var}}({\bf{e}})=I{\sigma }_{e}^{2}$$

in which the vector y represents the biomass values, 1 is a vector of ones, μ is the overall mean, Z is the design matrix for random effects, u is the whole genomic additive genetic effect, v is the regional genomic additive genetic effect and e is the residual. Matrix G is the whole genomic relationship matrix calculated from all SNPs, used for the whole genome additive effect. Matrix Q is the regional genomic relationship matrix obtained as above: either Q_climate or Q_fitness. I is the $n\times n$ identity matrix, in which $n$ is the number of biomass values. Whole genomic, regional genomic and residual variances are ${\sigma }_{u}^{2}$, ${\sigma }_{v}^{2}$ and ${\sigma }_{e}^{2}$, respectively, and the phenotypic variance is ${\sigma }_{{\rm{p}}}^{2}={\sigma }_{u}^{2}+{\sigma }_{v}^{2}+{\sigma }_{e}^{2}$. Whole genomic heritability, regional heritability and total heritability are ${h}_{u}^{2}={\sigma }_{u}^{2}/{\sigma }_{{\rm{p}}}^{2}$, ${h}_{v}^{2}={\sigma }_{v}^{2}/{\sigma }_{{\rm{p}}}^{2}$ and ${h}_{u+v}^{2}=({\sigma }_{u}^{2}+{\sigma }_{v}^{2})/{\sigma }_{{\rm{p}}}^{2}$, respectively.
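The relationship matrices and heritability ratios described above can be sketched as follows. The sketch assumes genotypes coded as 0, 1 or 2 copies of one allele (diploid-style calls) and uses the common VanRaden formulation G = WW′/(2Σp(1 − p)); a regional matrix Q is the same calculation restricted to a SNP subset. Fitting the mixed model itself (done with ASReml in the study) is not shown, so the variance components passed to `heritabilities` are made-up numbers rather than estimates, and both function names are hypothetical.

```python
import numpy as np


def vanraden_grm(M):
    """Genomic relationship matrix G = W W' / (2 * sum p(1 - p)).

    M: individuals x SNPs matrix of allele counts coded 0, 1 or 2.
    """
    M = np.asarray(M, dtype=float)
    p = M.mean(axis=0) / 2.0          # frequency of the counted allele at each SNP
    W = M - 2.0 * p                   # centre each SNP on its expected count
    return W @ W.T / (2.0 * np.sum(p * (1.0 - p)))


def heritabilities(var_u, var_v, var_e):
    """Whole genomic, regional and total heritability from fitted variance components."""
    var_p = var_u + var_v + var_e     # phenotypic variance
    return {
        "h2_u": var_u / var_p,
        "h2_v": var_v / var_p,
        "h2_u+v": (var_u + var_v) / var_p,
    }


rng = np.random.default_rng(1)
M = rng.integers(0, 3, size=(20, 500))   # synthetic genotypes
G = vanraden_grm(M)                      # all SNPs
Q = vanraden_grm(M[:, :50])              # hypothetical subset standing in for associated SNPs
h2 = heritabilities(2.0, 1.0, 1.0)       # illustrative variance components, not fitted values
```

Here `h2` evaluates to `{'h2_u': 0.5, 'h2_v': 0.25, 'h2_u+v': 0.75}`, showing that total heritability is the sum of the two genomic variances divided by the phenotypic variance.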

These models were run for the three locations at which subpopulation GWAS were conducted: Columbia, Missouri; Hickory Corners, Michigan; and Austin, Texas. This resulted in 80 models: 4 sets of populations (the full diversity panel and 3 subpopulations) × 2 model types (a G-only model and a G + Q model) × 10 phenotypes (biomass at 3 sites and 7 environmental variables).

Variance component analyses were also used to partition variance between the K and N subgenomes. Only SNPs with ancestral state calls (Supplementary Data 9) were used in this analysis, giving 460,429 SNPs for each population subset. For each chromosome, a kinship matrix was calculated from all SNPs on that chromosome (Q_Chr01K to Q_Chr09K and Q_Chr01N to Q_Chr09N), resulting in 18 kinship matrices. These kinship matrices were used for regional heritability mapping with mixed models of the form:

$${\bf{y}}={\bf{1}}\mu +Z{{\bf{v}}}_{1{\rm{K}}}+Z{{\bf{v}}}_{1{\rm{N}}}+Z{{\bf{v}}}_{2{\rm{K}}}+\ldots +Z{{\bf{v}}}_{9{\rm{N}}}+{\bf{e}}$$

$${\rm{Var}}({{\bf{v}}}_{i})={Q}_{i}{\sigma }_{{v}_{i}}^{2}$$

$${\rm{Var}}({\bf{e}})=I{\sigma }_{e}^{2}$$

in which the vector y represents the biomass values, 1 is a vector of ones, μ is the overall mean, Z is the design matrix for random effects, v_1K to v_9K and v_1N to v_9N (collectively designated v_i) are the chromosome-specific genomic additive genetic effects and e is the residual. Matrices Q_i are the chromosome-specific genomic relationship matrices for the nine chromosomes of each of the K and N subgenomes. Chromosome-specific and residual variances are ${\sigma }_{{v}_{i}}^{2}$ and ${\sigma }_{e}^{2}$, respectively, and the phenotypic variance is ${\sigma }_{{\rm{p}}}^{2}={\sum }_{i}{\sigma }_{{v}_{i}}^{2}+{\sigma }_{e}^{2}$, summed over all 18 chromosomes. Chromosome-specific heritability is ${h}_{{v}_{i}}^{2}={\sigma }_{{v}_{i}}^{2}/{\sigma }_{{\rm{p}}}^{2}$, and subgenome-specific heritability is the sum of these chromosome-specific heritabilities across the nine chromosomes within each subgenome.
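Once chromosome-specific variance components have been estimated, the heritability bookkeeping reduces to a few ratios. This sketch assumes the variances are already available from a fitted model (the values below are invented for illustration) and that chromosome names end in K or N to mark the subgenome; `subgenome_heritability` is a hypothetical helper.

```python
def subgenome_heritability(chrom_vars, var_e):
    """Chromosome-specific heritabilities and their per-subgenome sums.

    chrom_vars maps names such as "Chr01K" or "Chr09N" to fitted
    chromosome-specific additive variances; var_e is the residual variance.
    """
    var_p = sum(chrom_vars.values()) + var_e       # phenotypic variance
    h2 = {chrom: v / var_p for chrom, v in chrom_vars.items()}
    by_subgenome = {"K": 0.0, "N": 0.0}
    for chrom, value in h2.items():
        by_subgenome[chrom[-1]] += value            # final letter names the subgenome
    return h2, by_subgenome


# Illustrative (made-up) variance components for the 18 chromosomes
chrom_vars = {f"Chr{i:02d}{sg}": (1.0 if sg == "K" else 2.0) for sg in "KN" for i in range(1, 10)}
h2, by_subgenome = subgenome_heritability(chrom_vars, var_e=3.0)
```

With these invented inputs the phenotypic variance is 30, so the K and N subgenomes account for heritabilities of about 0.3 and 0.6, respectively.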

Candidate gene exploration

We integrated multiple data structures to rank candidate genes within introgression intervals or near quantitative trait locus peaks and to provide meaningful criteria for culling them. For GWAS peaks, candidate genes were defined as those within a 20-kb interval surrounding the mashr peak. Candidate genes for genomic introgressions had to overlap the introgression interval at least partially. Because GWAS and introgression inference were conducted within genetic subpopulations, all statistics reported in Supplementary Data 7 (candidate gene lists) are also subpopulation-specific, with the exception of gene co-expression analysis, which was conducted only on the AP13 RNA-sequencing libraries used for annotation (Supplementary Data 3). For each interval, we report a set of statistics. First, physical proximity was calculated as the distance from the gene midpoint to the midpoint of the interval (introgressions) or to the GWAS peak position. Second, because the causal locus underlying a GWAS peak within a subpopulation must be variable within that subpopulation, we extracted all SNPs within and near candidate gene models. These variants were annotated with SnpEff¹²⁶, and for each gene we calculated the weighted sum of the three main variant-impact categories (high, moderate and low; described at https://pcingola.github.io/SnpEff/se_inputoutput/#effect-prediction-details) as SNPeff_score = high × 20 + moderate × 5 + low × 1. Third, for each gene, we calculated the minor allele frequency of structural and presence–absence variants. Fourth, we report the WGCNA co-expression cluster assignments of each gene. Finally, if the candidate was a homologue of a flowering-time GWAS candidate gene from a previous publication¹²⁷, we report the identity of the overlapping interval or gene.
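A simplified version of the proximity and variant-impact statistics can be written as below. It assumes that the 20-kb GWAS interval is centred on the peak (±10 kb) and that a gene qualifies if it overlaps that window, or the introgression interval, at least partially; it reproduces the SNPeff_score weights given above but omits the structural-variant, co-expression and flowering-time columns. The `Gene` record, gene identifiers and coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    """Minimal gene record; counts are SnpEff impact classes of nearby variants."""
    gene_id: str
    start: int
    end: int
    n_high: int = 0
    n_moderate: int = 0
    n_low: int = 0

    @property
    def midpoint(self) -> float:
        return (self.start + self.end) / 2


def snpeff_score(gene: Gene) -> int:
    """Weighted impact score: high x 20 + moderate x 5 + low x 1."""
    return 20 * gene.n_high + 5 * gene.n_moderate + gene.n_low


def overlaps(gene: Gene, lo: float, hi: float) -> bool:
    """True if the gene overlaps the interval [lo, hi] at least partially."""
    return gene.start <= hi and gene.end >= lo


def rank_candidates(genes, lo, hi, centre):
    """Genes overlapping [lo, hi], ordered by midpoint distance to `centre`."""
    rows = [
        {"gene": g.gene_id, "distance": abs(g.midpoint - centre), "snpeff": snpeff_score(g)}
        for g in genes
        if overlaps(g, lo, hi)
    ]
    return sorted(rows, key=lambda row: row["distance"])


def gwas_candidates(genes, peak_pos, window=20_000):
    # Assumption: the 20-kb interval is centred on the peak (peak +/- 10 kb).
    return rank_candidates(genes, peak_pos - window / 2, peak_pos + window / 2, peak_pos)


def introgression_candidates(genes, start, end):
    # Distance is measured to the midpoint of the introgression interval.
    return rank_candidates(genes, start, end, (start + end) / 2)


genes = [
    Gene("geneA", 100_000, 103_000, n_high=1, n_moderate=2, n_low=3),
    Gene("geneB", 108_000, 112_000, n_low=4),
    Gene("geneC", 150_000, 152_000),
]
gwas_rows = gwas_candidates(genes, peak_pos=101_000)
intro_rows = introgression_candidates(genes, 140_000, 151_000)
```

Here `gwas_rows` lists geneA (500 bp from the peak, score 33) before geneB (9 kb away, score 4), while geneC is picked up only by the introgression interval.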

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this paper.

Data availability

Sequence Read Archive accession codes for all RNA and DNA sequencing libraries can be found in Supplementary Data 3 and 4, respectively. The v5 AP13 genome has been deposited at DDBJ/ENA/GenBank under the accession JABWAI000000000. The genome, gene and repeat annotations can also be downloaded directly from Phytozome at https://phytozome-next.jgi.doe.gov/info/Pvirgatum_v5_1. Whenever possible, plant material will be shared upon request. Source data are provided with this paper.

Code availability

Custom pipelines for GWAS and other analyses are available from dataverse at https://doi.org/10.18738/T8/J377KE.

References

Lobell, D. B., Schlenker, W. & Costa-Roberts, J. Climate trends and global crop production since 1980. Science 333, 616–620 (2011).
Challinor, A. J. et al. A meta-analysis of crop yield under climate change and adaptation. Nat. Clim. Chang. 4, 287–291 (2014).
Rosenzweig, C. et al. Assessing agricultural risks of climate change in the 21st century in a global gridded crop model intercomparison. Proc. Natl Acad. Sci. USA 111, 3268–3273 (2014).
Porter, J. R. et al. in Climate Change 2014: Impacts, Adaptation, and Vulnerability. Part A: Global and Sectoral Aspects (Contribution of Working Group II to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change) (eds Field, C. B. et al.) 485–533 (Cambridge Univ. Press, 2014).
Bevan, M. W. et al. Genomic innovation for crop improvement. Nature 543, 346–354 (2017).
Nelson, R., Wiesner-Hanks, T., Wisser, R. & Balint-Kurti, P. Navigating complexity to breed disease-resistant crops. Nat. Rev. Genet. 19, 21–33 (2018).
Risser, P. G., Birney, E. C. & Blocker, H. D. The True Prairie Ecosystem (Dowden, Hutchinson and Ross, 1981).
Suyker, A. E. & Verma, S. B. Year‐round observations of the net ecosystem exchange of carbon dioxide in a native tallgrass prairie. Glob. Change Biol. 7, 279–289 (2001).
Schmer, M. R., Vogel, K. P., Mitchell, R. B. & Perrin, R. K. Net energy of cellulosic ethanol from switchgrass. Proc. Natl Acad. Sci. USA 105, 464–469 (2008).
Palik, D. J., Snow, A. A., Stottlemyer, A. L., Miriti, M. N. & Heaton, E. A. Relative performance of non-local cultivars and local, wild populations of switchgrass (Panicum virgatum) in competition experiments. PLoS ONE 11, e0154444 (2016).
McLaughlin, S. et al. in Perspectives on New Crops and New Uses (ed. Janick, J.) 282–299 (ASHS, 1999).
Vogel, K. P., Schmer, M. R. & Mitchell, R. B. Plant adaptation regions: ecological and climatic classification of plant materials. Rangeland Ecol. Manag. 58, 315–319 (2005).
Casler, M. D. et al. Latitudinal and longitudinal adaptation of switchgrass populations. Crop Sci. 47, 2249–2260 (2007).
Lipka, A. E. et al. Accelerating the switchgrass (Panicum virgatum L.) breeding cycle using genomic selection approaches. PLoS ONE 9, e112227 (2014).
Poudel, H. P., Sanciangco, M. D., Kaeppler, S. M., Buell, C. R. & Casler, M. D. Genomic prediction for winter survival of lowland switchgrass in the northern USA. G3 9, 1921–1931 (2019).
Lowry, D. B. et al. QTL × environment interactions underlie adaptive divergence in switchgrass across a large latitudinal gradient. Proc. Natl Acad. Sci. USA 116, 12933–12941 (2019).
Triplett, J. K., Wang, Y., Zhong, J. & Kellogg, E. A. Five nuclear loci resolve the polyploid history of switchgrass (Panicum virgatum L.) and relatives. PLoS ONE 7, e38702 (2012).
Martínez-Reyna, J. M. & Vogel, K. P. Incompatibility systems in switchgrass. Crop Sci. 42, 1800–1805 (2002).
Casler, M. D., Vogel, K. P. & Harrison, M. Switchgrass germplasm resources. Crop Sci. 55, 2463–2478 (2015).
Evans, J. et al. Extensive genetic diversity is present within North American switchgrass germplasm. Plant Genome 11, 1–16 (2018).
Wang, T., Hamann, A., Spittlehouse, D. & Carroll, C. Locally downscaled and spatially customizable climate data for historical and future periods for North America. PLoS ONE 11, e0156720 (2016).
Fick, S. E. & Hijmans, R. J. WorldClim 2: new 1-km spatial resolution climate surfaces for global land areas. Int. J. Climatol. 37, 4302–4315 (2017).
Lowry, D. B. et al. Adaptations between ecotypes and along environmental gradients in Panicum virgatum. Am. Nat. 183, 682–692 (2014).
Casler, M. D., Vogel, K. P., Taliaferro, C. M. & Wynia, R. L. Latitudinal adaptation of switchgrass populations. Crop Sci. 44, 293–303 (2004).
Porter, C. L. An analysis of variation between upland and lowland switchgrass Panicum virgatum L in central Oklahoma. Ecology 47, 980–992 (1966).
McMillan, C. Ecotypic differentiation within four North American prairie grasses. I. Morphological variation within transplanted community fractions. Am. J. Bot. 51, 1119–1128 (1964).
Grabowski, P. P., Morris, G. P., Casler, M. D. & Borevitz, J. O. Population genomic variation reveals roles of history, adaptation and ploidy in switchgrass. Mol. Ecol. 23, 4059–4073 (2014).
Lu, F. et al. Switchgrass genomic diversity, ploidy, and evolution: novel insights from a network-based SNP discovery protocol. PLoS Genet. 9, e1003215 (2013).
Casler, M. D. et al. 30 years of progress toward increased biomass yield of switchgrass and big bluestem. Crop Sci. 58, 1242–1254 (2018).
Casler, M. D. & Vogel, K. P. Selection for biomass yield in upland, lowland, and hybrid switchgrass. Crop Sci. 54, 626–636 (2014).
Suarez-Gonzalez, A., Lexer, C. & Cronk, Q. C. B. Adaptive introgression: a plant perspective. Biol. Lett. 14, 20170688 (2018).
Urbut, S. M., Wang, G., Carbonetto, P. & Stephens, M. Flexible statistical methods for estimating and testing effects in genomic studies with multiple conditions. Nat. Genet. 51, 187–195 (2019).
Kass, R. E. & Raftery, A. E. Bayes factors. J. Am. Stat. Assoc. 90, 773–795 (1995).
Fournier-Level, A. et al. A map of local adaptation in Arabidopsis thaliana. Science 334, 86–89 (2011).
Zhang, Y. et al. Post-glacial evolution of Panicum virgatum: centers of diversity and gene pools revealed by SSR markers and cpDNA sequences. Genetica 139, 933–948 (2011).
Corbett-Detig, R. & Nielsen, R. A hidden Markov model approach for simultaneously estimating local ancestry and admixture time using next generation sequence data in samples of arbitrary ploidy. PLoS Genet. 13, e1006529 (2017).
Todesco, M. et al. Massive haplotypes underlie ecotypic differentiation in sunflowers. Nature 584, 602–607 (2020).
Lucek, K., Lemoine, M. & Seehausen, O. Contemporary ecotypic divergence during a recent range expansion was facilitated by adaptive introgression. J. Evol. Biol. 27, 2233–2248 (2014).
Whitney, K. D. et al. Quantitative trait locus mapping identifies candidate alleles involved in adaptive introgression and range expansion in a wild sunflower. Mol. Ecol. 24, 2194–2211 (2015).
Comai, L. The advantages and disadvantages of being polyploid. Nat. Rev. Genet. 6, 836–846 (2005).
Mattenberger, F., Sabater-Muñoz, B., Toft, C. & Fares, M. A. The phenotypic plasticity of duplicated genes in Saccharomyces cerevisiae and the origin of adaptations. G3 7, 63–75 (2017).
Clark, J. W. & Donoghue, P. C. J. Whole-genome duplication and plant macroevolution. Trends Plant Sci. 23, 933–945 (2018).
Stebbins, G. L. Polyploidy, hybridization, and the invasion of new habitats. Ann. Mo. Bot. Gard. 72, 824 (1985).
Bird, K. A., VanBuren, R., Puzey, J. R. & Edger, P. P. The causes and consequences of subgenome dominance in hybrids and recent polyploids. New Phytol. 220, 87–93 (2018).
Flagel, L. E. & Wendel, J. F. Evolutionary rate variation, genomic dominance and duplicate gene expression evolution during allotetraploid cotton speciation. New Phytol. 186, 184–193 (2010).
Edger, P. P. et al. Origin and evolution of the octoploid strawberry genome. Nat. Genet. 51, 541–547 (2019).
Chen, Z. J. et al. Genomic diversifications of five Gossypium allopolyploid species and their impact on cotton improvement. Nat. Genet. 52, 525–533 (2020).
Session, A. M. et al. Genome evolution in the allotetraploid frog Xenopus laevis. Nature 538, 336–343 (2016).
Nieto Feliner, G., Casacuberta, J. & Wendel, J. F. Genomics of evolutionary novelty in hybrids and polyploids. Front. Genet. 11, 792 (2020).
Davis, M. B. & Shaw, R. G. Range shifts and adaptive responses to Quaternary climate change. Science 292, 673–679 (2001).
South, A. rnaturalearthdata: World Vector Map Data from Natural Earth Used in ‘rnaturalearth’. R package version 0.1.0. https://CRAN.R-project.org/package=rnaturalearthdata (2017).
Xiao, C.-L. et al. MECAT: fast mapping, error correction, and de novo assembly for single-molecule sequencing reads. Nat. Methods 14, 1072–1074 (2017).
Chin, C.-S. et al. Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data. Nat. Methods 10, 563–569 (2013).
Kent, W. J. BLAT—the BLAST-like alignment tool. Genome Res. 12, 656–664 (2002).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
Sharma, M. K. et al. Targeted switchgrass BAC library screening and sequence analysis identifies predicted biomass and stress response-related genes. Bioenerg. Res. 9, 109–122 (2016).
Marçais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764–770 (2011).
Smit, A. F., Hubley, R. & Green, P. RepeatMasker, http://www.repeatmasker.org/ (1996).
Ellinghaus, D., Kurtz, S. & Willhoeft, U. LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons. BMC Bioinformatics 9, 18 (2008).
Katoh, K. & Standley, D. M. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30, 772–780 (2013).
Paradis, E. & Schliep, K. ape 5.0: an environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics 35, 526–528 (2019).
Bennetzen, J. L. et al. Reference genome sequence of the model plant Setaria. Nat. Biotechnol. 30, 555–561 (2012).
Lovell, J. T. et al. The genomic landscape of molecular responses to natural drought stress in Panicum hallii. Nat. Commun. 9, 5213 (2018).
Wu, T. D. & Nacu, S. Fast and SNP-tolerant detection of complex variants and splicing in short reads. Bioinformatics 26, 873–881 (2010).
Zuo, C. et al. Revealing the transcriptomic complexity of switchgrass by PacBio long-read sequencing. Biotechnol. Biofuels 11, 170 (2018).
Haas, B. J. et al. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res. 31, 5654–5666 (2003).
Slater, G. S. C. & Birney, E. Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics 6, 31 (2005).
Lamesch, P. et al. The Arabidopsis Information Resource (TAIR): improved gene annotation and new tools. Nucleic Acids Res. 40, D1202–D1210 (2012).
Schmutz, J. et al. Genome sequence of the palaeopolyploid soybean. Nature 463, 178–183 (2010).
Jain, R. et al. Genome sequence of the model rice variety KitaakeX. BMC Genomics 20, 905 (2019).
Mamidi, S. et al. A genome resource for green millet Setaria viridis enables discovery of agronomically valuable loci. Nat. Biotechnol. 38, 1203–1210 (2020).
Paterson, A. H. et al. The Sorghum bicolor genome and the diversification of grasses. Nature 457, 551–556 (2009).
Gordon, S. P. et al. Extensive gene content variation in the Brachypodium distachyon pan-genome correlates with population structure. Nat. Commun. 8, 2184 (2017).
UniProt Consortium. UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res. 47, D506–D515 (2019).
Smit, A. & Hubley, R. RepeatModeler Open-1.0, http://www.repeatmasker.org/ (2010).
Bao, W., Kojima, K. K. & Kohany, O. Repbase update, a database of repetitive elements in eukaryotic genomes. Mob. DNA 6, 11 (2015).
Emms, D. M. & Kelly, S. OrthoFinder: solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy. Genome Biol. 16, 157 (2015).
Goodstein, D. M. et al. Phytozome: a comparative platform for green plant genomics. Nucleic Acids Res. 40, D1178–D1186 (2012).
Schliep, K. P. phangorn: phylogenetic analysis in R. Bioinformatics 27, 592–593 (2011).
Subramanian, A. R., Kaufmann, M. & Morgenstern, B. DIALIGN-TX: greedy and progressive approaches for segment-based multiple sequence alignment. Algorithms Mol. Biol. 3, 6 (2008).
Talavera, G. & Castresana, J. Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments. Syst. Biol. 56, 564–577 (2007).
Guindon, S. et al. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst. Biol. 59, 307–321 (2010).
Sanderson, M. J. r8s: inferring absolute rates of molecular evolution and divergence times in the absence of a molecular clock. Bioinformatics 19, 301–302 (2003).
Anders, S., Pyl, P. T. & Huber, W. HTSeq—a Python framework to work with high-throughput sequencing data. Bioinformatics 31, 166–169 (2015).
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
Langfelder, P. & Horvath, S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9, 559 (2008).
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Koboldt, D. C. et al. VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Res. 22, 568–576 (2012).
O’Connell, J. et al. Haplotype estimation for biobank-scale data sets. Nat. Genet. 48, 817–820 (2016).
Danecek, P. et al. The variant call format and VCFtools. Bioinformatics 27, 2156–2158 (2011).
Bates, D., Mächler, M., Bolker, B. & Walker, S. Fitting linear mixed-effects models using lme4. J. Stat. Softw. 67, https://doi.org/10.18637/jss.v067.i01 (2015).
Azad, A., Pavlopoulos, G. A., Ouzounis, C. A., Kyrpides, N. C. & Buluç, A. HipMCL: a high-performance parallel implementation of the Markov clustering algorithm for large-scale networks. Nucleic Acids Res. 46, e33 (2018).
VanBuren, R. et al. Single-molecule sequencing of the desiccation-tolerant grass Oropetium thomaeum. Nature 527, 508–511 (2015).
McCormick, R. F. et al. The Sorghum bicolor reference genome: improved assembly, gene annotations, a transcriptome atlas, and signatures of genome organization. Plant J. 93, 338–354 (2018).
Sedlazeck, F. J. et al. Accurate detection of complex structural variations using single-molecule sequencing. Nat. Methods 15, 461–468 (2018).
Jeffares, D. C. et al. Transient structural variations have strong effects on quantitative traits and reproductive isolation in fission yeast. Nat. Commun. 8, 14061 (2017).
Zheng, X. et al. A high-performance computing toolset for relatedness and principal component analysis of SNP data. Bioinformatics 28, 3326–3328 (2012).
Remington, D. L. et al. Structure of linkage disequilibrium and phenotypic associations in the maize genome. Proc. Natl Acad. Sci. USA 98, 11479–11484 (2001).
Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
Jombart, T., Devillard, S. & Balloux, F. Discriminant analysis of principal components: a new method for the analysis of genetically structured populations. BMC Genet. 11, 94 (2010).
Jombart, T. & Ahmed, I. adegenet 1.3-1: new tools for the analysis of genome-wide SNP data. Bioinformatics 27, 3070–3071 (2011).
Pritchard, J. K., Stephens, M. & Donnelly, P. Inference of population structure using multilocus genotype data. Genetics 155, 945–959 (2000).
Schiffels, S. & Durbin, R. Inferring human population size and separation history from multiple genome sequences. Nat. Genet. 46, 919–925 (2014).
McArtor, D. B., Lubke, G. H. & Bergeman, C. S. Extending multivariate distance matrix regression with an effect size measure and the asymptotic null distribution of the test statistic. Psychometrika 82, 1052–1077 (2017).
Kuhn, M. et al. Caret: Classification and Regression Training. R package version 6.0-78. https://CRAN.R-project.org/package=caret (2016).
Medina, P., Thornlow, B., Nielsen, R. & Corbett-Detig, R. Estimating the timing of multiple admixture pulses during local ancestry inference. Genetics 210, 1089–1107 (2018).
Pebesma, E. Simple features for R: standardized support for spatial vector data. R J. 10, 439 (2018).
Hijmans, R. J. et al. raster: Geographic Data Analysis and Modeling. R package version 3.4-5. https://CRAN.R-project.org/package=raster (2015).
R Core Team. R: A Language and Environment for Statistical Computing, https://www.r-project.org/ (R Foundation for Statistical Computing, 2013).
Langfelder, P., Zhang, B. & Horvath, S. dynamicTreeCut: Methods for Detection of Clusters in Hierarchical Clustering Dendrograms. R package version 1.63-1. https://CRAN.R-project.org/package=dynamicTreeCut (2014).
Maechler, M. et al. Cluster: Cluster Analysis Basics and Extensions. R package version 1-56. https://CRAN.R-project.org/package=cluster (2012).
Thuiller, W., Georges, D., Engler, R. & Breiner, F. biomod2: Ensemble Platform for Species Distribution Modeling. R package version 3.3-7. https://CRAN.R-project.org/package=biomod2 (2016).
Barbet-Massin, M., Jiguet, F., Albert, C. H. & Thuiller, W. Selecting pseudo-absences for species distribution models: how, where and how many? Methods Ecol. Evol. 3, 327–338 (2012).
Allouche, O., Tsoar, A. & Kadmon, R. Assessing the accuracy of species distribution models: prevalence, kappa and the true skill statistic (TSS). J. Appl. Ecol. 43, 1223–1232 (2006).
Coetzee, B. W. T., Robertson, M. P., Erasmus, B. F. N., van Rensburg, B. J. & Thuiller, W. Ensemble models predict Important Bird Areas in southern Africa will become less effective for conserving endemic birds under climate change. Glob. Ecol. Biogeogr. 18, 701–710 (2009).
Oksanen, J., Blanchet, F., Kindt, R., Legendre, P. & Minchin, R. vegan: Community Ecology Package. R package version 2.0-10. https://CRAN.R-project.org/package=vegan (2013).
Gugger, P. F., Ikegami, M. & Sork, V. L. Influence of late Quaternary climate change on present patterns of genetic variation in valley oak, Quercus lobata Née. Mol. Ecol. 22, 3598–3612 (2013).
Napier, J. D., de Lafontaine, G. & Hu, F. S. Exploring genomic variation associated with drought stress in Picea mariana populations. Ecol. Evol. 10, 9271–9282 (2020).
Forester, B. R., Lasky, J. R., Wagner, H. H. & Urban, D. L. Comparing methods for detecting multilocus adaptation with multivariate genotype–environment associations. Mol. Ecol. 27, 2215–2233 (2018).
Privé, F., Aschard, H., Ziyatdinov, A. & Blum, M. G. B. Efficient analysis of large-scale genome-wide data with two R packages: bigstatsr and bigsnpr. Bioinformatics 34, 2781–2787 (2018).
Stephens, M. False discovery rates: a new deal. Biostat. 60, kxw041 (2016).
George, A. W., Visscher, P. M. & Haley, C. S. Mapping quantitative trait loci in complex pedigrees: a two-step variance component approach. Genetics 156, 2081–2092 (2000).
VanRaden, P. M. et al. Reliability of genomic predictions for North American Holstein bulls. J. Dairy Sci. 92, 16–24 (2009).
Santantonio, N., Jannink, J.-L. & Sorrells, M. A low resolution epistasis mapping approach to identify chromosome arm interactions in allohexaploid wheat. G3 9, 675–684 (2018).
Cingolani, P. et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff. Fly (Austin) 6, 80–92 (2012).
Grabowski, P. P. et al. Genome-wide associations with flowering time in switchgrass using exome-capture sequencing data. New Phytol. 213, 154–169 (2017).
Schnable, P. S. et al. The B73 maize genome: complexity, diversity, and dynamics. Science 326, 1112–1115 (2009).
Maccaferri, M. et al. Durum wheat genome highlights past domestication signatures and future improvement targets. Nat. Genet. 51, 885–895 (2019).
Zou, C. et al. The genome of broomcorn millet. Nat. Commun. 10, 436 (2019).
VanBuren, R. et al. Exceptional subgenome stability and functional divergence in the allotetraploid Ethiopian cereal teff. Nat. Commun. 11, 884 (2020).
Hofmeister, B. T. et al. A genome assembly and the somatic genetic and epigenetic mutation rate in a wild long-lived perennial Populus trichocarpa. Genome Biol. 21, 259 (2020).
Marrano, A. et al. High-quality chromosome-scale assembly of the walnut (Juglans regia L.) reference genome. Gigascience 9, giaa050 (2020).


Acknowledgements

Plant collecting was conducted in collaboration with J. Randall (North Carolina Botanical Garden) through the Seeds for Success programme, A. Stottlemeyer (OSU and the USDA-NIFA Biotechnology Risk Assessment Grant Program, no. 2010-33522-21703), T. Quedensley, M. Donahue, D. Schemske and J. M. M. Reyna. We thank the Brackenridge Field Laboratory, the Lady Bird Johnson Wildflower Center and the Juenger laboratory for support with plant care and propagation. M. Donahue led the curation, propagation and maintenance of the diversity panel. Fieldwork was also conducted by P. Duberney, S. Reeder, K. Turner, M. Carey, T. Arredondo, N. Ryan, B. Watson, B. Battershell, N. Albert, H. Wilson, L. Simon, J. Sanley, L. Vormwald, T. Bortnem, S. Hofmann, M. Iceberg, C. Lamb and T. Vugteveen. Advice from J. G. Monroe, D. Hoover, P. Edger, J. Lasky, E. Kellogg, J. Vogel, G. Sarath and J. Tuskan helped to craft experimental designs, sequencing strategies and earlier versions of this text. R. VanBuren, P. Edger, H. Zheng, D. Ware and L. Cattivelli provided genome comparison information. We thank the HudsonAlpha Genomic Services Lab for loading Illumina X10 sequencing runs. This research was supported by the US Department of Energy Awards DE-SC0014156 to T.E.J., DE-SC0017883 to D.B.L. and DE-SC0010743 to K.M.D., the Great Lakes Bioenergy Research Center (Awards DE-SC0018409 and DE-FC02-07ER64494) and the Center for Bioenergy Innovation (Award DE-AC05-00OR22725). Funding was provided by National Science Foundation PGRP Awards IOS0922457 and IOS1444533 to T.E.J. and IOS1402393 to J.T.L. The work conducted by the US Department of Energy Joint Genome Institute is supported by the Office of Science of the US Department of Energy under contract no. DE-AC02-05CH11231. The work conducted by the Joint BioEnergy Institute is supported by the Office of Science of the US Department of Energy under contract no. DE-AC02-05CH11231. The work conducted by Argonne National Laboratory is supported by the Office of Science of the US Department of Energy under contract no. DE-AC02-06CH11357. J.S. thanks T. Marsh for passing on his passion for ecological science. T.E.J. thanks K. Robertson for introducing him to prairie habitats and plant diversity.

Author information

These authors contributed equally: John T. Lovell, Alice H. MacQueen, Sujan Mamidi, Jason Bonnette, Jerry Jenkins

Authors and Affiliations

Genome Sequencing Center, HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
John T. Lovell, Sujan Mamidi, Jerry Jenkins, Avinash Sreedasyam, Adam Healey, LoriBeth Boston, Paul P. Grabowski, Chris Plott, David Sims, Ada Stewart, Jenell Webber, Melissa Williams, Jane Grimwood & Jeremy Schmutz
Department of Integrative Biology, University of Texas at Austin, Austin, TX, USA
Alice H. MacQueen, Jason Bonnette, Joseph D. Napier, Taslima Haque, Eugene V. Shakirov, Xiaoyu Weng, Li Zhang, Kathrine D. Behrman & Thomas E. Juenger
Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
Adam Session, Shengqiang Shu, Kerrie Barry, Christopher Daum, Shweta Deshpande, Aren Ewing, Anna Lipzen, Vasanth R. Singan, Guohong Albert Wu, Yuko Yoshinaga, Matthew Zane, Daniel S. Rokhsar & Jeremy Schmutz
Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
Adam Session & Daniel S. Rokhsar
Department of Plant Biology, Rutgers University, New Brunswick, NJ, USA
Stacy Bonos
Plant Genetic Resources Conservation Unit, USDA-ARS, Griffin, GA, USA
Melanie Harrison
Department of Plant Biology, Michigan State University, East Lansing, MI, USA
Jiming Jiang & David B. Lowry
Arizona Genomics Institute, University of Arizona, Tucson, AZ, USA
Dave Kudrna & Rod A. Wing
Institute of Plant Breeding, Genetics and Genomics, University of Georgia, Athens, GA, USA
Thomas H. Pendergast IV, Peng Qi & Katrien M. Devos
Department of Crop and Soil Sciences, University of Georgia, Athens, GA, USA
Thomas H. Pendergast IV & Katrien M. Devos
Department of Plant Biology, University of Georgia, Athens, GA, USA
Thomas H. Pendergast IV & Katrien M. Devos
Department of Plant and Environmental Sciences, Clemson University, Clemson, SC, USA
Christopher A. Saski
Department of Biological Sciences, Marshall University, Huntington, WV, USA
Eugene V. Shakirov
School of Biotechnology, Jawaharlal Nehru University, New Delhi, India
Manoj Sharma
School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi, India
Rita Sharma
Noble Research Institute LLC, Ardmore, OK, USA
Yuhong Tang, Jiyi Zhang, Malay Saha & Michael Udvardi
Department of Agronomy and Horticulture, University of Nebraska, Lincoln, NE, USA
Sandra Thibivillier
Department of Agronomy, Horticulture and Plant Science, South Dakota State University, Brookings, SD, USA
Arvid R. Boe
Grassland, Soil and Water Research Laboratory, USDA-ARS, Temple, TX, USA
Philip A. Fay
Division of Plant Sciences, University of Missouri, Columbia, MO, USA
Felix B. Fritschi
Environmental Science Division, Argonne National Laboratory, Lemont, IL, USA
Julie D. Jastrow & Roser Matamala
Kika de la Garza Plant Materials Center, USDA-NRCS, Kingsville, TX, USA
John Lloyd-Reilley
Plant Breeding Department, Antonio Narro Agrarian Autonomous University, Saltillo, Mexico
Juan Manuel Martínez-Reyna
Wheat, Sorghum, and Forage Research Unit, USDA-ARS, Lincoln, NE, USA
Robert B. Mitchell
Texas A&M AgriLife Research and Extension Center, Texas A&M University, Overton, TX, USA
Francis M. Rouquette Jr
Department of Plant Pathology and the Genome Center, University of California, Davis, Davis, CA, USA
Pamela Ronald
Joint BioEnergy Institute, Emeryville, CA, USA
Pamela Ronald
Western Regional Research Center, USDA-ARS, Albany, CA, USA
Christian M. Tobias
Department of Plant and Soil Sciences, Oklahoma State University, Stillwater, OK, USA
Yanqi Wu
Department of Microbiology and Plant Biology, University of Oklahoma, Norman, OK, USA
Laura E. Bartley
Institute of Biological Chemistry, Washington State University, Pullman, WA, USA
Laura E. Bartley
US Dairy Forage Research Center, USDA-ARS, Madison, WI, USA
Michael Casler
DOE Great Lakes Bioenergy Research Center, University of Wisconsin, Madison, WI, USA
Michael Casler
DOE Center for Bioenergy Innovation, Oak Ridge, TN, USA
Katrien M. Devos
DOE Great Lakes Bioenergy Research Center, Michigan State University, East Lansing, MI, USA
David B. Lowry
Center for Advanced Bioenergy and Bioproducts Innovation, Berkeley, CA, USA
Daniel S. Rokhsar
Chan-Zuckerberg Biohub, San Francisco, CA, USA
Daniel S. Rokhsar


Contributions

K.D.B., P.R., M. Saha, L.E.B., M.C., K.M.D., D.B.L., D.S.R., J.G., T.E.J. and J.S. designed research. S.B., M.H., J. Jiang, T.H.P. IV, S.T., J.Z., J.M.M.-R., P.R., C.M.T., M.U., M.C., K.M.D. and D.B.L. contributed plant material and resources. J. Jenkins, C.P. and S.S. assembled and annotated the genome. J.B., A.R.B., P.A.F., F.B.F., J.D.J., D.B.L., J.L.-R., R.M., R.B.M., F.M.R. Jr, M. Saha, Y.W. and T.E.J. designed and executed field experiments. K.B., L.B., C.D., S.D., A.E., D.K., A.L., E.V.S., D.S., M. Sharma, R.S., A. Stewart, V.R.S., Y.T., J.W., X.W., M.W., Y.Y., M.Z. and R.A.W. conducted sequencing and data acquisition. J.T.L., A.H.M., S.M., J.D.N., A. Session, A. Sreedasyam, P.P.G., T.H., A.H., P.Q., C.A.S., G.A.W. and L.Z. conducted statistical and computational analyses. The manuscript was written by J.T.L., A.H.M., T.E.J. and J.S. with contributions from all authors.

Corresponding authors

Correspondence to John T. Lovell, Thomas E. Juenger or Jeremy Schmutz.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature thanks Emily Heaton, Todd Michael, Ian Stavness and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Genome assembly and annotation.

a–c, Genome contiguity (a) and library coverage (b) demonstrate that the v5 release is a highly complete genome and that it is among the best available plant reference genomes (c), compared with maize¹²⁸, durum wheat¹²⁹, broomcorn millet¹³⁰, teff¹³¹, poplar¹³², soybean⁷⁰, cotton⁴⁷, walnut¹³³ and strawberry⁴⁶. d, Marker order shows complete collinearity in both crosses (number of markers = 4,701) of a four-way mapping population. e, Genome annotation statistics show that the gene annotation is as complete as the assembly. f, Observed heterozygosity ranges from <4% to >10% among our 732-library resequencing panel. g, Nearly the entire single-copy genome of P. hallii is syntenic with both switchgrass subgenomes; pale blue polygons represent syntenic blocks between subgenomes and P. hallii. The one exception is a previously known over-retained region representing the ρ duplication on Chr. 03 and 08⁶⁴.
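Panel a reports genome contiguity. The caption does not name the statistic plotted, but contiguity is commonly summarized by N50: the length L such that sequences of length ≥ L together contain at least half of all assembled bases. The following is a minimal Python sketch of that definition; the function name `n50` and the toy lengths are illustrative assumptions, not values from the v5 assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (bp).

    N50 is the length L such that sequences of length >= L together
    contain at least half of all assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence downwards, accumulating bases until
    # at least half of the total is covered (integer test avoids rounding).
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            break
    return length


# Hypothetical contig lengths: total = 300 bp, and 100 + 80 = 180 >= 150
print(n50([100, 80, 50, 40, 30]))  # 80
```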

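Panel f summarizes observed heterozygosity per resequenced library. One simple way to compute it is the fraction of called genotypes that are heterozygous. The sketch below assumes VCF-style genotype strings (for example `0/1` or `1|1`) and excludes missing calls from the denominator. The caption does not state which sites were counted (all callable sites or only variable sites), and that choice strongly affects the resulting percentage, so treat this as illustrative only; `observed_heterozygosity` is a hypothetical name.

```python
def observed_heterozygosity(genotypes):
    """Fraction of called genotypes that are heterozygous.

    genotypes: VCF-style GT strings for one library, e.g. '0/1', '1|1', './.'.
    Missing calls (any '.' allele) are excluded from the denominator.
    """
    called = 0
    het = 0
    for gt in genotypes:
        # Treat phased ('|') and unphased ('/') separators the same way
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles:
            continue
        called += 1
        if len(set(alleles)) > 1:  # two different alleles -> heterozygous
            het += 1
    return het / called if called else float("nan")


print(observed_heterozygosity(["0/0", "0/1", "1|1", "./.", "0|1"]))  # 0.5
```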
Source data

Extended Data Fig. 2 Phenotypic and climatic gradients among common gardens and ecotypes.

a, The ten common gardens span much of the geographical distribution of our switchgrass diversity panel and elicit very different phenotypic responses among its members. For each garden, we present the georeferenced location and some basic quantitative genetic attributes of the plants grown there. b, To illustrate the climate context of winter mortality, we present a seven-day rolling mean of minimum daily temperature across the study period. Line colours match the colour key in a. c, To investigate the climatic attributes of each garden, we clustered 46 climatic variables from WorldClim (variables are named bio1–bio19²²) and ClimateNA²¹ using the georeferenced locations for the diversity panel; the identifiers (left) and descriptions (right) accompany each row. The seven clusters, separated by breaks in the heat map, are each represented by the climate variable that correlated most closely with the first principal component eigenvector of that cluster (labelled in bold). d, To investigate ecotype evolution, we probabilistically assigned each member of the diversity panel to one of three ecotypes (n_upland = 221, n_coastal = 157, n_lowland = 129) using a set of morphological (n = 16 at 2 gardens) and qualitative (n = 2) phenotypes; the linear discriminant functions that distinguish the ecotypes are presented here along with the eigenvectors of the two qualitative ecotype categorizations. Each point represents a single genotype grown in both the TX₂ and MI gardens (n = 509). LDA, linear discriminant analysis. e, Qualitative ecotype assessments from experts are presented for the TX₂ garden in 2019. The y-axis scale is ordinal with five categories; points are jittered so that the density of observations is easier to see. Points are coloured by neural network classification following d. f, Loadings for the other 16 variables (across 2 gardens) are plotted on the same scale and axes as d. To distinguish variables, we clustered each into one of four groups, representing variation in leaf (dark green) (3), whole plant (red) (1) and combinations of these. g, The table presents a legend for the labels in f, in which each variable was measured in both the MI and TX₂ gardens. More detailed descriptions of the phenotypes can be found in Supplementary Data 5. h, For each of the seven climate variables, we calculated the climate distance between the collection site and each common garden. The quadratic model fit (r²) for each variable and ecotype is presented.
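Panel b plots a seven-day rolling mean of daily minimum temperature. The caption does not say whether the window is trailing or centred; the Python sketch below assumes a trailing window (each day averaged with the six preceding days) and leaves days before the first full window undefined. The function name `rolling_mean` and the toy temperatures are illustrative.

```python
from collections import deque


def rolling_mean(values, window=7):
    """Trailing rolling mean: position i averages values[i-window+1 .. i].

    Positions before the first full window are returned as None rather
    than averaged over fewer days.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    out = []
    buf = deque()
    running_sum = 0.0
    for v in values:
        buf.append(v)
        running_sum += v
        if len(buf) > window:
            running_sum -= buf.popleft()  # drop the day leaving the window
        out.append(running_sum / window if len(buf) == window else None)
    return out


# Hypothetical daily minimum temperatures (deg C) for eight consecutive days
tmin = [-2.0, -4.0, -6.0, -8.0, -10.0, -12.0, -14.0, -16.0]
print(rolling_mean(tmin))  # [None, None, None, None, None, None, -8.0, -10.0]
```

Keeping a running sum makes each step constant-time, which matters little for one study period but keeps the sketch efficient for long daily series.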

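Panel h reports the r² of a quadratic model for each climate variable and ecotype. The caption does not name the response variable or the fitting software, so the sketch below assumes ordinary least squares of a response y on climate distance x and x², with r² = 1 − SS_res/SS_tot. It solves the 3 × 3 normal equations directly to stay within the Python standard library; `quadratic_r2` is a hypothetical name and the data are toy values.

```python
def quadratic_r2(x, y):
    """Least-squares fit of y = b0 + b1*x + b2*x**2; return ([b0, b1, b2], r2)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need >= 3 paired observations")
    # Power sums for the normal equations X'X b = X'y
    s = [sum(xi ** k for xi in x) for k in range(5)]  # s[k] = sum of x^k
    t = [sum(yi * xi ** k for xi, yi in zip(x, y)) for k in range(3)]
    a = [[s[0], s[1], s[2], t[0]],
         [s[1], s[2], s[3], t[1]],
         [s[2], s[3], s[4], t[2]]]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        if abs(a[col][col]) < 1e-12:
            raise ValueError("singular design: need >= 3 distinct x values")
        for r in range(3):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [ar - f * ac for ar, ac in zip(a[r], a[col])]
    b = [a[i][3] / a[i][i] for i in range(3)]
    # Coefficient of determination from residual and total sums of squares
    fitted = [b[0] + b[1] * xi + b[2] * xi ** 2 for xi in x]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return b, 1.0 - ss_res / ss_tot


# Toy data lying exactly on y = 1 + 2x + 0.5x^2, so r2 should be ~1
coeffs, r2 = quadratic_r2([0, 1, 2, 3, 4], [1.0, 3.5, 7.0, 11.5, 17.0])
print(coeffs, r2)
```

A constant response (SS_tot = 0) makes r² undefined and raises `ZeroDivisionError` here; with NumPy, `numpy.polyfit(x, y, 2)` would be the more usual way to obtain the coefficients.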
Source data

Extended Data Fig. 3 Population and quantitative genetic divergence between and evolution within subpopulations and ecotypes.

a, Pairwise F-statistics between each subpopulation-by-ecotype combination and across all ecotypes for each subpopulation. b, Relative cross-coalescence (RCCR) provides an alternative measure of divergence. Here, 16 bootstrap RCCR profiles were each converted to the generation time at which divergence occurred; statistics across the bootstraps are presented. c, Linkage disequilibrium modelled as a nonlinear function of physical distance, showing predicted correlation coefficients among markers for the entire sample. The linear model prediction for each 500-bp interval is plotted as black open points; mean r² values for 2-bp intervals are shown as light grey points in the background. d–f, Population genetic structure is displayed as the principal coordinates from a scaled and centred distance matrix of structural variants (d), presence–absence variants (e) and SNPs (f), colour-coded by subpopulation assignments in Fig. 3. g, Positions and −log₁₀(P values) of the top 2,000 GWAS hits are presented for 2 gardens, the 3 subpopulations (coloured as in d–f) and an overall run (black points).
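Panels d–f show principal coordinates computed from a scaled and centred distance matrix. Classical principal coordinates analysis (PCoA) double-centres the matrix of squared distances and takes its leading eigenvectors, each scaled by the square root of its eigenvalue. The standard-library Python sketch below assumes that classical PCoA matches what the caption describes and uses power iteration with deflation only to avoid dependencies; with NumPy, `numpy.linalg.eigh` on the centred matrix would be the usual choice. The distance metrics used for SVs, PAVs and SNPs are not specified here, so the toy example uses points on a line; `pcoa` is a hypothetical name.

```python
import math


def pcoa(dist, n_axes=2, iters=1000):
    """Classical (Gower/Torgerson) principal coordinates analysis.

    dist: symmetric n x n list of pairwise distances.
    Returns an n x n_axes list of coordinates (one row per sample).
    """
    n = len(dist)
    # A = -0.5 * D^2, then double-centre: B = A - row means - column means + grand mean
    a = [[-0.5 * dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_means = [sum(row) / n for row in a]  # equal to column means (A is symmetric)
    grand_mean = sum(row_means) / n
    b = [[a[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
         for i in range(n)]

    def matvec(m, v):
        return [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]

    coords = [[0.0] * n_axes for _ in range(n)]
    for axis in range(n_axes):
        # Power iteration from an arbitrary non-constant unit start vector
        v = [1.0 + 0.1 * i for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = matvec(b, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # remaining matrix is numerically zero
                break
            v = [x / norm for x in w]
        eigval = sum(vi * wi for vi, wi in zip(v, matvec(b, v)))  # Rayleigh quotient
        # Only positive eigenvalues define real Euclidean axes
        scale = math.sqrt(max(eigval, 0.0))
        for i in range(n):
            coords[i][axis] = v[i] * scale
        # Deflate: remove this eigen-component before finding the next axis
        b = [[b[i][j] - eigval * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


# Four hypothetical samples on a line; axis 1 should recover their centred positions
x = [0.0, 1.0, 3.0, 6.0]
d = [[abs(p - q) for q in x] for p in x]
for row in pcoa(d):
    print([round(c, 3) for c in row])
```

In the toy example the first axis recovers the centred positions (up to sign) and the second axis is essentially zero. Power iteration targets the largest-magnitude eigenvalue, so for strongly non-Euclidean distances a full eigendecomposition is safer.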

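Panel a reports pairwise F-statistics. The caption does not name the estimator; one widely used choice for a pair of populations is Hudson's F_ST in the form given by Bhatia et al. (2013), combined across SNPs as a ratio of averages. The Python sketch below implements that estimator, assuming per-SNP allele frequencies and sample sizes (numbers of sampled allele copies) are already available; it is illustrative and not necessarily the estimator used for this figure. `hudson_fst` is a hypothetical name.

```python
def hudson_fst(p1, p2, n1, n2):
    """Hudson's F_ST for two populations, combined across SNPs as a ratio of averages.

    p1, p2: per-SNP alternate-allele frequencies in populations 1 and 2.
    n1, n2: per-SNP numbers of sampled allele copies (each must exceed 1).
    """
    num = 0.0
    den = 0.0
    for a, b, na, nb in zip(p1, p2, n1, n2):
        # Squared frequency difference, corrected for finite-sample within-population variance
        num += (a - b) ** 2 - a * (1 - a) / (na - 1) - b * (1 - b) / (nb - 1)
        # Probability that one allele drawn from each population differs
        den += a * (1 - b) + b * (1 - a)
    if den == 0.0:
        raise ValueError("no informative SNPs (all fixed for the same allele)")
    return num / den


# Hypothetical frequencies at three SNPs, 40 sampled allele copies per population
print(hudson_fst([0.1, 0.5, 0.9], [0.3, 0.5, 0.2], [40] * 3, [40] * 3))
```

Summing numerators and denominators before dividing (rather than averaging per-SNP ratios) keeps low-diversity SNPs from dominating the estimate; small negative values are expected when populations are nearly identical.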
Source data

Extended Data Fig. 4 Subgenome biases across DNA, expression and quantitative traits.

a, Difference in biomass SNP heritability (h²) estimates between subgenomes for each garden-by-subpopulation combination. Empty cells indicate combinations for which the model did not converge. b, Subgenome bias for all sets of genome analyses conducted here. Colours indicate the dataset used. c, Counts and ratios used to build b, with longer descriptions of the variables. d, Density distributions of nonsynonymous (K_a), synonymous (K_s) and fourfold-degenerate transversion (4DTv) substitution rates for each subgenome relative to P. hallii. e, Number of genes in each colour bin of f. f, Heat map of expression bias between subgenomes, in which K > N (blue) and N > K (red), for each tissue in the genome-annotation RNA-seq dataset.
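Panels e and f summarize, per tissue, whether the K or the N homeolog of a gene pair is more highly expressed. The sketch below shows one way to tabulate such calls. It assumes expression values are already normalized and paired by homeology; the pseudocount and the optional log2 fold-change threshold are illustrative parameters not taken from the paper, and `homeolog_bias` and the tissue names are hypothetical.

```python
import math
from collections import Counter


def homeolog_bias(pairs, min_log2fc=0.0, pseudocount=1.0):
    """Classify homeologous K/N gene pairs by which copy is more highly expressed.

    pairs: dict mapping tissue -> list of (k_expr, n_expr) tuples, one per
    homeolog pair. Returns dict tissue -> Counter with keys 'K>N', 'N>K', 'neither'.
    """
    summary = {}
    for tissue, values in pairs.items():
        counts = Counter()
        for k_expr, n_expr in values:
            # Pseudocount avoids log(0) for copies with no detected expression
            lfc = math.log2((k_expr + pseudocount) / (n_expr + pseudocount))
            if lfc > min_log2fc:
                counts["K>N"] += 1
            elif lfc < -min_log2fc:
                counts["N>K"] += 1
            else:
                counts["neither"] += 1
        summary[tissue] = counts
    return summary


# Hypothetical normalized expression for homeolog pairs in two tissues
summary = homeolog_bias({"leaf": [(10, 2), (0, 5), (3, 3)], "root": [(7, 1)]})
print(summary["leaf"]["K>N"], summary["leaf"]["N>K"], summary["leaf"]["neither"])
```

With the default threshold of 0, any difference is called as biased; a positive `min_log2fc` (for example 1.0 for twofold) moves small differences into the 'neither' bin.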

Source data

Extended Data Table 1 Heritability due to SNPs and background kinship


Supplementary information

Supplementary Figure

Examples of the gating strategy used to estimate ploidy by flow cytometry.

Reporting Summary

Supplemental Data 1

Mapping positions of two genetic maps. The genetic linkage group (LG) and centimorgan (cM) mapping positions are in the first two columns. The v5 genome physical chromosome and base-pair (bp) positions are in the third and fourth columns. The marker identity and the genetic map are given in the fifth and sixth (final) columns. The two maps are stacked, with the Noble map (553,377 markers) above the UGA map (4,252 markers).

Supplemental Data 2

Syntenic orthology database for the switchgrass K and N subgenomes and the P. hallii and S. bicolor outgroups. This is an annotated BLAST-format file in which the results for each pairwise combination of (sub)genomes are stacked. In addition to the standard BLAST columns, the following data are presented for both the query (1) and the target (2): genome, gene ID ('id'), chromosome ('chr'), and gene 'start' (bp), 'end' (bp), 'strand' and gene 'order' within each genome. The last two columns are 'orthology.network', which defines the subgraph (orthogroup), and 'hit.type', which distinguishes the 418,768 orthologous hits from the 58,430 paralogous hits.

Supplemental Data 3

Metadata for the 84 RNA-sequencing libraries. Here, we present accession numbers, experimental design, coverage and related statistics. The experiment, tissue and replicate are specified in the 'Sample' column. The 'Library' column matches the library (column) names of the raw read counts in Supplementary Data 10. SRA identifiers are in the last two columns.

Supplemental Data 4

Metadata for the 732 DNA resequencing libraries. Here, we present accession numbers, georeferenced collection locations, subpopulation classifications, ecotype assignments, coverage and related statistics. SRA identifiers are presented in the final column.

Supplemental Data 5

Model specification, trait descriptions and variable importance for in silico ecotype classification. Genotypes used for training (first two columns), neural network layers (columns 5–6) and the variable dictionary (fourth column) were extracted from the neural network model training. LDA and DAPC variable loadings accompany the neural network results.

Supplemental Data 6

Mash output statistics for GWAS peak positions and effects for 627,563, 523,501 and 541,889 GWAS hits within the ATLANTIC, GULF and MIDWEST subpopulations, respectively. The physical chromosome and position are in the first two columns. Subpopulation test classifications are in the third, fourth and fifth columns, and mash effect estimates ('Bhat'), standard errors ('Shat') and significance ('log10bf') are in the last three columns.

Supplemental Data 7

Composite database of candidate genes for introgression intervals and GWAS peaks. All genes within 10 kb of a GWAS peak or within an introgression interval are listed with their genome coordinates (columns 1–5), the subpopulation in which the significant association was found, the statistical test, the start and end coordinates of the candidate interval and the physical distance between the candidate gene and the centre of the interval (columns 6–10). Columns 11–13 contain subpopulation-specific statistics for the weighted SnpEff score, the proportion of libraries with gene absences, and the size and type of SVs within 100 bp (if any) of the candidate gene. The gene coexpression module (from WGCNA) identity is given in the 14th column. Genes with orthologs among the candidate genes of a recent flowering-time GWAS paper are flagged in the 15th column.
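The candidate-gene table is built from two kinds of windows: ±10 kb around a GWAS peak and the bounds of an introgression interval, with each gene's distance to the window centre. The Python sketch below shows that selection under stated assumptions: coordinates are 1-based and inclusive, a gene counts if any part of it overlaps the window, and distance is measured from the window centre to the nearest gene edge (0 if the gene spans the centre). The caption does not say how distance was measured, and `Gene`, `genes_in_interval`, `genes_near_peak` and the chromosome names are hypothetical.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    gene_id: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # inclusive


def genes_in_interval(genes, chrom, start, end):
    """Return (gene_id, distance) for genes overlapping [start, end] on chrom.

    distance is the bp gap between the interval centre and the nearest edge
    of the gene (0 if the gene spans the centre); results are sorted by it.
    """
    centre = (start + end) / 2
    hits = []
    for g in genes:
        if g.chrom != chrom or g.end < start or g.start > end:
            continue  # different chromosome or no overlap with the window
        if g.start <= centre <= g.end:
            dist = 0.0
        else:
            dist = min(abs(g.start - centre), abs(g.end - centre))
        hits.append((g.gene_id, dist))
    return sorted(hits, key=lambda h: h[1])


def genes_near_peak(genes, chrom, pos, window=10_000):
    """Genes overlapping a +/- `window` bp region around a GWAS peak position."""
    return genes_in_interval(genes, chrom, pos - window, pos + window)


genes = [Gene("g1", "Chr01K", 1000, 2000), Gene("g2", "Chr01K", 14000, 16000),
         Gene("g3", "Chr01K", 30000, 31000), Gene("g4", "Chr02N", 5000, 6000)]
print(genes_near_peak(genes, "Chr01K", 5000))  # [('g1', 3000.0), ('g2', 9000.0)]
```

For genome-scale gene sets, an interval tree or a sorted sweep would replace the linear scan; the selection logic stays the same.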

Supplemental Data 8

Biomass and survival by genotype and common garden. The first column ('plant_id') corresponds to the 'plant_ID' column in Supplemental Data 4. The remaining column names follow the pattern gardenID_phenotype, where garden IDs correspond to the first column in Extended Data Fig. 2a. Phenotypes are 'BIOMASS' (dry biomass, g), 'LOGBIOMASS' (natural log of biomass) or 'SRV' (survival; 0 = dead, 1 = alive). Missing values indicate that the plant was not grown at that site. Where multiple clonal replicates of a genotype were grown at a single site, the values represent the mean of all observed data; hence, some survival values lie strictly between 0 and 1.
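The description above implies two simple transformations: averaging clonal replicates within each genotype-by-garden cell (which is why survival can fall between 0 and 1) and a natural-log transform of biomass. The Python sketch below assumes a long-format input with one record per planted clone. The record layout, the garden IDs `TX2` and `MI`, the plant ID `J001`, the function name `summarise_replicates`, and the choice to take the log of the replicate mean (rather than averaging per-replicate logs) are all assumptions for illustration.

```python
import math
from collections import defaultdict


def summarise_replicates(records):
    """Collapse clonal replicates into one wide row per genotype.

    records: iterable of (plant_id, garden_id, biomass_g, survived), where
    biomass_g is dry biomass in grams (None if not measured) and survived is
    0 or 1. Returns {plant_id: {"<garden>_<PHENOTYPE>": value}}; gardens where
    a genotype was not grown simply have no key (a missing value).
    """
    groups = defaultdict(lambda: {"biomass": [], "srv": []})
    for plant_id, garden, biomass_g, survived in records:
        g = groups[(plant_id, garden)]
        g["srv"].append(survived)
        if biomass_g is not None:
            g["biomass"].append(biomass_g)

    table = defaultdict(dict)
    for (plant_id, garden), g in groups.items():
        row = table[plant_id]
        # Mean survival over replicates; can fall strictly between 0 and 1
        row[f"{garden}_SRV"] = sum(g["srv"]) / len(g["srv"])
        if g["biomass"]:
            mean_biomass = sum(g["biomass"]) / len(g["biomass"])
            row[f"{garden}_BIOMASS"] = mean_biomass
            if mean_biomass > 0:
                # Assumption: natural log of the replicate mean, not mean of logs
                row[f"{garden}_LOGBIOMASS"] = math.log(mean_biomass)
    return dict(table)


records = [("J001", "TX2", 500.0, 1), ("J001", "TX2", None, 0), ("J001", "MI", 120.0, 1)]
print(summarise_replicates(records)["J001"]["TX2_SRV"])  # 0.5
```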

Supplemental Data 9

Gene sequence ancestral state reconstructions. The file is gzip-compressed FASTA; each sequence header is the primary transcript ID of a single-copy gene in the switchgrass genome.

Supplemental Data 10

RNA-sequencing raw counts. Libraries (columns) match those in the metadata file (Supplemental Data 3). Each of the 80,278 rows contains the counts for one gene.

Supplemental Data 11

Relative cross-coalescence data from MSMC runs. For each step (row) in the run, the calculated generations (fourth column) and the relative cross-coalescence rate (RCCR) are presented. These are transformed into differences between steps, and the divergence time is flagged in the seventh column.

Supplemental Data 12

Spatial distribution models for each switchgrass ecotype. These are saved as R objects compressed into a single archive.

Peer Review File

Source data

Source Data Fig. 1

Source Data Fig. 2

Source Data Fig. 3

Source Data Fig. 4

Source Data Extended Data Fig. 1

Source Data Extended Data Fig. 2

Source Data Extended Data Fig. 3

Source Data Extended Data Fig. 4

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Lovell, J.T., MacQueen, A.H., Mamidi, S. et al. Genomic mechanisms of climate adaptation in polyploid bioenergy switchgrass. Nature 590, 438–444 (2021). https://doi.org/10.1038/s41586-020-03127-1


Received: 01 July 2020
Accepted: 16 December 2020
Published: 27 January 2021
Issue Date: 18 February 2021
DOI: https://doi.org/10.1038/s41586-020-03127-1

