Species often include multiple ecotypes that are adapted to different environments1. However, it is unclear how ecotypes arise and how their distinctive combinations of adaptive alleles are maintained despite hybridization with non-adapted populations2,3,4. Here, by resequencing 1,506 wild sunflowers from 3 species (Helianthus annuus, Helianthus petiolaris and Helianthus argophyllus), we identify 37 large (1–100 Mbp in size), non-recombining haplotype blocks that are associated with numerous ecologically relevant traits, as well as soil and climate characteristics. Limited recombination in these haplotype blocks keeps adaptive alleles together, and these regions differentiate sunflower ecotypes. For example, haplotype blocks control a 77-day difference in flowering between ecotypes of the silverleaf sunflower H. argophyllus (probably through deletion of a homologue of FLOWERING LOCUS T (FT)), and are associated with seed size, flowering time and soil fertility in dune-adapted sunflowers. These haplotypes are highly divergent, frequently associated with structural variants and often appear to represent introgressions from other—possibly now-extinct—congeners. These results highlight a pervasive role of structural variation in ecotypic adaptation.
All raw sequenced data are stored in the Sequence Read Archive (SRA) under BioProject accessions PRJNA532579, PRJNA398560 and PRJNA564337. SRA accession numbers for individual samples are listed in Supplementary Table 1 (tabs ‘Coverage and analyses’, ‘Outgroups’, ‘Samples from other studies’ and ‘HiC samples’). The HA412-HOv2 and PSC8 genome assemblies are available at https://sunflowergenome.org/ and https://heliagene.org/. Filtered SNP datasets are available at https://rieseberglab.github.io/ubc-sunflower-genome/. GWA results, as well as the corresponding SNP and trait data, are available at https://easygwas.ethz.ch/gwas/myhistory/public/20/, https://easygwas.ethz.ch/gwas/myhistory/public/21/, https://easygwas.ethz.ch/gwas/myhistory/public/22/, https://easygwas.ethz.ch/gwas/myhistory/public/23/. HaFT1, HaFT2 and HaFT6 sequences have been deposited in GenBank under accession numbers MN517758–MN517761. Source data for all figures are provided at https://github.com/owensgl/haploblocks/. Source data are provided with this paper.
All code associated with this project is available at https://github.com/owensgl/haploblocks/.
Clausen, J. Stages in the Evolution of Plant Species (Cornell Univ. Press, 1951).
Endler, J. A. Gene flow and population differentiation. Science 179, 243–250 (1973).
Felsenstein, J. Skepticism towards Santa Rosalia, or why are there so few kinds of animals? Evolution 35, 124–138 (1981).
Romanes, G. J. Physiological selection; an additional suggestion on the origin of species. Zool. J. Linn. Soc. 19, 337–411 (1886).
Whitney, K. D., Randell, R. A. & Rieseberg, L. H. Adaptive introgression of abiotic tolerance traits in the sunflower Helianthus annuus. New Phytol. 187, 230–239 (2010).
Ostevik, K. L., Andrew, R. L., Otto, S. P. & Rieseberg, L. H. Multiple reproductive barriers separate recently diverged sunflower ecotypes. Evolution 70, 2322–2335 (2016).
Moyers, B. T. The Landscape of Divergence in Silverleaf Sunflowers. PhD thesis, Univ. of British Columbia (2015).
Qiu, F. et al. Phylogenetic trends and environmental correlates of nuclear genome size variation in Helianthus sunflowers. New Phytol. 221, 1609–1618 (2019).
Badouin, H. et al. The sunflower genome provides insights into oil metabolism, flowering and Asterid evolution. Nature 546, 148–152 (2017).
Shagina, I. et al. Normalization of genomic DNA using duplex-specific nuclease. Biotechniques 48, 455–459 (2010).
Staton, S. E. & Rieseberg, L. H. Sunflower Genome Database, https://www.sunflowergenome.org/ (2019).
INRA. INRA Sunflower Bioinformatics Resources, https://www.heliagene.org/ (2019).
Baute, G. J., Owens, G. L., Bock, D. G. & Rieseberg, L. H. Genome-wide genotyping-by-sequencing data provide a high-resolution view of wild Helianthus diversity, genetic structure, and interspecies gene flow. Am. J. Bot. 103, 2170–2177 (2016).
Stephens, J. D., Rogers, W. L., Mason, C. M., Donovan, L. A. & Malmberg, R. L. Species tree estimation of diploid Helianthus (Asteraceae) using target enrichment. Am. J. Bot. 102, 910–920 (2015).
Heiser, C. B. & Smith, D. M. The North American Sunflowers (Helianthus) (Seeman Printery, 1969).
Hübner, S. et al. Sunflower pan-genome analysis shows that hybridization altered gene content and disease resistance. Nat. Plants 5, 54–62 (2019).
Raduski, A. R., Rieseberg, L. H. & Strasburg, J. L. Effective population size, gene flow, and species status in a narrow endemic sunflower, Helianthus neglectus, compared to its widespread sister species, H. petiolaris. Int. J. Mol. Sci. 11, 492–506 (2010).
Strasburg, J. L. & Rieseberg, L. H. Molecular demographic history of the annual sunflowers Helianthus annuus and H. petiolaris—large effective population sizes and rates of long-term gene flow. Evolution 62, 1936–1950 (2008).
Blackman, B. K., Michaels, S. D. & Rieseberg, L. H. Connecting the sun to flowering in sunflower adaptation. Mol. Ecol. 20, 3503–3512 (2011).
Zan, Y. & Carlborg, Ö. A polygenic genetic architecture of flowering time in the worldwide Arabidopsis thaliana population. Mol. Biol. Evol. 36, 141–154 (2019).
Kobayashi, Y., Kaya, H., Goto, K., Iwabuchi, M. & Araki, T. A pair of related genes with antagonistic roles in mediating flowering signals. Science 286, 1960–1962 (1999).
Werner, J. D. et al. Quantitative trait locus mapping and DNA array hybridization identify an FLM deletion as a cause for natural flowering-time variation. Proc. Natl Acad. Sci. USA 102, 2460–2465 (2005).
Cao, Y., Wen, L., Wang, Z. & Ma, L. SKIP interacts with the Paf1 complex to regulate flowering via the activation of FLC transcription in Arabidopsis. Mol. Plant 8, 1816–1819 (2015).
Wang, L. C. et al. Involvement of the Arabidopsis HIT1/AtVPS53 tethering protein homologue in the acclimation of the plasma membrane to heat stress. J. Exp. Bot. 62, 3609–3620 (2011).
Blackman, B. K. et al. Contributions of flowering time genes to sunflower domestication and improvement. Genetics 187, 271–287 (2011).
Brouillette, L. C. & Donovan, L. A. Nitrogen stress response of a hybrid species: a gene expression study. Ann. Bot. 107, 101–108 (2011).
Andrew, R. L. & Rieseberg, L. H. Divergence is focused on few genomic regions early in speciation: incipient speciation of sunflower ecotypes. Evolution 67, 2468–2482 (2013).
Ostevik, K. L. The Ecology and Genetics of Adaptation and Speciation in Dune Sunflowers. PhD thesis, Univ. of British Columbia (2016).
Li, H. & Ralph, P. Local PCA shows how the effect of population structure differs along the genome. Genetics 211, 289–304 (2019).
Kirkpatrick, M. & Barton, N. Chromosome inversions, local adaptation and speciation. Genetics 173, 419–434 (2006).
Ortiz-Barrientos, D., Engelstädter, J. & Rieseberg, L. H. Recombination rate evolution and the origin of species. Trends Ecol. Evol. 31, 226–236 (2016).
Trickett, A. J. & Butlin, R. K. Recombination suppressors and the evolution of new species. Heredity 73, 339–345 (1994).
Arostegui, M. C., Quinn, T. P., Seeb, L. W., Seeb, J. E. & McKinney, G. J. Retention of a chromosomal inversion from an anadromous ancestor provides the genetic basis for alternative freshwater ecotypes in rainbow trout. Mol. Ecol. 28, 1412–1427 (2019).
Joron, M. et al. Chromosomal rearrangements maintain a polymorphic supergene controlling butterfly mimicry. Nature 477, 203–206 (2011).
Lowry, D. B. & Willis, J. H. A widespread chromosomal inversion polymorphism contributes to a major life-history transition, local adaptation, and reproductive isolation. PLoS Biol. 8, e1000500 (2010).
Fustier, M. A. et al. Common gardens in teosintes reveal the establishment of a syndrome of adaptation to altitude. PLoS Genet. 15, e1008512 (2019).
Wellenreuther, M., Rosenquist, H., Jaksons, P. & Larson, K. W. Local adaptation along an environmental cline in a species with an inversion polymorphism. J. Evol. Biol. 30, 1068–1077 (2017).
Belton, J. M. et al. Hi-C: a comprehensive technique to capture the conformation of genomes. Methods 58, 268–276 (2012).
Mason, C. M. How old are sunflowers? A molecular clock analysis of key divergences in the origin and diversification of Helianthus (Asteraceae). Int. J. Plant Sci. 179, 182–191 (2018).
Wellenreuther, M. & Bernatchez, L. Eco-evolutionary genomics of chromosomal inversions. Trends Ecol. Evol. 33, 427–440 (2018).
Jay, P. et al. Supergene evolution triggered by the introgression of a chromosomal inversion. Curr. Biol. 28, 1839–1845 (2018).
Lotterhos, K. E. The effect of neutral recombination variation on genome scans for selection. G3 9, 1851–1867 (2019).
Heiser, C. B., Jr. Hybridization in the annual sunflowers: Helianthus annuus × H. debilis var. cucumerifolius. Evolution 5, 42–51 (1951).
Hooper, D. M. & Price, T. D. Chromosomal inversion differences correlate with range overlap in passerine birds. Nat. Ecol. Evol. 1, 1526–1534 (2017).
Heiser, C. B. Three new annual sunflowers (Helianthus) from the southwestern United States. Rhodora 60, 272–283 (1958).
Andrew, R. L., Kane, N. C., Baute, G. J., Grassa, C. J. & Rieseberg, L. H. Recent nonhybrid origin of sunflower ecotypes in a novel habitat. Mol. Ecol. 22, 799–813 (2013).
Kirkpatrick, M. Reinforcement and divergence under assortative mating. Proc. R. Soc. Lond. B 267, 1649–1655 (2000).
Feder, J. L., Gejji, R., Powell, T. H. & Nosil, P. Adaptive chromosomal divergence driven by mixed geographic mode of evolution. Evolution 65, 2157–2170 (2011).
Yeaman, S. & Whitlock, M. C. The genetic architecture of adaptation under migration–selection balance. Evolution 65, 1897–1911 (2011).
Schindelin, J. et al. Fiji: an open-source platform for biological-image analysis. Nat. Methods 9, 676–682 (2012).
Schneider, C. A., Rasband, W. S. & Eliceiri, K. W. NIH Image to ImageJ: 25 years of image analysis. Nat. Methods 9, 671–675 (2012).
Rodríguez, G. R. et al. Tomato Analyzer: a useful software application to collect accurate and detailed morphological and colorimetric data from two-dimensional objects. J. Vis. Exp. 37, e1856 (2010).
Murray, M. G. & Thompson, W. F. Rapid isolation of high molecular weight plant DNA. Nucleic Acids Res. 8, 4321–4325 (1980).
Zeng, J., Zou, Y., Bai, J. & Zheng, H. Preparation of total DNA from recalcitrant plant taxa. Acta Bot. Sin. 44, 694–697 (2002).
Rowan, B. A., Patel, V., Weigel, D. & Schneeberger, K. Rapid and inexpensive whole-genome genotyping-by-sequencing for crossover localization and fine-scale genetic mapping. G3 5, 385–398 (2015).
Rohland, N. & Reich, D. Cost-effective, high-throughput DNA sequencing libraries for multiplexed target capture. Genome Res. 22, 939–946 (2012).
Matvienko, M. et al. Consequences of normalizing transcriptomic and genomic libraries of plant genomes using a duplex-specific nuclease and tetramethylammonium chloride. PLoS ONE 8, e55913 (2013).
Lee-Yaw, J. A., Grassa, C. J., Joly, S., Andrew, R. L. & Rieseberg, L. H. An evaluation of alternative explanations for widespread cytonuclear discordance in annual sunflowers (Helianthus). New Phytol. 221, 515–526 (2019).
Owens, G. L., Baute, G. J., Hübner, S. & Rieseberg, L. H. Genomic sequence and copy number evolution during hybrid crop development in sunflowers. Evol. Appl. 12, 54–65 (2019).
Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
Sedlazeck, F. J., Rescheneder, P. & von Haeseler, A. NextGenMap: fast and accurate read mapping in highly polymorphic genomes. Bioinformatics 29, 2790–2791 (2013).
Li, H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics 27, 2987–2993 (2011).
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Broad Institute. Picard tools, http://broadinstitute.github.io/picard/ (Broad Institute, 2019).
Tarasov, A., Vilella, A. J., Cuppen, E., Nijman, I. J. & Prins, P. Sambamba: fast processing of NGS alignment formats. Bioinformatics 31, 2032–2034 (2015).
Poplin, R. et al. Scaling accurate genetic variant discovery to tens of thousands of samples. Preprint at https://www.biorxiv.org/content/10.1101/201178v3 (2017).
Datta, K., Gururaj, K., Naik, M., Narvaez, P. & Rutar, M. GenomicsDB: storing genome data as sparse columnar arrays. https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/genomics-storing-genome-data-paper.pdf (2017).
Köster, J. & Rahmann, S. Snakemake—a scalable bioinformatics workflow engine. Bioinformatics 28, 2520–2522 (2012).
Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at https://arxiv.org/abs/1303.3997 (2013).
Danecek, P. et al. The variant call format and VCFtools. Bioinformatics 27, 2156–2158 (2011).
Hoang, D. T., Chernomor, O., von Haeseler, A., Minh, B. Q. & Vinh, L. S. UFBoot2: improving the ultrafast bootstrap approximation. Mol. Biol. Evol. 35, 518–522 (2018).
Kalyaanamoorthy, S., Minh, B. Q., Wong, T. K. F., von Haeseler, A. & Jermiin, L. S. ModelFinder: fast model selection for accurate phylogenetic estimates. Nat. Methods 14, 587–589 (2017).
Nguyen, L. T., Schmidt, H. A., von Haeseler, A. & Minh, B. Q. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 32, 268–274 (2015).
Browning, B. L., Zhou, Y. & Browning, S. R. A one-penny imputed genome from next-generation reference panels. Am. J. Hum. Genet. 103, 338–348 (2018).
Grimm, D. G. et al. easyGWAS: a cloud-based platform for comparing the results of genome-wide association studies. Plant Cell 29, 5–19 (2017).
Kang, H. M. et al. Variance component model to account for sample structure in genome-wide association studies. Nat. Genet. 42, 348–354 (2010).
Wang, T., Hamann, A., Spittlehouse, D. & Carroll, C. Locally downscaled and spatially customizable climate data for historical and future periods for North America. PLoS ONE 11, e0156720 (2016).
Gautier, M. Genome-wide scan for adaptive divergence and association with population-specific covariates. Genetics 201, 1555–1579 (2015).
Jeffreys, H. Theory of Probability (Clarendon, 1961).
Hellens, R. P., Edwards, E. A., Leyland, N. R., Bean, S. & Mullineaux, P. M. pGreen: a versatile and flexible binary Ti vector for Agrobacterium-mediated plant transformation. Plant Mol. Biol. 42, 819–832 (2000).
Weigel, D. & Glazebrook, J. Arabidopsis: A Laboratory Manual (CSHL, 2002).
Zheng, X. et al. A high-performance computing toolset for relatedness and principal component analysis of SNP data. Bioinformatics 28, 3326–3328 (2012).
Hartigan, J. A. & Wong, M. A. Algorithm AS 136: a K-means clustering algorithm. J. R. Stat. Soc. C. Appl. Stat. 28, 100–108 (1979).
R Core Team. R: A language and environment for statistical computing, https://www.R-project.org/ (2019).
Kurtz, S. et al. Versatile and open software for comparing large genomes. Genome Biol. 5, R12 (2004).
Goel, M., Sun, H., Jiao, W.-B. & Schneeberger, K. SyRI: finding genomic rearrangements and local sequence differences from whole-genome assemblies. Genome Biol. 20, 277 (2019).
Ostevik, K. L., Samuk, K. & Rieseberg, L. H. Ancestral reconstruction of karyotypes reveals an exceptional rate of non-random chromosomal evolution in sunflower. Genetics 214, 1031–1045 (2020).
Huang, K., Andrew, R. L., Owens, G. L., Ostevik, K. L. & Rieseberg, L. H. Multiple chromosomal inversions contribute to adaptive divergence of a dune sunflower ecotype. Mol. Ecol., https://doi.org/10.1111/mec.15428 (2020).
Marie-Nelly, H. et al. High-quality genome (re)assembly using chromosomal contact data. Nat. Commun. 5, 5695 (2014).
Heinz, S. et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 38, 576–589 (2010).
Hu, X. & Friedberg, I. SwiftOrtho: A fast, memory-efficient, multiple genome orthology classifier. Gigascience 8, giz118 (2019).
Suchard, M. A. et al. Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus Evol. 4, vey016 (2018).
Rambaut, A., Drummond, A. J., Xie, D., Baele, G. & Suchard, M. A. Posterior summarization in Bayesian phylogenetics using Tracer 1.7. Syst. Biol. 67, 901–904 (2018).
Sambatti, J. B., Strasburg, J. L., Ortiz-Barrientos, D., Baack, E. J. & Rieseberg, L. H. Reconciling extremely strong barriers with high levels of gene exchange in annual sunflowers. Evolution 66, 1459–1473 (2012).
Yu, G., Smith, D. K., Zhu, H., Guan, Y. & Lam, T. T. Y. ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods Ecol. Evol. 8, 28–36 (2017).
Rambaut, A. FigTree, http://tree.bio.ed.ac.uk/software/figtree/ (2009).
We thank J. Gouzy and N. B. Langlade for providing access to the HA412-HOv2 annotation and PSC8 genome assembly; B. T. Moyers for discussion and providing the H. argophyllus picture; J. Lee-Yaw and A. J. Moreno-Geraldes for comments; D. Skonieczny, A. Kim, A. Parra and C. Konecny for assistance with fieldwork and data acquisition; A. Warfield for computing advice; J. D. Herndon for providing the dune H. petiolaris picture; D. G. Grimm for assistance with easyGWAS; UBC’s Data Science Institute for support to J.S.L.; and Compute Canada for computing resources. Maps were produced using tiles from Stamen Design (https://stamen.com), under CC BY 3.0, from data by OpenStreetMap contributors (https://openstreetmap.org), under ODbL. Funding was provided by Genome Canada and Genome BC (LSARP2014-223SUN), the NSF Plant Genome Program (IOS-1444522), the International Consortium for Sunflower Genomic Resources, Sofiproteol, an HFSP long-term postdoctoral fellowship to M.T. (LT000780/2013) and a Banting postdoctoral fellowship to G.L.O.
The authors declare no competing interests.
Peer review information Nature thanks Jeffrey Ross-Ibarra, Jeremy Schmutz and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
a, Comparison between the original order of SNPs in chromosome 2 on the XRQv1 assembly9 (against which sequencing reads were originally mapped) and after SNP re-mapping to the HA412-HOv2 assembly11. Data are summarized in 5-kbp ranges. Error bars represent 2 standard errors. The higher R2 at longer distances is due to better scaffolding of contigs in HA412-HOv2. Number of SNPs: n = 261,020 (XRQ); n = 237,674 (HA412-HO). b, GWA for flowering time in H. argophyllus based on the XRQv1 assembly identified more than 40 highly significant associations. c, Remapping of the SNPs to the new HA412-HOv2 sunflower assembly considerably reduced the number of associations in the flowering time GWA, with the vast majority of the signal mapping to the arg06.01 haploblock region (Fig. 2). In b, c, the purple lines represent 5% Bonferroni-corrected significance. Only positions with −log10 P value >2 are plotted. Associations were calculated using two-sided mixed models. n = 277 individuals. d, Genotype call accuracy. Variants for 12 individuals for each species from our SNP dataset were compared to Sanger sequencing data. Six regions were compared. Number of sites: n = 136 (H. annuus); n = 139 (H. argophyllus); n = 262 (H. petiolaris). Number of genotype calls: n = 1,385 (H. annuus), n = 1,254 (H. argophyllus), n = 2,351 (H. petiolaris). Overall genotype accuracy: H. annuus = 95.9%; H. argophyllus = 96.8%; H. petiolaris = 97.9% (Supplementary Table 1). Vertical purple lines represent the average observed coverage across genic regions for individuals in the corresponding dataset. Error bars, binomial confidence interval (Wilson score method). e, Genome-wide principal component analysis for each dataset. Sites were pruned for linkage (r < 0.2 within 500 kb). Number of individuals: n = 730 (H. annuus); n = 299 (H. argophyllus); n = 168 (H. petiolaris petiolaris); n = 259 (H. petiolaris fallax).
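The error bars in panel d are binomial confidence intervals computed with the Wilson score method. A minimal sketch of that calculation is given below; the function name `wilson_interval`, the 95% level (z = 1.96) and the example counts are illustrative assumptions, not values taken from the study.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion.

    successes: number of correct genotype calls (hypothetical input)
    total:     number of genotype calls compared
    z:         normal quantile; 1.96 is assumed here for a 95% interval
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p_hat = successes / total
    denom = 1 + z ** 2 / total
    # Centre of the interval is pulled towards 0.5 relative to p_hat.
    centre = (p_hat + z ** 2 / (2 * total)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / total + z ** 2 / (4 * total ** 2))
    ) / denom
    return centre - half_width, centre + half_width


# Hypothetical example: 96 of 100 calls agree with Sanger sequencing.
low, high = wilson_interval(96, 100)
print(f"accuracy 0.960, 95% Wilson interval: {low:.3f} to {high:.3f}")
```

Unlike the simple normal approximation, the Wilson interval stays within 0 and 1 even when accuracy is close to 100%, which is the typical situation for genotype-call accuracy.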
a, Flowering time for the three wild sunflower species measured in a common garden experiment. Number of individuals: n = 612 (H. annuus); n = 161 (H. petiolaris petiolaris); n = 211 (H. petiolaris fallax); n = 48 (H. niveus canescens); n = 261 (H. argophyllus 0/0); n = 25 (H. argophyllus 0/1); n = 23 (H. argophyllus 1/1). b, Leaf nitrogen content and carbon/nitrogen ratio GWAs in H. argophyllus (two-sided mixed model associations; n = 289 individuals). The purple lines represent 5% Bonferroni-corrected significance. Only positions with −log10 P value >2 are plotted. c, Genotype presence or absence for the 130–135-Mbp region of chromosome 6 in H. argophyllus. The x-axis represents consecutive SNP positions; distances on this axis are therefore not proportional to physical distances on the chromosome. Purple bars highlight the positions of the five HaFT genes in the region (HaFT5 and HaFT6 are only a few hundred bp apart). Flowering time data are the same as used in GWA analyses. d, HaFT1 and HaFT2 expression levels in mature leaves or shoot apices of >6-month-old, flowering H. argophyllus plants, grown in a greenhouse in long-day conditions (14 h light:10 h dark). This experiment was performed on two independent pairs of individuals, with similar results. e, Six-week-old A. thaliana plants grown in long-day conditions at 23 °C. At least 19 independent transgenic events were analysed for each construct in each genetic background, and flowering time was consistent within each group. Scale bar, 1 cm. f, Flowering time in long and short days (10 h light:14 h dark). HaFT2 alleles from early- and late-flowering H. argophyllus plants complement the ft-10 mutant, similar to HaFT1 from the early-flowering ecotype. HaFT6 is expressed at low levels in H. argophyllus plants (not shown), and appears to be a hypo-functional FT homologue.
Box plots show the median, box edges represent the 25th and 75th percentiles, whiskers represent the maximum and minimum data points within 1.5× interquartile range outside box edges. Differences in flowering time between untransformed controls, HaFT6 lines and all the other transgenic lines are significant in all conditions (P < 10−6 for all relevant comparisons; one-way ANOVA with post hoc Tukey HSD test, df = 4; exact P values are reported in the Source Data). Number of individuals or independent transformation events for the long days dataset in Col-0 background: n = 28 (Col-0); n = 32 (HaFT1); n = 30 (HaFT2early); n = 34 (HaFT2late); n = 45 (HaFT6). For the long days dataset in ft-10 background: n = 25 (ft-10); n = 30 (HaFT1); n = 38 (HaFT2early); n = 45 (HaFT2late); n = 18 (HaFT6). For the short days dataset: n = 10 (Col-0); n = 24 (HaFT1); n = 17 (HaFT2early); n = 31 (HaFT2late); n = 31 (HaFT6). g, PCR detection of transgene expression in leaves of plants grown for four weeks in long days. The reduced ability of HaFT6 to induce flowering is not due to inefficient expression of the transgene. Results for four independent primary transformants for each transgenic line and for wild-type Col-0 plants are shown. For gel source data, see Supplementary Fig. 1.
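The box plot convention used in these legends (median, box edges at the 25th and 75th percentiles, whiskers at the most extreme data points within 1.5× the interquartile range of the box edges) can be sketched as follows. The legend does not state which quartile interpolation rule was used; this sketch assumes the "inclusive" method of Python's `statistics.quantiles`. The function name `box_plot_stats` and the sample values are hypothetical.

```python
import statistics


def box_plot_stats(values):
    """Summary statistics for a box plot with 1.5x IQR whiskers."""
    data = sorted(values)
    # Quartiles; the interpolation rule is an assumption (see lead-in).
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers end at actual data points, not at the fences themselves.
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


# Hypothetical flowering times (days) with one very late plant.
print(box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 50]))
```

Points beyond the whiskers are reported separately as outliers, so a single extreme individual does not stretch the whisker.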
Extended Data Fig. 3 Several haploblocks differentiate dune and non-dune populations of H. petiolaris.
a, Correlation between seed size and flowering time. Although dune-adapted H. petiolaris fallax flowers later and has larger seeds than non-dune-adapted populations, these two traits generally show no correlation, or a weak negative correlation, in H. annuus and H. petiolaris. Purple lines represent linear regressions, shaded grey areas are 95% confidence intervals. H. annuus: n = 426 individuals, one-sided F1,423 = 1.831, P = 0.18; H. petiolaris: n = 307 individuals, one-sided F1,305 = 9.841, P = 0.0019. b, Seed length GWA in H. petiolaris fallax (two-sided mixed model associations; n = 165 individuals). No significant association with haploblocks is found in GWA analyses for seed width (not shown). c, FST values in 2-Mbp non-overlapping sliding windows for comparisons between dune- and non-dune-adapted populations of H. petiolaris fallax in Colorado. Purple bars represent predicted haploblocks. d, Flowering time (approximated as total leaf number (TLN) on the primary stem) GWA for H. petiolaris petiolaris (two-sided mixed model associations; n = 160 individuals). The purple lines in b, d represent 5% Bonferroni-corrected significance. Only positions with −log10 P value >2 are plotted. e, Distribution of FST values for SNPs and haploblocks in comparisons between dune- and non-dune-adapted populations of H. petiolaris fallax in Texas and Colorado16. Percentiles are reported for the most highly divergent haploblocks. Box plots show the median, box edges represent the 25th and 75th percentiles, whiskers represent the maximum and minimum data points within 1.5× interquartile range outside box edges. Number of individuals: n = 28 (Colorado); n = 54 (Texas). Number of SNPs: n = 1,196,399 (Colorado); n = 1,169,273 (Texas). f, Maximum-likelihood trees for two of the haploblocks segregating within H. petiolaris. Dune populations of H. petiolaris fallax are highlighted in light (Colorado) and dark tan (Texas).
For pet09.01 and pet11.01, although both dune populations have converged on the same haplotype, the Texas haplotype is the ancestral H. petiolaris fallax copy, whereas in Colorado the haplotype is derived from introgression with H. petiolaris petiolaris, suggesting convergent adaptation. Bootstrap values for major nodes are reported (asterisks = 100).
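Panel c summarizes FST in 2-Mbp non-overlapping windows. The legend does not say which FST estimator was used, so the sketch below substitutes Hudson's estimator (combined across SNPs as a ratio of sums) purely for illustration. The function name `windowed_fst`, the input layout and the example allele frequencies are all assumptions.

```python
from collections import defaultdict


def windowed_fst(sites, window_size=2_000_000):
    """FST per non-overlapping window, keyed by window start (bp).

    sites: iterable of (position_bp, p1, p2, n1, n2), where p1 and p2 are
    allele frequencies in the two populations and n1 and n2 are the numbers
    of sampled chromosomes. Uses Hudson's estimator (an assumption; the
    source does not name the estimator).
    """
    numerators = defaultdict(float)
    denominators = defaultdict(float)
    for pos, p1, p2, n1, n2 in sites:
        start = (pos // window_size) * window_size
        # Between-population difference, corrected for sampling variance.
        num = (
            (p1 - p2) ** 2
            - p1 * (1 - p1) / (n1 - 1)
            - p2 * (1 - p2) / (n2 - 1)
        )
        den = p1 * (1 - p2) + p2 * (1 - p1)
        numerators[start] += num
        denominators[start] += den
    # Ratio of sums per window; windows with no variation are skipped.
    return {
        start: numerators[start] / denominators[start]
        for start in sorted(numerators)
        if denominators[start] > 0
    }


# Hypothetical SNPs: one fixed difference, one shared polymorphism.
example = [
    (500_000, 1.0, 0.0, 20, 20),
    (2_500_000, 0.5, 0.5, 20, 20),
]
print(windowed_fst(example))
```

Summing numerators and denominators before dividing keeps low-diversity SNPs from dominating a window, which is one common reason to prefer a ratio of sums over a mean of per-SNP ratios.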
For each predicted haploblock, the local PCA MDS plot for the relevant chromosome, a PCA of the selected region, observed heterozygosity for each haploblock genotype and LD patterns for the relevant chromosome are shown. In the local PCA MDS plots, each dot represents a 100-SNP window, and windows within the haploblock region are highlighted. The x-axis values represent Mbp. For H. petiolaris, haploblocks were identified in the full species or subspecies datasets; the local PCA and LD plots are from the dataset in which the haploblock was identified, and PCA and heterozygosity plots use the full dataset. In PCA plots, samples are coloured by inferred haploblock genotype. For LD plots, upper triangle = all individuals; lower triangle = only individuals homozygous for the more common haploblock allele. Colours represent the second highest R2 value in 0.5-Mbp windows. For most haploblock regions, high LD is driven by differences between haplotypes, so high LD is removed when only one haplotype is present. Box plots show the median, box edges represent the 25th and 75th percentiles, whiskers represent the maximum and minimum data points within 1.5× interquartile range outside box edges. Sample size for all haploblock analyses is provided in the Source Data, available at https://github.com/owensgl/haploblocks/.
Map showing collection locations of the three sunflower species and the frequency of haploblock genotypes at each collection site.
Extended Data Fig. 6 Comparisons between reference assemblies and genetic maps confirm structural rearrangements associated with haploblocks.
a, Alignment of chromosome 1 for the H. annuus genome assemblies PSC8 and HA412-HOv2. The ann01.01 region (at about 8 Mbp; inset), for which the two cultivars have different haplotypes, shows inverted alignment. b, Three H. annuus genetic maps (constructed using F2 populations between wild individuals and the HA412-HO cultivar). c, Four genetic maps (constructed using F1 populations. From top to bottom: H. petiolaris petiolaris, H. petiolaris fallax87, newly constructed dune H. petiolaris fallax and newly constructed non-dune H. petiolaris fallax88) are plotted relative to the HA412-HOv2 reference assembly. To the right of each dot plot, markers are plotted in the order in which they appear in each genetic map. Haploblock regions and the markers that fall within them are highlighted in purple. Circled haploblock regions show evidence of different orientations across the multiple maps (dotted lines), of suppressed recombination (dashed lines) or are contiguous in H. petiolaris maps despite being split over multiple windows in the HA412-HOv2 reference assembly (solid lines). Parental haploblock genotypes are known for the H. annuus maps and for the bottom two H. petiolaris maps. Ann05.01 and ann11.01 were segregating within the H. annuus mapping populations. Genotypes at pet05.01 and pet11.01 differed between the H. petiolaris fallax parents of newly constructed dune and non-dune populations, whereas both parents were heterozygous for the pet09.01 haploblock. In all these cases, patterns of segregation are consistent with the parental haploblock genotypes. For the remaining H. petiolaris maps, the parental haploblock genotypes are not known. Because an absence of evidence is uninformative in these cases, only haploblock regions with evidence for inversions or contiguous windows from these two maps are plotted.
a, Differences in HiC interactions between pairs of early- and late-flowering H. argophyllus or dune and non-dune H. petiolaris samples. Purple bars and solid black lines represent approximate haploblock boundaries. Pieces of a single haploblock that map to different regions of the HA412-HOv2 reference are highlighted by dotted lines. Top row, comparisons between H. annuus and H. argophyllus or H. petiolaris, for H. annuus haploblock regions. Because the relative haploblock genotypes between sunflower species are not known, only cases in which evidence of structural variants were observed are reported. Following rows, regions for which the pairs of H. argophyllus or H. petiolaris samples differed at haploblock alleles. Red or blue dots show increased or decreased, respectively, long-distance interactions in one sample, consistent with differences in genome structure. Relevant differences in long-distance interactions are highlighted by black arrows; for each of these, the percentage rank compared to all other possible interactions at the same distance across the genome is reported. No evidence of large-scale structural variation was observed for arg06.01 and pet10.01. An excess of interactions in the early-flowering allele for the approximately 130–140-Mbp region of chromosome 6 is consistent with the presence of deletions in the late-flowering alleles (Extended Data Fig. 2c), as well as with improved mappability of reads from the early-flowering allele, which—being an introgression from wild H. annuus—is closer in sequence to the HA412-HO reference. Differences in HiC interactions were capped between −0.3 and 0.3 for plotting purposes. b, Inversion scenarios with comparisons of simulated HiC interaction matrixes consistent with empirical patterns. There are H. annuus-specific inversions in the reference genome, as well as inversions between haploblocks.
Heat map of GWAs for individual phenotypic traits, treating haploblocks as individual loci. Haploblocks were filtered to retain only regions with minor allele frequency ≥ 3%. PCA and kinship matrices used as covariates were calculated without variants inside haploblock regions. GWAs were calculated using two-sided mixed models. Number of individuals: n = 614 (H. annuus); n = 294 (H. argophyllus); n = 209 (H. petiolaris fallax); n = 163 (H. petiolaris petiolaris).
a, Heat map of GEAs for individual environmental variables, treating haploblocks as individual loci. Haploblocks were filtered to retain only regions with minor allele frequency ≥ 3%. The population correlation matrix was calculated without variants inside haploblock regions. GEAs were calculated using two-sided XtX statistics. Number of populations: n = 71 (H. annuus); n = 30 (H. argophyllus); n = 23 (H. petiolaris fallax); n = 17 (H. petiolaris petiolaris). b, The proportion of haploblock and SNP loci significantly associated with one or more environmental variable (dB ≥ 10) or phenotypic trait (P ≤ 0.001). *P < 0.05, **P < 0.0005 (two-sided proportion test; exact P values and number of individuals are reported in Source Data).
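Panel b compares the proportion of haploblocks and of SNPs that show significant associations using a two-sided proportion test. The exact test variant (for example, whether a continuity correction was applied) is not given in the legend. The sketch below uses a pooled two-proportion z-test without continuity correction as a stand-in, and `two_proportion_test` and the counts in the example are hypothetical.

```python
import math


def two_proportion_test(x1, n1, x2, n2):
    """Two-sided pooled z-test for a difference between two proportions.

    x1 of n1 loci in group 1 and x2 of n2 loci in group 2 are "significant".
    Returns (z, p_value). No continuity correction is applied (assumption).
    """
    if n1 <= 0 or n2 <= 0:
        raise ValueError("group sizes must be positive")
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        # All loci significant or none: the proportions cannot differ.
        return 0.0, 1.0
    z = (p1 - p2) / se
    # Two-sided tail probability of a standard normal variate.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 30 of 37 haploblocks vs 100 of 1,000 SNPs.
z, p = two_proportion_test(30, 37, 100, 1000)
print(f"z = {z:.2f}, two-sided P = {p:.3g}")
```

With small numbers of haploblocks per species, an exact test or a continuity-corrected test may be more appropriate than this normal approximation.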
Extended Data Fig. 10 A 100-Mbp haploblock is associated with early flowering in the texanus ecotype of H. annuus.
a, GWA for flowering in H. annuus (two-sided mixed model associations; n = 612 individuals), using a kinship matrix and PCA covariate including (black dots) or excluding (yellow dots) the haploblock regions. Haploblock regions are highlighted in purple. The purple line represents 5% Bonferroni-corrected significance. Only positions with −log10 P value >2 are plotted. b, Flowering time for individuals with different genotypes at ann13.01. Number of individuals: n = 244 (0/0); n = 168 (0/1); n = 200 (1/1). c, Distribution of ann13.01 haplotypes. d, Distribution of FST values for individual SNPs and haploblocks in comparisons between the texanus ecotype of H. annuus and other H. annuus populations. Percentiles are reported for the most highly divergent haploblocks. In b, d, box plots show the median, box edges represent the 25th and 75th percentiles, whiskers represent the maximum and minimum data points within 1.5× interquartile range outside box edges.
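The GWA plots in these legends draw a line at 5% Bonferroni-corrected significance and show only positions with −log10 P > 2. A minimal sketch of both steps follows, assuming that the correction divides 0.05 by the number of variants tested. The function names and the example P values are hypothetical.

```python
import math


def bonferroni_threshold(n_tests, alpha=0.05):
    """Bonferroni significance line on the -log10(P) scale."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return -math.log10(alpha / n_tests)


def points_to_plot(p_values, min_neg_log10=2.0):
    """Return (index, -log10 P) for positions above the plotting cutoff.

    P values must be strictly positive; the cutoff of 2 follows the legends.
    """
    kept = []
    for i, p in enumerate(p_values):
        score = -math.log10(p)
        if score > min_neg_log10:
            kept.append((i, score))
    return kept


# Hypothetical scan of one million variants.
line = bonferroni_threshold(1_000_000)
points = points_to_plot([0.5, 0.001, 1e-9])
significant = [i for i, score in points if score > line]
print(f"threshold = {line:.2f}, plotted = {len(points)}, significant = {significant}")
```

Dropping positions with −log10 P ≤ 2 only thins the plot; it does not change the threshold, which depends on the total number of tests performed.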
This file contains gels source data.
Population information and environmental variables Phenotypic data, coverage and SRA IDs for individual wild sunflower samples.
Candidate genes for GWA and GEA analyses.
Primers, markers and adapters used in this study.
About this article
Cite this article
Todesco, M., Owens, G.L., Bercovich, N. et al. Massive haplotypes underlie ecotypic differentiation in sunflowers. Nature 584, 602–607 (2020). https://doi.org/10.1038/s41586-020-2467-6