Massive haplotypes underlie ecotypic differentiation in sunflowers


Species often include multiple ecotypes that are adapted to different environments¹. However, it is unclear how ecotypes arise and how their distinctive combinations of adaptive alleles are maintained despite hybridization with non-adapted populations²,³,⁴. Here, by resequencing 1,506 wild sunflowers from 3 species (Helianthus annuus, Helianthus petiolaris and Helianthus argophyllus), we identify 37 large (1–100 Mbp in size), non-recombining haplotype blocks that are associated with numerous ecologically relevant traits, as well as soil and climate characteristics. Limited recombination in these haplotype blocks keeps adaptive alleles together, and these regions differentiate sunflower ecotypes. For example, haplotype blocks control a 77-day difference in flowering between ecotypes of the silverleaf sunflower H. argophyllus (probably through deletion of a homologue of FLOWERING LOCUS T (FT)), and are associated with seed size, flowering time and soil fertility in dune-adapted sunflowers. These haplotypes are highly divergent, frequently associated with structural variants and often appear to represent introgressions from other (possibly now-extinct) congeners. These results highlight a pervasive role of structural variation in ecotypic adaptation.


Fig. 1: Population structure and association analyses of wild sunflowers.
Fig. 2: A large introgression from H. annuus containing a functional HaFT1 gene causes early flowering in coastal H. argophyllus.
Fig. 3: Large non-recombining haplotypes control dune adaptation in H. p. fallax.
Fig. 4: Large haploblocks are pervasive in wild sunflowers and are associated with structural variants.
Fig. 5: Haploblocks are highly divergent and are associated with multiple ecologically relevant traits and environmental variables.

Data availability

All raw sequenced data are stored in the Sequence Read Archive (SRA) under BioProject accessions PRJNA532579, PRJNA398560 and PRJNA564337. SRA accession numbers for individual samples are listed in Supplementary Table 1 (tabs ‘Coverage and analyses’, ‘Outgroups’, ‘Samples from other studies’ and ‘HiC samples’). The HA412-HOv2 and PSC8 genome assemblies are available at and. Filtered SNP datasets are available at. GWA results, as well as the corresponding SNP and trait data, are available at,,,. HaFT1, HaFT2 and HaFT6 sequences have been deposited in GenBank under accession numbers MN517758–MN517761. Source data for all figures are provided at. Source data are provided with this paper.

Code availability

All code associated with this project is available at


References

  1. Clausen, J. Stages in the Evolution of Plant Species (Cornell Univ. Press, 1951).
  2. Endler, J. A. Gene flow and population differentiation. Science 179, 243–250 (1973).
  3. Felsenstein, J. Skepticism towards Santa Rosalia, or why are there so few kinds of animals? Evolution 35, 124–138 (1981).
  4. Romanes, G. J. Physiological selection; an additional suggestion on the origin of species. Zool. J. Linn. Soc. 19, 337–411 (1886).
  5. Whitney, K. D., Randell, R. A. & Rieseberg, L. H. Adaptive introgression of abiotic tolerance traits in the sunflower Helianthus annuus. New Phytol. 187, 230–239 (2010).
  6. Ostevik, K. L., Andrew, R. L., Otto, S. P. & Rieseberg, L. H. Multiple reproductive barriers separate recently diverged sunflower ecotypes. Evolution 70, 2322–2335 (2016).
  7. Moyers, B. T. The Landscape of Divergence in Silverleaf Sunflowers. PhD thesis, Univ. of British Columbia (2015).
  8. Qiu, F. et al. Phylogenetic trends and environmental correlates of nuclear genome size variation in Helianthus sunflowers. New Phytol. 221, 1609–1618 (2019).
  9. Badouin, H. et al. The sunflower genome provides insights into oil metabolism, flowering and Asterid evolution. Nature 546, 148–152 (2017).
  10. Shagina, I. et al. Normalization of genomic DNA using duplex-specific nuclease. Biotechniques 48, 455–459 (2010).
  11. Staton, S. E. & Rieseberg, L. H. Sunflower Genome Database, (2019).
  12. INRA. INRA Sunflower Bioinformatics Resources, (2019).
  13. Baute, G. J., Owens, G. L., Bock, D. G. & Rieseberg, L. H. Genome-wide genotyping-by-sequencing data provide a high-resolution view of wild Helianthus diversity, genetic structure, and interspecies gene flow. Am. J. Bot. 103, 2170–2177 (2016).
  14. Stephens, J. D., Rogers, W. L., Mason, C. M., Donovan, L. A. & Malmberg, R. L. Species tree estimation of diploid Helianthus (Asteraceae) using target enrichment. Am. J. Bot. 102, 910–920 (2015).
  15. Heiser, C. B. & Smith, D. M. The North American Sunflowers (Helianthus) (Seeman Printery, 1969).
  16. Hübner, S. et al. Sunflower pan-genome analysis shows that hybridization altered gene content and disease resistance. Nat. Plants 5, 54–62 (2019).
  17. Raduski, A. R., Rieseberg, L. H. & Strasburg, J. L. Effective population size, gene flow, and species status in a narrow endemic sunflower, Helianthus neglectus, compared to its widespread sister species, H. petiolaris. Int. J. Mol. Sci. 11, 492–506 (2010).
  18. Strasburg, J. L. & Rieseberg, L. H. Molecular demographic history of the annual sunflowers Helianthus annuus and H. petiolaris—large effective population sizes and rates of long-term gene flow. Evolution 62, 1936–1950 (2008).
  19. Blackman, B. K., Michaels, S. D. & Rieseberg, L. H. Connecting the sun to flowering in sunflower adaptation. Mol. Ecol. 20, 3503–3512 (2011).
  20. Zan, Y. & Carlborg, Ö. A polygenic genetic architecture of flowering time in the worldwide Arabidopsis thaliana population. Mol. Biol. Evol. 36, 141–154 (2019).
  21. Kobayashi, Y., Kaya, H., Goto, K., Iwabuchi, M. & Araki, T. A pair of related genes with antagonistic roles in mediating flowering signals. Science 286, 1960–1962 (1999).
  22. Werner, J. D. et al. Quantitative trait locus mapping and DNA array hybridization identify an FLM deletion as a cause for natural flowering-time variation. Proc. Natl Acad. Sci. USA 102, 2460–2465 (2005).
  23. Cao, Y., Wen, L., Wang, Z. & Ma, L. SKIP interacts with the Paf1 complex to regulate flowering via the activation of FLC transcription in Arabidopsis. Mol. Plant 8, 1816–1819 (2015).
  24. Wang, L. C. et al. Involvement of the Arabidopsis HIT1/AtVPS53 tethering protein homologue in the acclimation of the plasma membrane to heat stress. J. Exp. Bot. 62, 3609–3620 (2011).
  25. Blackman, B. K. et al. Contributions of flowering time genes to sunflower domestication and improvement. Genetics 187, 271–287 (2011).
  26. Brouillette, L. C. & Donovan, L. A. Nitrogen stress response of a hybrid species: a gene expression study. Ann. Bot. 107, 101–108 (2011).
  27. Andrew, R. L. & Rieseberg, L. H. Divergence is focused on few genomic regions early in speciation: incipient speciation of sunflower ecotypes. Evolution 67, 2468–2482 (2013).
  28. Ostevik, K. L. The Ecology and Genetics of Adaptation and Speciation in Dune Sunflowers. PhD thesis, Univ. of British Columbia (2016).
  29. Li, H. & Ralph, P. Local PCA shows how the effect of population structure differs along the genome. Genetics 211, 289–304 (2019).
  30. Kirkpatrick, M. & Barton, N. Chromosome inversions, local adaptation and speciation. Genetics 173, 419–434 (2006).
  31. Ortiz-Barrientos, D., Engelstädter, J. & Rieseberg, L. H. Recombination rate evolution and the origin of species. Trends Ecol. Evol. 31, 226–236 (2016).
  32. Trickett, A. J. & Butlin, R. K. Recombination suppressors and the evolution of new species. Heredity 73, 339–345 (1994).
  33. Arostegui, M. C., Quinn, T. P., Seeb, L. W., Seeb, J. E. & McKinney, G. J. Retention of a chromosomal inversion from an anadromous ancestor provides the genetic basis for alternative freshwater ecotypes in rainbow trout. Mol. Ecol. 28, 1412–1427 (2019).
  34. Joron, M. et al. Chromosomal rearrangements maintain a polymorphic supergene controlling butterfly mimicry. Nature 477, 203–206 (2011).
  35. Lowry, D. B. & Willis, J. H. A widespread chromosomal inversion polymorphism contributes to a major life-history transition, local adaptation, and reproductive isolation. PLoS Biol. 8, e1000500 (2010).
  36. Fustier, M. A. et al. Common gardens in teosintes reveal the establishment of a syndrome of adaptation to altitude. PLoS Genet. 15, e1008512 (2019).
  37. Wellenreuther, M., Rosenquist, H., Jaksons, P. & Larson, K. W. Local adaptation along an environmental cline in a species with an inversion polymorphism. J. Evol. Biol. 30, 1068–1077 (2017).
  38. Belton, J. M. et al. Hi-C: a comprehensive technique to capture the conformation of genomes. Methods 58, 268–276 (2012).
  39. Mason, C. M. How old are sunflowers? A molecular clock analysis of key divergences in the origin and diversification of Helianthus (Asteraceae). Int. J. Plant Sci. 179, 182–191 (2018).
  40. Wellenreuther, M. & Bernatchez, L. Eco-evolutionary genomics of chromosomal inversions. Trends Ecol. Evol. 33, 427–440 (2018).
  41. Jay, P. et al. Supergene evolution triggered by the introgression of a chromosomal inversion. Curr. Biol. 28, 1839–1845. (2018).
  42. Lotterhos, K. E. The effect of neutral recombination variation on genome scans for selection. G3 9, 1851–1867 (2019).
  43. Heiser, C. B., Jr. Hybridization in the annual sunflowers: Helianthus annuus × H. debilis var. cucumerifolius. Evolution 5, 42–51 (1951).
  44. Hooper, D. M. & Price, T. D. Chromosomal inversion differences correlate with range overlap in passerine birds. Nat. Ecol. Evol. 1, 1526–1534 (2017).
  45. Heiser, C. B. Three new annual sunflowers (Helianthus) from the southwestern United States. Rhodora 60, 272–283 (1958).
  46. Andrew, R. L., Kane, N. C., Baute, G. J., Grassa, C. J. & Rieseberg, L. H. Recent nonhybrid origin of sunflower ecotypes in a novel habitat. Mol. Ecol. 22, 799–813 (2013).
  47. Kirkpatrick, M. Reinforcement and divergence under assortative mating. Proc. R. Soc. Lond. B 267, 1649–1655 (2000).
  48. Feder, J. L., Gejji, R., Powell, T. H. & Nosil, P. Adaptive chromosomal divergence driven by mixed geographic mode of evolution. Evolution 65, 2157–2170 (2011).
  49. Yeaman, S. & Whitlock, M. C. The genetic architecture of adaptation under migration–selection balance. Evolution 65, 1897–1911 (2011).
  50. Schindelin, J. et al. Fiji: an open-source platform for biological-image analysis. Nat. Methods 9, 676–682 (2012).
  51. Schneider, C. A., Rasband, W. S. & Eliceiri, K. W. NIH Image to ImageJ: 25 years of image analysis. Nat. Methods 9, 671–675 (2012).
  52. Rodríguez, G. R. et al. Tomato Analyzer: a useful software application to collect accurate and detailed morphological and colorimetric data from two-dimensional objects. J. Vis. Exp. 37, e1856 (2010).
  53. Murray, M. G. & Thompson, W. F. Rapid isolation of high molecular weight plant DNA. Nucleic Acids Res. 8, 4321–4325 (1980).
  54. Zeng, J., Zou, Y., Bai, J. & Zheng, H. Preparation of total DNA from recalcitrant plant taxa. Acta Bot. Sin. 44, 694–697 (2002).
  55. Rowan, B. A., Patel, V., Weigel, D. & Schneeberger, K. Rapid and inexpensive whole-genome genotyping-by-sequencing for crossover localization and fine-scale genetic mapping. G3 5, 385–398 (2015).
  56. Rohland, N. & Reich, D. Cost-effective, high-throughput DNA sequencing libraries for multiplexed target capture. Genome Res. 22, 939–946 (2012).
  57. Matvienko, M. et al. Consequences of normalizing transcriptomic and genomic libraries of plant genomes using a duplex-specific nuclease and tetramethylammonium chloride. PLoS ONE 8, e55913 (2013).
  58. Lee-Yaw, J. A., Grassa, C. J., Joly, S., Andrew, R. L. & Rieseberg, L. H. An evaluation of alternative explanations for widespread cytonuclear discordance in annual sunflowers (Helianthus). New Phytol. 221, 515–526 (2019).
  59. Owens, G. L., Baute, G. J., Hübner, S. & Rieseberg, L. H. Genomic sequence and copy number evolution during hybrid crop development in sunflowers. Evol. Appl. 12, 54–65 (2019).
  60. Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
  61. Sedlazeck, F. J., Rescheneder, P. & von Haeseler, A. NextGenMap: fast and accurate read mapping in highly polymorphic genomes. Bioinformatics 29, 2790–2791 (2013).
  62. Li, H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics 27, 2987–2993 (2011).
  63. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
  64. Broad Institute. Picard tools, (Broad Institute, 2019).
  65. Tarasov, A., Vilella, A. J., Cuppen, E., Nijman, I. J. & Prins, P. Sambamba: fast processing of NGS alignment formats. Bioinformatics 31, 2032–2034 (2015).
  66. Poplin, R. et al. Scaling accurate genetic variant discovery to tens of thousands of samples. Preprint at (2017).
  67. Datta, K., Gururaj, K., Naik, M., Narvaez, P. & Rutar, M. GenomicsDB: storing genome data as sparse columnar arrays. (2017).
  68. Köster, J. & Rahmann, S. Snakemake—a scalable bioinformatics workflow engine. Bioinformatics 28, 2520–2522 (2012).
  69. Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at (2013).
  70. Danecek, P. et al. The variant call format and VCFtools. Bioinformatics 27, 2156–2158 (2011).
  71. Hoang, D. T., Chernomor, O., von Haeseler, A., Minh, B. Q. & Vinh, L. S. UFBoot2: improving the ultrafast bootstrap approximation. Mol. Biol. Evol. 35, 518–522 (2018).
  72. Kalyaanamoorthy, S., Minh, B. Q., Wong, T. K. F., von Haeseler, A. & Jermiin, L. S. ModelFinder: fast model selection for accurate phylogenetic estimates. Nat. Methods 14, 587–589 (2017).
  73. Nguyen, L. T., Schmidt, H. A., von Haeseler, A. & Minh, B. Q. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 32, 268–274 (2015).
  74. Browning, B. L., Zhou, Y. & Browning, S. R. A one-penny imputed genome from next-generation reference panels. Am. J. Hum. Genet. 103, 338–348 (2018).
  75. Grimm, D. G. et al. easyGWAS: a cloud-based platform for comparing the results of genome-wide association studies. Plant Cell 29, 5–19 (2017).
  76. Kang, H. M. et al. Variance component model to account for sample structure in genome-wide association studies. Nat. Genet. 42, 348–354 (2010).
  77. Wang, T., Hamann, A., Spittlehouse, D. & Carroll, C. Locally downscaled and spatially customizable climate data for historical and future periods for North America. PLoS ONE 11, e0156720 (2016).
  78. Gautier, M. Genome-wide scan for adaptive divergence and association with population-specific covariates. Genetics 201, 1555–1579 (2015).
  79. Jeffreys, H. Theory of Probability (Clarendon, 1961).
  80. Hellens, R. P., Edwards, E. A., Leyland, N. R., Bean, S. & Mullineaux, P. M. pGreen: a versatile and flexible binary Ti vector for Agrobacterium-mediated plant transformation. Plant Mol. Biol. 42, 819–832 (2000).
  81. Weigel, D. & Glazebrook, J. Arabidopsis: A Laboratory Manual (CSHL, 2002).
  82. Zheng, X. et al. A high-performance computing toolset for relatedness and principal component analysis of SNP data. Bioinformatics 28, 3326–3328 (2012).
  83. Hartigan, J. A. & Wong, M. A. Algorithm AS 136: a K-means clustering algorithm. J. R. Stat. Soc. C. Appl. Stat. 28, 100–108 (1979).
  84. R Core Team. R: A language and environment for statistical computing, (2019).
  85. Kurtz, S. et al. Versatile and open software for comparing large genomes. Genome Biol. 5, R12 (2004).
  86. Goel, M., Sun, H., Jiao, W.-B. & Schneeberger, K. SyRI: finding genomic rearrangements and local sequence differences from whole-genome assemblies. Genome Biol. 20, 277 (2019).
  87. Ostevik, K. L., Samuk, K. & Rieseberg, L. H. Ancestral reconstruction of karyotypes reveals an exceptional rate of non-random chromosomal evolution in sunflower. Genetics 214, 1031–1045 (2020).
  88. Huang, K., Andrew, R. L., Owens, G. L., Ostevik, K. L. & Rieseberg, L. H. Multiple chromosomal inversions contribute to adaptive divergence of a dune sunflower ecotype. Mol. Ecol., (2020).
  89. Marie-Nelly, H. et al. High-quality genome (re)assembly using chromosomal contact data. Nat. Commun. 5, 5695 (2014).
  90. Heinz, S. et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 38, 576–589 (2010).
  91. Hu, X. & Friedberg, I. SwiftOrtho: A fast, memory-efficient, multiple genome orthology classifier. Gigascience 8, giz118 (2019).
  92. Suchard, M. A. et al. Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus Evol. 4, vey016 (2018).
  93. Rambaut, A., Drummond, A. J., Xie, D., Baele, G. & Suchard, M. A. Posterior summarization in Bayesian phylogenetics using Tracer 1.7. Syst. Biol. 67, 901–904 (2018).
  94. Sambatti, J. B., Strasburg, J. L., Ortiz-Barrientos, D., Baack, E. J. & Rieseberg, L. H. Reconciling extremely strong barriers with high levels of gene exchange in annual sunflowers. Evolution 66, 1459–1473 (2012).
  95. Yu, G., Smith, D. K., Zhu, H., Guan, Y. & Lam, T. T. Y. ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods Ecol. Evol. 8, 28–36 (2017).
  96. Rambaut, A. FigTree, (2009).



Acknowledgements

We thank J. Gouzy and N. B. Langlade for providing access to the HA412-HOv2 annotation and PSC8 genome assembly; B. T. Moyers for discussion and providing the H. argophyllus picture; J. Lee-Yaw and A. J. Moreno-Geraldes for comments; D. Skonieczny, A. Kim, A. Parra and C. Konecny for assistance with fieldwork and data acquisition; A. Warfield for computing advice; J. D. Herndon for providing the dune H. petiolaris picture; D. G. Grimm for assistance with easyGWAS; UBC’s Data Science Institute for support to J.S.L.; and Compute Canada for computing resources. Maps were realized using tiles from Stamen Design, under CC BY 3.0, from data by OpenStreetMap contributors, under ODbL. Funding was provided by Genome Canada and Genome BC (LSARP2014-223SUN), the NSF Plant Genome Program (IOS-1444522), the International Consortium for Sunflower Genomic Resources, Sofiproteol, an HFSP long-term postdoctoral fellowship to M.T. (LT000780/2013) and a Banting postdoctoral fellowship to G.L.O.

Author information




L.H.R., S.Y., J.M.B., L.A.D. and N.B. conceived the study; D.O.B. collected seeds and soil samples from wild populations; N.B., M.T., D.O.B., I.I. and W.C. performed the common garden experiment, and collected and organized phenotypic data; L.A.D. analysed the soil samples; N.B., M.T. and E.B.M.D. generated resequencing data; M.T. and M.A.P.-R. analysed HaFT1 amplification and expression in H. argophyllus; M.T., K.H. and M.A.P.-R. generated material for HiC experiments; M.T. generated and analysed A. thaliana HaFT transgenic lines; J.-S.L., M.N. and G.L.O. performed read alignments, and SNP calling and filtering; M.T., G.L.O., S.S., K.H., K.L.O., E.B.M.D., K.L. and M.J. analysed genomic data; S.E.S. and S.M. contributed resources; R.N. provided conceptual advice; M.T., G.L.O. and L.H.R. wrote the manuscript, with contributions from all of the authors.

Corresponding authors

Correspondence to Natalia Bercovich or Loren H. Rieseberg.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information: Nature thanks Jeffrey Ross-Ibarra, Jeremy Schmutz and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Remapping SNPs to the HA412-HOv2 reference genome improved ordering.

a, Comparison between the original order of SNPs on chromosome 2 in the XRQv1 assembly⁹ (against which sequencing reads were originally mapped) and their order after remapping to the HA412-HOv2 assembly¹¹. Data are summarized in 5-kbp ranges. Error bars represent 2 standard errors. The higher R² at longer distances is due to better scaffolding of contigs in HA412-HOv2. Number of SNPs: n = 261,020 (XRQ); n = 237,674 (HA412-HO). b, GWA for flowering time in H. argophyllus based on the XRQv1 assembly identified more than 40 highly significant associations. c, Remapping of the SNPs to the new HA412-HOv2 sunflower assembly considerably reduced the number of associations in the flowering time GWA, with the vast majority of the signal mapping to the arg06.01 haploblock region (Fig. 2). In b, c, the purple lines represent 5% Bonferroni-corrected significance. Only positions with −log₁₀ P value > 2 are plotted. Associations were calculated using two-sided mixed models. n = 277 individuals. d, Genotype call accuracy. Variants for 12 individuals for each species from our SNP dataset were compared to Sanger sequencing data. Six regions were compared. Number of sites: n = 136 (H. annuus); n = 139 (H. argophyllus); n = 262 (H. petiolaris). Number of genotype calls: n = 1,385 (H. annuus); n = 1,254 (H. argophyllus); n = 2,351 (H. petiolaris). Overall genotype accuracy: H. annuus = 95.9%; H. argophyllus = 96.8%; H. petiolaris = 97.9% (Supplementary Table 1). Vertical purple lines represent the average observed coverage across genic regions for individuals in the corresponding dataset. Error bars, binomial confidence interval (Wilson score method). e, Genome-wide principal component analysis for each dataset. Sites were pruned for linkage (r < 0.2 within 500 kb). Number of individuals: n = 730 (H. annuus); n = 299 (H. argophyllus); n = 168 (H. petiolaris petiolaris); n = 259 (H. petiolaris fallax).
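The 5% Bonferroni-corrected significance lines and the Wilson score error bars named in this legend follow standard formulas. The Python sketch below illustrates both; it is not the authors' code, and the function names, the choice of z = 1.96 for an approximately 95% interval and the example counts are assumptions introduced here.

```python
import math


def bonferroni_threshold(n_tests, alpha=0.05):
    """Bonferroni-corrected significance threshold on the -log10(P) scale."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return -math.log10(alpha / n_tests)


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion.

    z = 1.96 gives an approximately 95% interval (assumed here; the legend
    does not state the confidence level).
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1 + z ** 2 / trials
    centre = (p_hat + z ** 2 / (2 * trials)) / denom
    half = z * math.sqrt(
        p_hat * (1 - p_hat) / trials + z ** 2 / (4 * trials ** 2)
    ) / denom
    return centre - half, centre + half


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    print(bonferroni_threshold(n_tests=1_000_000))
    print(wilson_interval(successes=96, trials=100))
```

Unlike the simple normal approximation, the Wilson interval stays within [0, 1] even when the observed accuracy is close to 100%, which is the usual reason for preferring it with high proportions.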

Extended Data Fig. 2 Phenotypic, structural and functional analyses for arg06.01.

a, Flowering time for the three wild sunflower species measured in a common garden experiment. Number of individuals: n = 612 (H. annuus); n = 161 (H. petiolaris petiolaris); n = 211 (H. petiolaris fallax); n = 48 (H. niveus canescens); n = 261 (H. argophyllus 0/0); n = 25 (H. argophyllus 0/1); n = 23 (H. argophyllus 1/1). b, Leaf nitrogen content and carbon/nitrogen ratio GWAs in H. argophyllus (two-sided mixed model associations; n = 289 individuals). The purple lines represent 5% Bonferroni-corrected significance. Only positions with −log₁₀ P value > 2 are plotted. c, Genotype presence or absence for the 130–135-Mbp region of chromosome 6 in H. argophyllus. The x-axis represents consecutive SNP positions; distances on this axis are therefore not proportional to physical distances on the chromosome. Purple bars highlight the positions of the five HaFT genes in the region (HaFT5 and HaFT6 are only a few hundred bp apart). Flowering time data are the same as used in GWA analyses. d, HaFT1 and HaFT2 expression levels in mature leaves or shoot apices of >6-month-old, flowering H. argophyllus plants, grown in a greenhouse in long-day conditions (14 h light:10 h dark). This experiment was performed on two independent pairs of individuals, with similar results. e, Six-week-old A. thaliana plants grown in long-day conditions at 23 °C. At least 19 independent transgenic events were analysed for each construct in each genetic background, and flowering time was consistent within each group. Scale bar, 1 cm. f, Flowering time in long and short days (10 h light:14 h dark). HaFT2 alleles from early- and late-flowering H. argophyllus plants complement the ft-10 mutant, similar to HaFT1 from the early-flowering ecotype. HaFT6 is expressed at low levels in H. argophyllus plants (not shown), and appears to be a hypo-functional FT homologue. Box plots show the median, box edges represent the 25th and 75th percentiles, whiskers represent the maximum and minimum data points within 1.5× interquartile range outside box edges. Differences in flowering time between untransformed controls, HaFT6 lines and all the other transgenic lines are significant in all conditions (P < 10⁻⁶ for all relevant comparisons; one-way ANOVA with post hoc Tukey HSD test, df = 4; exact P values are reported in the Source Data). Number of individuals or independent transformation events for the long-day dataset in the Col-0 background: n = 28 (Col-0); n = 32 (HaFT1); n = 30 (HaFT2early); n = 34 (HaFT2late); n = 45 (HaFT6). For the long-day dataset in the ft-10 background: n = 25 (ft-10); n = 30 (HaFT1); n = 38 (HaFT2early); n = 45 (HaFT2late); n = 18 (HaFT6). For the short-day dataset: n = 10 (Col-0); n = 24 (HaFT1); n = 17 (HaFT2early); n = 31 (HaFT2late); n = 31 (HaFT6). g, PCR detection of transgene expression in leaves of plants grown for four weeks in long days. The reduced ability of HaFT6 to induce flowering is not due to inefficient expression of the transgene. Results for four independent primary transformants for each transgenic line and for wild-type Col-0 plants are shown. For gel source data, see Supplementary Fig. 1.
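The box plot convention described in this legend (median, 25th and 75th percentiles, whiskers at the most extreme data points within 1.5× the interquartile range of the box edges) can be written out as a short Python sketch. The function name and the quantile interpolation rule are assumptions made for illustration; plotting packages differ slightly in how they interpolate percentiles, and the legend does not say which rule was used.

```python
import statistics


def boxplot_stats(values):
    """Summary statistics for a box plot with 1.5x IQR whiskers.

    Quartiles use linear interpolation between order statistics
    (method="inclusive"); this interpolation rule is an assumption.
    """
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two data points")
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme observations inside the fences.
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }


if __name__ == "__main__":
    # Hypothetical flowering times in days, for illustration only.
    print(boxplot_stats([21, 22, 22, 23, 24, 25, 26, 27, 29, 60]))
```

The whiskers end at observed data points rather than at the fences themselves, so a group with no extreme values has whiskers at its minimum and maximum.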

Extended Data Fig. 3 Several haploblocks differentiate dune and non-dune populations of H. petiolaris.

a, Correlation between seed size and flowering time. Although dune-adapted H. petiolaris fallax flowers later and has larger seeds than non-dune-adapted populations, these two traits generally show no correlation, or a weak negative correlation, in H. annuus and H. petiolaris. Purple lines represent linear regressions, shaded grey areas are 95% confidence intervals. H. annuus: n = 426 individuals, one-sided F(1, 423) = 1.831, P = 0.18; H. petiolaris: n = 307 individuals, one-sided F(1, 305) = 9.841, P = 0.0019. b, Seed length GWA in H. petiolaris fallax (two-sided mixed model associations; n = 165 individuals). No significant association with haploblocks is found in GWA analyses for seed width (not shown). c, FST values in 2-Mbp non-overlapping sliding windows for comparisons between dune- and non-dune-adapted populations of H. petiolaris fallax in Colorado. Purple bars represent predicted haploblocks. d, Flowering time (approximated as total leaf number (TLN) on the primary stem) GWA for H. petiolaris petiolaris (two-sided mixed model associations; n = 160 individuals). The purple lines in b, d represent 5% Bonferroni-corrected significance. Only positions with −log₁₀ P value > 2 are plotted. e, Distribution of FST values for SNPs and haploblocks in comparisons between dune- and non-dune-adapted populations of H. petiolaris fallax in Texas and Colorado¹⁶. Percentiles are reported for the most highly divergent haploblocks. Box plots show the median, box edges represent the 25th and 75th percentiles, whiskers represent the maximum and minimum data points within 1.5× interquartile range outside box edges. Number of individuals: n = 28 (Colorado); n = 54 (Texas). Number of SNPs: n = 1,196,399 (Colorado); n = 1,169,273 (Texas). f, Maximum-likelihood trees for two of the haploblocks segregating within H. petiolaris. Dune populations of H. petiolaris fallax are highlighted in light (Colorado) and dark tan (Texas). For pet09.01 and pet11.01, although both dune populations have converged on the same haplotype, the Texas haplotype is the ancestral H. petiolaris fallax copy, whereas in Colorado the haplotype is derived from introgression with H. petiolaris petiolaris, suggesting convergent adaptation. Bootstrap values for major nodes are reported (asterisks = 100).
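Panel c summarizes FST in non-overlapping 2-Mbp windows. As a rough illustration of how such a windowed statistic can be computed, the Python sketch below uses Hudson's estimator with a ratio of sums within each window. The estimator is an assumption, since the legend does not say which one was used, and the function names and input layout (position, allele frequency in each population, number of sampled chromosomes) are hypothetical.

```python
from collections import defaultdict


def hudson_components(p1, n1, p2, n2):
    """Numerator and denominator of Hudson's FST for one biallelic SNP.

    p1, p2: allele frequencies in the two populations.
    n1, n2: number of sampled chromosomes in each population (> 1).
    """
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num, den


def windowed_fst(snps, n1, n2, window_bp=2_000_000):
    """FST per non-overlapping window, keyed by window start position.

    snps: iterable of (position_bp, p1, p2) tuples on one chromosome.
    Numerators and denominators are summed within a window before dividing.
    """
    sums = defaultdict(lambda: [0.0, 0.0])
    for pos, p1, p2 in snps:
        num, den = hudson_components(p1, n1, p2, n2)
        window = pos // window_bp
        sums[window][0] += num
        sums[window][1] += den
    return {
        window * window_bp: (num / den if den > 0 else float("nan"))
        for window, (num, den) in sorted(sums.items())
    }


if __name__ == "__main__":
    # Hypothetical SNPs: one fixed difference and one shared polymorphism.
    example = [(150_000, 1.0, 0.0), (2_500_000, 0.5, 0.5)]
    print(windowed_fst(example, n1=11, n2=11))
```

Summing numerators and denominators before dividing keeps windows with many low-frequency SNPs from being dominated by noisy per-SNP ratios. Slightly negative values can occur when the two populations have near-identical allele frequencies.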

Extended Data Fig. 4 Local PCA highlights haploblock regions.

For each predicted haploblock, the local PCA MDS plot for the relevant chromosome, a PCA of the selected region, observed heterozygosity for each haploblock genotype and LD patterns for the relevant chromosome are shown. In the local PCA MDS plots, each dot represents a 100-SNP window, and windows within the haploblock region are highlighted. The x-axis values represent Mbp. For H. petiolaris, haploblocks were identified in the full species or subspecies datasets; the local PCA and LD plots are from the dataset in which the haploblock was identified, and PCA and heterozygosity plots use the full dataset. In PCA plots, samples are coloured by inferred haploblock genotype. For LD plots, upper triangle = all individuals; lower triangle = only individuals homozygous for the more common haploblock allele. Colours represent the second highest R² value in 0.5-Mbp windows. For most haploblock regions, high LD is driven by differences between haplotypes, so high LD is removed when only one haplotype is present. Box plots show the median, box edges represent the 25th and 75th percentiles, whiskers represent the maximum and minimum data points within 1.5× interquartile range outside box edges. Sample size for all haploblock analyses is provided in the Source Data, available at
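The LD plots described here colour each pair of 0.5-Mbp windows by the second highest R2 value among SNP pairs. A minimal Python sketch of that summary follows. It assumes unphased genotypes coded as 0/1/2 alternate-allele dosages and uses the squared Pearson correlation between dosages as the LD measure; the function names and the dosage-based measure are assumptions, not details taken from the paper.

```python
import math
from itertools import product


def genotype_r2(a, b):
    """Squared Pearson correlation between two genotype dosage vectors."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("vectors must have equal length of at least 2")
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return float("nan")  # a monomorphic SNP carries no LD information
    return cov * cov / (var_a * var_b)


def second_highest_r2(window_a, window_b):
    """Second highest r2 over all SNP pairs drawn from two windows.

    window_a, window_b: lists of genotype dosage vectors, one per SNP,
    with individuals in the same order in every vector.
    """
    values = []
    for snp_a, snp_b in product(window_a, window_b):
        r2 = genotype_r2(snp_a, snp_b)
        if not math.isnan(r2):
            values.append(r2)
    values.sort(reverse=True)
    return values[1] if len(values) >= 2 else float("nan")


if __name__ == "__main__":
    # Hypothetical dosages for six individuals, for illustration only.
    win_a = [[0, 1, 2, 0, 1, 2]]
    win_b = [[0, 1, 2, 0, 1, 2], [2, 1, 0, 2, 1, 0], [0, 0, 1, 1, 2, 2]]
    print(second_highest_r2(win_a, win_b))
```

Restricting the input to individuals homozygous for one haploblock allele, as in the lower triangle of the plots, only requires subsetting the dosage vectors before calling the function.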

Extended Data Fig. 5 Geographical distribution of haploblock genotypes.

Map showing collection locations of the three sunflower species and the frequency of haploblock genotypes at each collection site.

Extended Data Fig. 6 Comparisons between reference assemblies and genetic maps confirm structural rearrangements associated with haploblocks.

a, Alignment of chromosome 1 for the H. annuus genome assemblies PSC8 and HA412-HOv2. The ann01.01 region (at about 8 Mbp; inset), for which the two cultivars have different haplotypes, shows inverted alignment. b, Three H. annuus genetic maps (constructed using F2 populations between wild individuals and the HA412-HO cultivar). c, Four genetic maps (constructed using F1 populations; from top to bottom: H. petiolaris petiolaris, H. petiolaris fallax87, newly constructed dune H. petiolaris fallax and newly constructed non-dune H. petiolaris fallax88) are plotted relative to the HA412-HOv2 reference assembly. To the right of each dot plot, markers are plotted in the order in which they appear in each genetic map. Haploblock regions and the markers that fall within them are highlighted in purple. Circled haploblock regions show evidence of different orientations across the multiple maps (dotted lines), of suppressed recombination (dashed lines) or are contiguous in H. petiolaris maps despite being split over multiple windows in the HA412-HOv2 reference assembly (solid lines). Parental haploblock genotypes are known for the H. annuus maps and for the bottom two H. petiolaris maps. Ann05.01 and ann11.01 were segregating within the H. annuus mapping populations. Genotypes at pet05.01 and pet11.01 differed between the H. petiolaris fallax parents of newly constructed dune and non-dune populations, whereas both parents were heterozygous for the pet09.01 haploblock. In all these cases, patterns of segregation are consistent with the parental haploblock genotypes. For the remaining H. petiolaris maps, the parental haploblock genotypes are not known. Because an absence of evidence is uninformative in these cases, only haploblock regions with evidence for inversions or contiguous windows from these two maps are plotted. Source data

Extended Data Fig. 7 HiC comparisons identify SVs associated with most, but not all, haploblocks.

a, Differences in HiC interactions between pairs of early- and late-flowering H. argophyllus or dune and non-dune H. petiolaris samples. Purple bars and solid black lines represent approximate haploblock boundaries. Pieces of a single haploblock that map to different regions of the HA412-HOv2 reference are highlighted by dotted lines. Top row, comparisons between H. annuus and H. argophyllus or H. petiolaris, for H. annuus haploblock regions. Because the relative haploblock genotypes between sunflower species are not known, only cases in which evidence of structural variants was observed are reported. Following rows, regions for which the pairs of H. argophyllus or H. petiolaris samples differed at haploblock alleles. Red or blue dots show increased or decreased, respectively, long-distance interactions in one sample, consistent with differences in genome structure. Relevant differences in long-distance interactions are highlighted by black arrows; for each of these, the percentage rank compared to all other possible interactions at the same distance across the genome is reported. No evidence of large-scale structural variation was observed for arg06.01 and pet10.01. An excess of interactions in the early-flowering allele for the approximately 130–140-Mbp region of chromosome 6 is consistent with the presence of deletions in the late-flowering alleles (Extended Data Fig. 2c), as well as with improved mappability of reads from the early-flowering allele, which, being an introgression from wild H. annuus, is closer in sequence to the HA412-HO reference. Differences in HiC interactions were capped between −0.3 and 0.3 for plotting purposes. b, Inversion scenarios with comparisons of simulated HiC interaction matrices consistent with empirical patterns. There are H. annuus-specific inversions in the reference genome, as well as inversions between haploblocks.

Extended Data Fig. 8 Haploblock GWAs.

Heat map of GWAs for individual phenotypic traits, treating haploblocks as individual loci. Haploblocks were filtered to retain only regions with minor allele frequency ≥ 3%. PCA and kinship matrices used as covariates were calculated without variants inside haploblock regions. GWAs were calculated using two-sided mixed models. Number of individuals: n = 614 (H. annuus); n = 294 (H. argophyllus); n = 209 (H. petiolaris fallax); n = 163 (H. petiolaris petiolaris).
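
The minor allele frequency filter applied to haploblocks (retain regions with frequency ≥ 3%) can be sketched as follows, treating each haploblock as a single biallelic locus. The genotype encoding (0, 1 or 2 copies of one haploblock allele, None for missing), the function names and the block names in the usage note are all assumptions made for illustration.

```python
def minor_allele_frequency(genotypes):
    """Minor allele frequency of a biallelic locus.

    genotypes: iterable of allele counts (0, 1 or 2) per diploid
    individual; None marks a missing genotype (assumed encoding).
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    allele_freq = sum(called) / (2 * len(called))
    return min(allele_freq, 1 - allele_freq)


def filter_haploblocks(blocks, min_maf=0.03):
    """Keep haploblocks whose minor allele frequency is >= min_maf.

    blocks: dict mapping a haploblock name to its genotype list.
    """
    return {
        name: genotypes
        for name, genotypes in blocks.items()
        if minor_allele_frequency(genotypes) >= min_maf
    }
```

For instance, a hypothetical block with three heterozygotes among 100 individuals has a minor allele frequency of 1.5% and would be dropped by this filter.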

Extended Data Fig. 9 Haploblock GEAs.

a, Heat map of GEAs for individual environmental variables, treating haploblocks as individual loci. Haploblocks were filtered to retain only regions with minor allele frequency ≥ 3%. The population correlation matrix was calculated without variants inside haploblock regions. GEAs were calculated using two-sided XtX statistics. Number of populations: n = 71 (H. annuus); n = 30 (H. argophyllus); n = 23 (H. petiolaris fallax); n = 17 (H. petiolaris petiolaris). b, The proportion of haploblock and SNP loci significantly associated with one or more environmental variables (dB ≥ 10) or phenotypic traits (P ≤ 0.001). *P < 0.05, **P < 0.0005 (two-sided proportion test; exact P values and number of individuals are reported in Source Data). Source data
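
The comparison in panel b, between the proportion of haploblocks and the proportion of SNPs with a significant association, can be illustrated with a two-sided test of two proportions. The legend does not specify the exact procedure (for example, whether a continuity correction was applied), so the pooled z-test below, without correction, is an assumed stand-in, and the function name is hypothetical.

```python
import math


def two_proportion_test(hits_a, n_a, hits_b, n_b):
    """Two-sided z-test for equality of two proportions.

    Uses the pooled proportion for the standard error and applies no
    continuity correction (an assumption, not taken from the source).
    Returns (z, p_value).
    """
    if n_a <= 0 or n_b <= 0:
        raise ValueError("group sizes must be positive")
    prop_a = hits_a / n_a
    prop_b = hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    variance = pooled * (1 - pooled) * (1 / n_a + 1 / n_b)
    if variance == 0:
        # Every locus (or no locus) is a hit in both groups.
        return 0.0, 1.0
    z = (prop_a - prop_b) / math.sqrt(variance)
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

Because haploblocks number in the tens, an exact test may be more appropriate than this normal approximation when counts are small.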

Extended Data Fig. 10 A 100-Mbp haploblock is associated with early flowering in the texanus ecotype of H. annuus.

a, GWA for flowering in H. annuus (two-sided mixed model associations; n = 612 individuals), using a kinship matrix and PCA covariate including (black dots) or excluding (yellow dots) the haploblock regions. Haploblock regions are highlighted in purple. The purple line represents 5% Bonferroni-corrected significance. Only positions with −log10 P value >2 are plotted. b, Flowering time for individuals with different genotypes at ann13.01. Number of individuals: n = 244 (0/0); n = 168 (0/1); n = 200 (1/1). c, Distribution of ann13.01 haplotypes. d, Distribution of FST values for individual SNPs and haploblocks in comparisons between the texanus ecotype of H. annuus and other H. annuus populations. Percentiles are reported for the most highly divergent haploblocks. In b, d, box plots show the median, box edges represent the 25th and 75th percentiles, whiskers represent the maximum and minimum data points within 1.5× interquartile range outside box edges.
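
The box plot convention used in these legends (median, 25th and 75th percentiles, whiskers at the most extreme data points within 1.5× the interquartile range of the box edges) can be sketched as below. The percentile method, linear interpolation between ranked values, is an assumption, since the legends do not say which quantile definition was used; function names are hypothetical.

```python
def percentile(sorted_values, fraction):
    """Percentile by linear interpolation between ranked values."""
    position = (len(sorted_values) - 1) * fraction
    lower = int(position)  # floor, since position is non-negative
    upper = min(lower + 1, len(sorted_values) - 1)
    weight = position - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * weight


def box_plot_stats(values):
    """Median, quartiles and whisker ends for a box plot."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    q1 = percentile(ordered, 0.25)
    median = percentile(ordered, 0.5)
    q3 = percentile(ordered, 0.75)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme observations inside the fences.
    inside = [v for v in ordered if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
    }
```

Points beyond the whisker ends are treated as outliers under this convention and would be drawn individually, if at all.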

Extended Data Table 1 Positions and frequencies of haploblocks, and experimental support for linked SVs.

Supplementary information

Supplementary Figure 1

This file contains gels source data.

Reporting Summary

Supplementary Table 1

Population information and environmental variables. Phenotypic data, coverage and SRA IDs for individual wild sunflower samples.

Supplementary Table 2

Candidate genes for GWA and GEA analyses.

Supplementary Table 3

Primers, markers and adapters used in this study.

Cite this article

Todesco, M., Owens, G.L., Bercovich, N. et al. Massive haplotypes underlie ecotypic differentiation in sunflowers. Nature 584, 602–607 (2020).
