Massive haplotypes underlie ecotypic differentiation in sunflowers

Todesco, Marco; Owens, Gregory L.; Bercovich, Natalia; Légaré, Jean-Sébastien; Soudi, Shaghayegh; Burge, Dylan O.; Huang, Kaichi; Ostevik, Katherine L.; Drummond, Emily B. M.; Imerovski, Ivana; Lande, Kathryn; Pascual-Robles, Mariana A.; Nanavati, Mihir; Jahani, Mojtaba; Cheung, Winnie; Staton, S. Evan; Muños, Stéphane; Nielsen, Rasmus; Donovan, Lisa A.; Burke, John M.; Yeaman, Sam; Rieseberg, Loren H.

doi:10.1038/s41586-020-2467-6

Article
Published: 08 July 2020

Nature volume 584, pages 602–607 (2020)

Abstract

Species often include multiple ecotypes that are adapted to different environments¹. However, it is unclear how ecotypes arise and how their distinctive combinations of adaptive alleles are maintained despite hybridization with non-adapted populations²,³,⁴. Here, by resequencing 1,506 wild sunflowers from 3 species (Helianthus annuus, Helianthus petiolaris and Helianthus argophyllus), we identify 37 large (1–100 Mbp in size), non-recombining haplotype blocks that are associated with numerous ecologically relevant traits, as well as soil and climate characteristics. Limited recombination in these haplotype blocks keeps adaptive alleles together, and these regions differentiate sunflower ecotypes. For example, haplotype blocks control a 77-day difference in flowering between ecotypes of the silverleaf sunflower H. argophyllus (probably through deletion of a homologue of FLOWERING LOCUS T (FT)), and are associated with seed size, flowering time and soil fertility in dune-adapted sunflowers. These haplotypes are highly divergent, frequently associated with structural variants and often appear to represent introgressions from other, possibly now-extinct, congeners. These results highlight a pervasive role of structural variation in ecotypic adaptation.

**Fig. 1: Population structure and association analyses of wild sunflowers.**

**Fig. 2: A large introgression from *H. annuus* containing a functional HaFT1 gene causes early flowering in coastal *H. argophyllus*.**

**Fig. 3: Large non-recombining haplotypes control dune adaptation in *H. p. fallax*.**

**Fig. 4: Large haploblocks are pervasive in wild sunflowers and are associated with structural variants.**

**Fig. 5: Haploblocks are highly divergent and are associated with multiple ecologically relevant traits and environmental variables.**

Data availability

All raw sequenced data are stored in the Sequence Read Archive (SRA) under BioProject accessions PRJNA532579, PRJNA398560 and PRJNA564337. SRA accession numbers for individual samples are listed in Supplementary Table 1 (tabs ‘Coverage and analyses’, ‘Outgroups’, ‘Samples from other studies’ and ‘HiC samples’). The HA412-HOv2 and PSC8 genome assemblies are available at https://sunflowergenome.org/ and https://heliagene.org/. Filtered SNP datasets are available at https://rieseberglab.github.io/ubc-sunflower-genome/. GWA results, as well as the corresponding SNP and trait data, are available at https://easygwas.ethz.ch/gwas/myhistory/public/20/, https://easygwas.ethz.ch/gwas/myhistory/public/21/, https://easygwas.ethz.ch/gwas/myhistory/public/22/, https://easygwas.ethz.ch/gwas/myhistory/public/23/. HaFT1, HaFT2 and HaFT6 sequences have been deposited in GenBank under accession numbers MN517758–MN517761. Source data for all figures are provided at https://github.com/owensgl/haploblocks/. Source data are provided with this paper.

Code availability

All code associated with this project is available at https://github.com/owensgl/haploblocks/.

References

Clausen, J. Stages in the Evolution of Plant Species (Cornell Univ. Press, 1951).
Endler, J. A. Gene flow and population differentiation. Science 179, 243–250 (1973).
Felsenstein, J. Skepticism towards Santa Rosalia, or why are there so few kinds of animals? Evolution 35, 124–138 (1981).
Romanes, G. J. Physiological selection; an additional suggestion on the origin of species. Zool. J. Linn. Soc. 19, 337–411 (1886).
Whitney, K. D., Randell, R. A. & Rieseberg, L. H. Adaptive introgression of abiotic tolerance traits in the sunflower Helianthus annuus. New Phytol. 187, 230–239 (2010).
Ostevik, K. L., Andrew, R. L., Otto, S. P. & Rieseberg, L. H. Multiple reproductive barriers separate recently diverged sunflower ecotypes. Evolution 70, 2322–2335 (2016).
Moyers, B. T. The Landscape of Divergence in Silverleaf Sunflowers. PhD thesis, Univ. of British Columbia (2015).
Qiu, F. et al. Phylogenetic trends and environmental correlates of nuclear genome size variation in Helianthus sunflowers. New Phytol. 221, 1609–1618 (2019).
Badouin, H. et al. The sunflower genome provides insights into oil metabolism, flowering and Asterid evolution. Nature 546, 148–152 (2017).
Shagina, I. et al. Normalization of genomic DNA using duplex-specific nuclease. Biotechniques 48, 455–459 (2010).
Staton, S. E. & Rieseberg, L. H. Sunflower Genome Database, https://www.sunflowergenome.org/ (2019).
INRA. INRA Sunflower Bioinformatics Resources, https://www.heliagene.org/ (2019).
Baute, G. J., Owens, G. L., Bock, D. G. & Rieseberg, L. H. Genome-wide genotyping-by-sequencing data provide a high-resolution view of wild Helianthus diversity, genetic structure, and interspecies gene flow. Am. J. Bot. 103, 2170–2177 (2016).
Stephens, J. D., Rogers, W. L., Mason, C. M., Donovan, L. A. & Malmberg, R. L. Species tree estimation of diploid Helianthus (Asteraceae) using target enrichment. Am. J. Bot. 102, 910–920 (2015).
Heiser, C. B. & Smith, D. M. The North American Sunflowers (Helianthus) (Seeman Printery, 1969).
Hübner, S. et al. Sunflower pan-genome analysis shows that hybridization altered gene content and disease resistance. Nat. Plants 5, 54–62 (2019).
Raduski, A. R., Rieseberg, L. H. & Strasburg, J. L. Effective population size, gene flow, and species status in a narrow endemic sunflower, Helianthus neglectus, compared to its widespread sister species, H. petiolaris. Int. J. Mol. Sci. 11, 492–506 (2010).
Strasburg, J. L. & Rieseberg, L. H. Molecular demographic history of the annual sunflowers Helianthus annuus and H. petiolaris—large effective population sizes and rates of long-term gene flow. Evolution 62, 1936–1950 (2008).
Blackman, B. K., Michaels, S. D. & Rieseberg, L. H. Connecting the sun to flowering in sunflower adaptation. Mol. Ecol. 20, 3503–3512 (2011).
Zan, Y. & Carlborg, Ö. A polygenic genetic architecture of flowering time in the worldwide Arabidopsis thaliana population. Mol. Biol. Evol. 36, 141–154 (2019).
Kobayashi, Y., Kaya, H., Goto, K., Iwabuchi, M. & Araki, T. A pair of related genes with antagonistic roles in mediating flowering signals. Science 286, 1960–1962 (1999).
Werner, J. D. et al. Quantitative trait locus mapping and DNA array hybridization identify an FLM deletion as a cause for natural flowering-time variation. Proc. Natl Acad. Sci. USA 102, 2460–2465 (2005).
Cao, Y., Wen, L., Wang, Z. & Ma, L. SKIP interacts with the Paf1 complex to regulate flowering via the activation of FLC transcription in Arabidopsis. Mol. Plant 8, 1816–1819 (2015).
Wang, L. C. et al. Involvement of the Arabidopsis HIT1/AtVPS53 tethering protein homologue in the acclimation of the plasma membrane to heat stress. J. Exp. Bot. 62, 3609–3620 (2011).
Blackman, B. K. et al. Contributions of flowering time genes to sunflower domestication and improvement. Genetics 187, 271–287 (2011).
Brouillette, L. C. & Donovan, L. A. Nitrogen stress response of a hybrid species: a gene expression study. Ann. Bot. 107, 101–108 (2011).
Andrew, R. L. & Rieseberg, L. H. Divergence is focused on few genomic regions early in speciation: incipient speciation of sunflower ecotypes. Evolution 67, 2468–2482 (2013).
Ostevik, K. L. The Ecology and Genetics of Adaptation and Speciation in Dune Sunflowers. PhD thesis, Univ. of British Columbia (2016).
Li, H. & Ralph, P. Local PCA shows how the effect of population structure differs along the genome. Genetics 211, 289–304 (2019).
Kirkpatrick, M. & Barton, N. Chromosome inversions, local adaptation and speciation. Genetics 173, 419–434 (2006).
Ortiz-Barrientos, D., Engelstädter, J. & Rieseberg, L. H. Recombination rate evolution and the origin of species. Trends Ecol. Evol. 31, 226–236 (2016).
Trickett, A. J. & Butlin, R. K. Recombination suppressors and the evolution of new species. Heredity 73, 339–345 (1994).
Arostegui, M. C., Quinn, T. P., Seeb, L. W., Seeb, J. E. & McKinney, G. J. Retention of a chromosomal inversion from an anadromous ancestor provides the genetic basis for alternative freshwater ecotypes in rainbow trout. Mol. Ecol. 28, 1412–1427 (2019).
Joron, M. et al. Chromosomal rearrangements maintain a polymorphic supergene controlling butterfly mimicry. Nature 477, 203–206 (2011).
Lowry, D. B. & Willis, J. H. A widespread chromosomal inversion polymorphism contributes to a major life-history transition, local adaptation, and reproductive isolation. PLoS Biol. 8, e1000500 (2010).
Fustier, M. A. et al. Common gardens in teosintes reveal the establishment of a syndrome of adaptation to altitude. PLoS Genet. 15, e1008512 (2019).
Wellenreuther, M., Rosenquist, H., Jaksons, P. & Larson, K. W. Local adaptation along an environmental cline in a species with an inversion polymorphism. J. Evol. Biol. 30, 1068–1077 (2017).
Belton, J. M. et al. Hi-C: a comprehensive technique to capture the conformation of genomes. Methods 58, 268–276 (2012).
Mason, C. M. How old are sunflowers? A molecular clock analysis of key divergences in the origin and diversification of Helianthus (Asteraceae). Int. J. Plant Sci. 179, 182–191 (2018).
Wellenreuther, M. & Bernatchez, L. Eco-evolutionary genomics of chromosomal inversions. Trends Ecol. Evol. 33, 427–440 (2018).
Jay, P. et al. Supergene evolution triggered by the introgression of a chromosomal inversion. Curr. Biol. 28, 1839–1845 (2018).
Lotterhos, K. E. The effect of neutral recombination variation on genome scans for selection. G3 9, 1851–1867 (2019).
Heiser, C. B., Jr. Hybridization in the annual sunflowers: Helianthus annuus × H. debilis var. cucumerifolius. Evolution 5, 42–51 (1951).
Hooper, D. M. & Price, T. D. Chromosomal inversion differences correlate with range overlap in passerine birds. Nat. Ecol. Evol. 1, 1526–1534 (2017).
Heiser, C. B. Three new annual sunflowers (Helianthus) from the southwestern United States. Rhodora 60, 272–283 (1958).
Andrew, R. L., Kane, N. C., Baute, G. J., Grassa, C. J. & Rieseberg, L. H. Recent nonhybrid origin of sunflower ecotypes in a novel habitat. Mol. Ecol. 22, 799–813 (2013).
Kirkpatrick, M. Reinforcement and divergence under assortative mating. Proc. R. Soc. Lond. B 267, 1649–1655 (2000).
Feder, J. L., Gejji, R., Powell, T. H. & Nosil, P. Adaptive chromosomal divergence driven by mixed geographic mode of evolution. Evolution 65, 2157–2170 (2011).
Yeaman, S. & Whitlock, M. C. The genetic architecture of adaptation under migration–selection balance. Evolution 65, 1897–1911 (2011).
Schindelin, J. et al. Fiji: an open-source platform for biological-image analysis. Nat. Methods 9, 676–682 (2012).
Schneider, C. A., Rasband, W. S. & Eliceiri, K. W. NIH Image to ImageJ: 25 years of image analysis. Nat. Methods 9, 671–675 (2012).
Rodríguez, G. R. et al. Tomato Analyzer: a useful software application to collect accurate and detailed morphological and colorimetric data from two-dimensional objects. J. Vis. Exp. 37, e1856 (2010).
Murray, M. G. & Thompson, W. F. Rapid isolation of high molecular weight plant DNA. Nucleic Acids Res. 8, 4321–4325 (1980).
Zeng, J., Zou, Y., Bai, J. & Zheng, H. Preparation of total DNA from recalcitrant plant taxa. Acta Bot. Sin. 44, 694–697 (2002).
Rowan, B. A., Patel, V., Weigel, D. & Schneeberger, K. Rapid and inexpensive whole-genome genotyping-by-sequencing for crossover localization and fine-scale genetic mapping. G3 5, 385–398 (2015).
Rohland, N. & Reich, D. Cost-effective, high-throughput DNA sequencing libraries for multiplexed target capture. Genome Res. 22, 939–946 (2012).
Matvienko, M. et al. Consequences of normalizing transcriptomic and genomic libraries of plant genomes using a duplex-specific nuclease and tetramethylammonium chloride. PLoS ONE 8, e55913 (2013).
Lee-Yaw, J. A., Grassa, C. J., Joly, S., Andrew, R. L. & Rieseberg, L. H. An evaluation of alternative explanations for widespread cytonuclear discordance in annual sunflowers (Helianthus). New Phytol. 221, 515–526 (2019).
Owens, G. L., Baute, G. J., Hübner, S. & Rieseberg, L. H. Genomic sequence and copy number evolution during hybrid crop development in sunflowers. Evol. Appl. 12, 54–65 (2019).
Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
Sedlazeck, F. J., Rescheneder, P. & von Haeseler, A. NextGenMap: fast and accurate read mapping in highly polymorphic genomes. Bioinformatics 29, 2790–2791 (2013).
Li, H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics 27, 2987–2993 (2011).
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Broad Institute. Picard tools, http://broadinstitute.github.io/picard/ (Broad Institute, 2019).
Tarasov, A., Vilella, A. J., Cuppen, E., Nijman, I. J. & Prins, P. Sambamba: fast processing of NGS alignment formats. Bioinformatics 31, 2032–2034 (2015).
Poplin, R. et al. Scaling accurate genetic variant discovery to tens of thousands of samples. Preprint at https://www.biorxiv.org/content/10.1101/201178v3 (2017).
Datta, K., Gururaj, K., Naik, M., Narvaez, P. & Rutar, M. GenomicsDB: storing genome data as sparse columnar arrays. https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/genomics-storing-genome-data-paper.pdf (2017).
Köster, J. & Rahmann, S. Snakemake—a scalable bioinformatics workflow engine. Bioinformatics 28, 2520–2522 (2012).
Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at https://arxiv.org/abs/1303.3997 (2013).
Danecek, P. et al. The variant call format and VCFtools. Bioinformatics 27, 2156–2158 (2011).
Hoang, D. T., Chernomor, O., von Haeseler, A., Minh, B. Q. & Vinh, L. S. UFBoot2: improving the ultrafast bootstrap approximation. Mol. Biol. Evol. 35, 518–522 (2018).
Kalyaanamoorthy, S., Minh, B. Q., Wong, T. K. F., von Haeseler, A. & Jermiin, L. S. ModelFinder: fast model selection for accurate phylogenetic estimates. Nat. Methods 14, 587–589 (2017).
Nguyen, L. T., Schmidt, H. A., von Haeseler, A. & Minh, B. Q. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 32, 268–274 (2015).
Browning, B. L., Zhou, Y. & Browning, S. R. A one-penny imputed genome from next-generation reference panels. Am. J. Hum. Genet. 103, 338–348 (2018).
Grimm, D. G. et al. easyGWAS: a cloud-based platform for comparing the results of genome-wide association studies. Plant Cell 29, 5–19 (2017).
Kang, H. M. et al. Variance component model to account for sample structure in genome-wide association studies. Nat. Genet. 42, 348–354 (2010).
Wang, T., Hamann, A., Spittlehouse, D. & Carroll, C. Locally downscaled and spatially customizable climate data for historical and future periods for North America. PLoS ONE 11, e0156720 (2016).
Gautier, M. Genome-wide scan for adaptive divergence and association with population-specific covariates. Genetics 201, 1555–1579 (2015).
Jeffreys, H. Theory of Probability (Clarendon, 1961).
Hellens, R. P., Edwards, E. A., Leyland, N. R., Bean, S. & Mullineaux, P. M. pGreen: a versatile and flexible binary Ti vector for Agrobacterium-mediated plant transformation. Plant Mol. Biol. 42, 819–832 (2000).
Weigel, D. & Glazebrook, J. Arabidopsis: A Laboratory Manual (CSHL, 2002).
Zheng, X. et al. A high-performance computing toolset for relatedness and principal component analysis of SNP data. Bioinformatics 28, 3326–3328 (2012).
Hartigan, J. A. & Wong, M. A. Algorithm AS 136: a K-means clustering algorithm. J. R. Stat. Soc. C. Appl. Stat. 28, 100–108 (1979).
R Core Team. R: A language and environment for statistical computing, https://www.R-project.org/ (2019).
Kurtz, S. et al. Versatile and open software for comparing large genomes. Genome Biol. 5, R12 (2004).
Goel, M., Sun, H., Jiao, W.-B. & Schneeberger, K. SyRI: finding genomic rearrangements and local sequence differences from whole-genome assemblies. Genome Biol. 20, 277 (2019).
Ostevik, K. L., Samuk, K. & Rieseberg, L. H. Ancestral reconstruction of karyotypes reveals an exceptional rate of non-random chromosomal evolution in sunflower. Genetics 214, 1031–1045 (2020).
Huang, K., Andrew, R. L., Owens, G. L., Ostevik, K. L. & Rieseberg, L. H. Multiple chromosomal inversions contribute to adaptive divergence of a dune sunflower ecotype. Mol. Ecol., https://doi.org/10.1111/mec.15428 (2020).
Marie-Nelly, H. et al. High-quality genome (re)assembly using chromosomal contact data. Nat. Commun. 5, 5695 (2014).
Heinz, S. et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 38, 576–589 (2010).
Hu, X. & Friedberg, I. SwiftOrtho: A fast, memory-efficient, multiple genome orthology classifier. Gigascience 8, giz118 (2019).
Suchard, M. A. et al. Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus Evol. 4, vey016 (2018).
Rambaut, A., Drummond, A. J., Xie, D., Baele, G. & Suchard, M. A. Posterior summarization in Bayesian phylogenetics using Tracer 1.7. Syst. Biol. 67, 901–904 (2018).
Sambatti, J. B., Strasburg, J. L., Ortiz-Barrientos, D., Baack, E. J. & Rieseberg, L. H. Reconciling extremely strong barriers with high levels of gene exchange in annual sunflowers. Evolution 66, 1459–1473 (2012).
Yu, G., Smith, D. K., Zhu, H., Guan, Y. & Lam, T. T. Y. ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods Ecol. Evol. 8, 28–36 (2017).
Rambaut, A. FigTree, http://tree.bio.ed.ac.uk/software/figtree/ (2009).

Acknowledgements

We thank J. Gouzy and N. B. Langlade for providing access to the HA412-HOv2 annotation and PSC8 genome assembly; B. T. Moyers for discussion and providing the H. argophyllus picture; J. Lee-Yaw and A. J. Moreno-Geraldes for comments; D. Skonieczny, A. Kim, A. Parra and C. Konecny for assistance with fieldwork and data acquisition; A. Warfield for computing advice; J. D. Herndon for providing the dune H. petiolaris picture; D. G. Grimm for assistance with easyGWAS; UBC’s Data Science Institute for support to J.S.L.; and Compute Canada for computing resources. Maps were realized using tiles from Stamen Design (https://stamen.com), under CC BY 3.0, from data by OpenStreetMaps contributors (https://openstreetmap.org), under ODbL. Funding was provided by Genome Canada and Genome BC (LSARP2014-223SUN), the NSF Plant Genome Program (IOS-1444522), the International Consortium for Sunflower Genomic Resources, Sofiproteol, an HFSP long-term postdoctoral fellowship to M.T. (LT000780/2013) and a Banting postdoctoral fellowship to G.L.O.

Author information

Mihir Nanavati
Present address: Microsoft Research, New York, NY, USA
These authors contributed equally: Marco Todesco, Gregory L. Owens, Natalia Bercovich

Authors and Affiliations

Department of Botany, University of British Columbia, Vancouver, British Columbia, Canada
Marco Todesco, Gregory L. Owens, Natalia Bercovich, Jean-Sébastien Légaré, Dylan O. Burge, Kaichi Huang, Emily B. M. Drummond, Ivana Imerovski, Kathryn Lande, Mariana A. Pascual-Robles, Mojtaba Jahani, Winnie Cheung, S. Evan Staton & Loren H. Rieseberg
Biodiversity Research Centre, University of British Columbia, Vancouver, British Columbia, Canada
Marco Todesco, Gregory L. Owens, Natalia Bercovich, Jean-Sébastien Légaré, Dylan O. Burge, Kaichi Huang, Emily B. M. Drummond, Ivana Imerovski, Kathryn Lande, Mariana A. Pascual-Robles, Mojtaba Jahani, Winnie Cheung, S. Evan Staton & Loren H. Rieseberg
Department of Integrative Biology, University of California, Berkeley, Berkeley, CA, USA
Gregory L. Owens & Rasmus Nielsen
Department of Computer Science, University of British Columbia, Vancouver, British Columbia, Canada
Jean-Sébastien Légaré & Mihir Nanavati
Data Science Institute, University of British Columbia, Vancouver, British Columbia, Canada
Jean-Sébastien Légaré
Department of Biological Sciences, University of Calgary, Calgary, Alberta, Canada
Shaghayegh Soudi & Sam Yeaman
Department of Biology, Duke University, Durham, NC, USA
Katherine L. Ostevik
LIPM, Université de Toulouse, INRAE, CNRS, Castanet-Tolosan, France
Stéphane Muños
Department of Plant Biology, University of Georgia, Athens, GA, USA
Lisa A. Donovan & John M. Burke

Contributions

L.H.R., S.Y., J.M.B., L.A.D. and N.B. conceived the study; D.O.B. collected seeds and soil samples from wild populations; N.B., M.T., D.O.B., I.I. and W.C. performed the common garden experiment, and collected and organized phenotypic data; L.A.D. analysed the soil samples; N.B., M.T. and E.B.M.D. generated resequencing data; M.T. and M.A.P.-R. analysed HaFT1 amplification and expression in H. argophyllus; M.T., K.H. and M.A.P.-R. generated material for HiC experiments; M.T. generated and analysed A. thaliana HaFT transgenic lines; J.-S.L., M.N. and G.L.O. performed read alignments, and SNP calling and filtering; M.T., G.L.O., S.S., K.H., K.L.O., E.B.M.D., K.L. and M.J. analysed genomic data; S.E.S. and S.M. contributed resources; R.N. provided conceptual advice; M.T., G.L.O. and L.H.R. wrote the manuscript, with contributions from all of the authors.

Corresponding authors

Correspondence to Natalia Bercovich or Loren H. Rieseberg.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature thanks Jeffrey Ross-Ibarra, Jeremy Schmutz and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Remapping SNPs to the HA412-HOv2 reference genome improved ordering.

a, Comparison between the original order of SNPs in chromosome 2 on the XRQv1 assembly⁹ (against which sequencing reads were originally mapped) and after SNP re-mapping to the HA412-HOv2 assembly¹¹. Data are summarized in 5-kbp ranges. Error bars represent 2 standard errors. The higher R² at longer distances is due to better scaffolding of contigs in HA412-HOv2. Number of SNPs: n = 261,020 (XRQ); n = 237,674 (HA412-HO). b, GWA for flowering time in H. argophyllus based on the XRQv1 assembly identified more than 40 highly significant associations. c, Remapping of the SNPs to the new HA412-HOv2 sunflower assembly considerably reduced the number of associations in the flowering time GWA, with the vast majority of the signal mapping to the arg06.01 haploblock region (Fig. 2). In b, c, the purple lines represent 5% Bonferroni-corrected significance. Only positions with −log₁₀ P value >2 are plotted. Associations were calculated using two-sided mixed models. n = 277 individuals. d, Genotype call accuracy. Variants for 12 individuals for each species from our SNP dataset were compared to Sanger sequencing data. Six regions were compared. Number of sites: n = 136 (H. annuus); n = 139 (H. argophyllus); n = 262 (H. petiolaris). Number of genotype calls: n = 1,385 (H. annuus); n = 1,254 (H. argophyllus); n = 2,351 (H. petiolaris). Overall genotype accuracy: H. annuus = 95.9%; H. argophyllus = 96.8%; H. petiolaris = 97.9% (Supplementary Table 1). Vertical purple lines represent the average observed coverage across genic regions for individuals in the corresponding dataset. Error bars, binomial confidence interval (Wilson score method). e, Genome-wide principal component analysis for each dataset. Sites were pruned for linkage (r < 0.2 within 500 kb). Number of individuals: n = 730 (H. annuus); n = 299 (H. argophyllus); n = 168 (H. petiolaris petiolaris); n = 259 (H. petiolaris fallax).
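
The error bars in panel d are described as binomial confidence intervals computed with the Wilson score method. As a minimal sketch of that calculation (not the authors' code: the function name `wilson_interval` and the example counts are hypothetical, and a 95% level is assumed because the caption does not state one):

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.959964) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    `z` is the standard-normal quantile; the default corresponds to an
    assumed two-sided 95% confidence level.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")

    p_hat = successes / trials
    denom = 1.0 + z**2 / trials
    # The centre is pulled from p_hat towards 0.5, which keeps the
    # interval inside [0, 1] even for proportions near the boundaries.
    centre = (p_hat + z**2 / (2 * trials)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)) / denom
    )
    return centre - half_width, centre + half_width


# Hypothetical example: 96 genotype calls out of 100 agree with Sanger data.
low, high = wilson_interval(96, 100)
print(f"accuracy = 0.960, 95% CI = ({low:.3f}, {high:.3f})")
```

Unlike the simple normal-approximation interval, the Wilson interval is asymmetric around the observed proportion, which matters for accuracies close to 100% such as those reported in panel d.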

Extended Data Fig. 2 Phenotypic, structural and functional analyses for arg06.01.

a, Flowering time for the three wild sunflower species measured in a common garden experiment. Number of individuals: n = 612 (H. annuus); n = 161 (H. petiolaris petiolaris); n = 211 (H. petiolaris fallax); n = 48 (H. niveus canescens); n = 261 (H. argophyllus 0/0); n = 25 (H. argophyllus 0/1); n = 23 (H. argophyllus 1/1). b, Leaf nitrogen content and carbon/nitrogen ratio GWAs in H. argophyllus (two-sided mixed model associations; n = 289 individuals). The purple lines represent 5% Bonferroni-corrected significance. Only positions with −log₁₀ P value >2 are plotted. c, Genotype presence or absence for the 130–135-Mbp region of chromosome 6 in H. argophyllus. The x-axis represents consecutive SNP positions; distances on this axis are therefore not proportional to physical distances on the chromosome. Purple bars highlight the positions of the five HaFT genes in the region (HaFT5 and HaFT6 are only a few hundred bp apart). Flowering time data are the same as used in GWA analyses. d, HaFT1 and HaFT2 expression levels in mature leaves or shoot apices of >6-month-old, flowering H. argophyllus plants, grown in a greenhouse in long-day conditions (14 h light:10 h dark). This experiment was performed on two independent pairs of individuals, with similar results. e, Six-week-old A. thaliana plants grown in long-day conditions at 23 °C. At least 19 independent transgenic events were analysed for each construct in each genetic background, and flowering time was consistent within each group. Scale bar, 1 cm. f, Flowering time in long and short days (10 h light:14 h dark). HaFT2 alleles from early- and late-flowering H. argophyllus plants complement the ft-10 mutant, similar to HaFT1 from the early-flowering ecotype. HaFT6 is expressed at low levels in H. argophyllus plants (not shown), and appears to be a hypo-functional FT homologue. Box plots show the median, box edges represent the 25th and 75th percentiles, whiskers represent the maximum and minimum data points within 1.5× interquartile range outside box edges. Differences in flowering time between untransformed controls, HaFT6 lines and all the other transgenic lines are significant in all conditions (P < 10⁻⁶ for all relevant comparisons; one-way ANOVA with post hoc Tukey HSD test, df = 4; exact P values are reported in the Source Data). Number of individuals or independent transformation events for the long-day dataset in Col-0 background: n = 28 (Col-0); n = 32 (HaFT1); n = 30 (HaFT2^early); n = 34 (HaFT2^late); n = 45 (HaFT6). For the long-day dataset in ft-10 background: n = 25 (ft-10); n = 30 (HaFT1); n = 38 (HaFT2^early); n = 45 (HaFT2^late); n = 18 (HaFT6). For the short-day dataset: n = 10 (Col-0); n = 24 (HaFT1); n = 17 (HaFT2^early); n = 31 (HaFT2^late); n = 31 (HaFT6). g, PCR detection of transgene expression in leaves of plants grown for four weeks in long days. The reduced ability of HaFT6 to induce flowering is not due to inefficient expression of the transgene. Results for four independent primary transformants for each transgenic line and for wild-type Col-0 plants are shown. For gel source data, see Supplementary Fig. 1.
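
The box plot convention stated in this caption (median, 25th and 75th percentiles, whiskers at the most extreme data points within 1.5× the interquartile range of the box edges) can be made concrete with a short sketch. This is an illustration only, not the authors' plotting code: the function name `box_plot_summary` and the example values are hypothetical, and the quartile interpolation rule (Python's "inclusive" method) is an assumption, since plotting packages differ on it.

```python
from statistics import quantiles


def box_plot_summary(values):
    """Return the statistics drawn in a box plot with 1.5x IQR whiskers."""
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two data points")

    # Quartiles by linear interpolation between order statistics (assumed rule).
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr

    # Whiskers end at actual data points, not at the fences themselves.
    whisker_low = min(v for v in data if v >= low_fence)
    whisker_high = max(v for v in data if v <= high_fence)
    outliers = [v for v in data if v < low_fence or v > high_fence]

    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "whisker_low": whisker_low,
        "whisker_high": whisker_high,
        "outliers": outliers,
    }


# Hypothetical flowering-time values (days); 100 falls beyond the upper fence.
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(summary)
```

Points beyond the whiskers are returned separately as `outliers`, which is how such values are usually drawn as individual markers.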

Source data

Extended Data Fig. 3 Several haploblocks differentiate dune and non-dune populations of H. petiolaris.

a, Correlation between seed size and flowering time. Although dune-adapted H. petiolaris fallax flowers later and has larger seeds than non-dune-adapted populations, these two traits generally show no correlation, or a weak negative correlation, in H. annuus and H. petiolaris. Purple lines represent linear regressions; shaded grey areas are 95% confidence intervals. H. annuus: n = 426 individuals, one-sided F_1,423 = 1.831, P = 0.18; H. petiolaris: n = 307 individuals, one-sided F_1,305 = 9.841, P = 0.0019. b, Seed length GWA in H. petiolaris fallax (two-sided mixed model associations; n = 165 individuals). No significant association with haploblocks is found in GWA analyses for seed width (not shown). c, F_ST values in 2-Mbp non-overlapping sliding windows for comparisons between dune- and non-dune-adapted populations of H. petiolaris fallax in Colorado. Purple bars represent predicted haploblocks. d, Flowering time (approximated as total leaf number (TLN) on the primary stem) GWA for H. petiolaris petiolaris (two-sided mixed model associations; n = 160 individuals). The purple lines in b, d represent 5% Bonferroni-corrected significance. Only positions with −log₁₀ P value >2 are plotted. e, Distribution of F_ST values for SNPs and haploblocks in comparisons between dune- and non-dune-adapted populations of H. petiolaris fallax in Texas and Colorado¹⁶. Percentiles are reported for the most highly divergent haploblocks. Box plots show the median, box edges represent the 25th and 75th percentiles, whiskers represent the maximum and minimum data points within 1.5× interquartile range outside box edges. Number of individuals: n = 28 (Colorado); n = 54 (Texas). Number of SNPs: n = 1,196,399 (Colorado); n = 1,169,273 (Texas). f, Maximum-likelihood trees for two of the haploblocks segregating within H. petiolaris. Dune populations of H. petiolaris fallax are highlighted in light (Colorado) and dark tan (Texas).
For pet09.01 and pet11.01, although both dune populations have converged on the same haplotype, the Texas haplotype is the ancestral H. petiolaris fallax copy, whereas in Colorado the haplotype is derived from introgression with H. petiolaris petiolaris, suggesting convergent adaptation. Bootstrap values for major nodes are reported (asterisks = 100).

Source data

Extended Data Fig. 4 Local PCA highlights haploblock regions.

For each predicted haploblock, the local PCA MDS plot for the relevant chromosome, a PCA of the selected region, observed heterozygosity for each haploblock genotype and LD patterns for the relevant chromosome are shown. In the local PCA MDS plots, each dot represents a 100-SNP window, and windows within the haploblock region are highlighted. The x-axis values represent Mbp. For H. petiolaris, haploblocks were identified in the full species or subspecies datasets; the local PCA and LD plots are from the dataset in which the haploblock was identified, and PCA and heterozygosity plots use the full dataset. In PCA plots, samples are coloured by inferred haploblock genotype. For LD plots, upper triangle = all individuals; lower triangle = only individuals homozygous for the more common haploblock allele. Colours represent the second highest R² value in 0.5-Mbp windows. For most haploblock regions, high LD is driven by differences between haplotypes, so high LD is removed when only one haplotype is present. Box plots show the median, box edges represent the 25th and 75th percentiles, whiskers represent the maximum and minimum data points within 1.5× interquartile range outside box edges. Sample size for all haploblock analyses is provided in the Source Data, available at https://github.com/owensgl/haploblocks/.

Extended Data Fig. 5 Geographical distribution of haploblock genotypes.

Map showing collection locations of the three sunflower species and the frequency of haploblock genotypes at each collection site.

Extended Data Fig. 6 Comparisons between reference assemblies and genetic maps confirm structural rearrangements associated with haploblocks.

a, Alignment of chromosome 1 for the H. annuus genome assemblies PSC8 and HA412-HOv2. The ann01.01 region (at about 8 Mbp; inset), for which the two cultivars have different haplotypes, shows inverted alignment. b, Three H. annuus genetic maps (constructed using F₂ populations between wild individuals and the HA412-HO cultivar). c, Four genetic maps (constructed using F₁ populations; from top to bottom: H. petiolaris petiolaris, H. petiolaris fallax⁸⁷, newly constructed dune H. petiolaris fallax and newly constructed non-dune H. petiolaris fallax⁸⁸) are plotted relative to the HA412-HOv2 reference assembly. To the right of each dot plot, markers are plotted in the order in which they appear in each genetic map. Haploblock regions and the markers that fall within them are highlighted in purple. Circled haploblock regions show evidence of different orientations across the multiple maps (dotted lines), of suppressed recombination (dashed lines) or are contiguous in H. petiolaris maps despite being split over multiple windows in the HA412-HOv2 reference assembly (solid lines). Parental haploblock genotypes are known for the H. annuus maps and for the bottom two H. petiolaris maps. Ann05.01 and ann11.01 were segregating within the H. annuus mapping populations. Genotypes at pet05.01 and pet11.01 differed between the H. petiolaris fallax parents of newly constructed dune and non-dune populations, whereas both parents were heterozygous for the pet09.01 haploblock. In all these cases, patterns of segregation are consistent with the parental haploblock genotypes. For the remaining H. petiolaris maps, the parental haploblock genotypes are not known. Because an absence of evidence is uninformative in these cases, only haploblock regions with evidence for inversions or contiguous windows from these two maps are plotted.

Source data

Extended Data Fig. 7 HiC comparisons identify SVs associated with most, but not all, haploblocks.

a, Differences in HiC interactions between pairs of early- and late-flowering H. argophyllus or dune and non-dune H. petiolaris samples. Purple bars and solid black lines represent approximate haploblock boundaries. Pieces of a single haploblock that map to different regions of the HA412-HOv2 reference are highlighted by dotted lines. Top row, comparisons between H. annuus and H. argophyllus or H. petiolaris, for H. annuus haploblock regions. Because the relative haploblock genotypes between sunflower species are not known, only cases in which evidence of structural variants was observed are reported. Following rows, regions for which the pairs of H. argophyllus or H. petiolaris samples differed at haploblock alleles. Red or blue dots show increased or decreased, respectively, long-distance interactions in one sample, consistent with differences in genome structure. Relevant differences in long-distance interactions are highlighted by black arrows; for each of these, the percentage rank compared to all other possible interactions at the same distance across the genome is reported. No evidence of large-scale structural variation was observed for arg06.01 and pet10.01. An excess of interactions in the early-flowering allele for the approximately 130–140-Mbp region of chromosome 6 is consistent with the presence of deletions in the late-flowering alleles (Extended Data Fig. 2c), as well as with improved mappability of reads from the early-flowering allele, which, being an introgression from wild H. annuus, is closer in sequence to the HA412-HO reference. Differences in HiC interactions were capped between −0.3 and 0.3 for plotting purposes. b, Inversion scenarios with comparisons of simulated HiC interaction matrices consistent with empirical patterns. There are H. annuus-specific inversions in the reference genome, as well as inversions between haploblocks.

Extended Data Fig. 8 Haploblock GWAs.

Heat map of GWAs for individual phenotypic traits, treating haploblocks as individual loci. Haploblocks were filtered to retain only regions with minor allele frequency ≥ 3%. PCA and kinship matrices used as covariates were calculated without variants inside haploblock regions. GWAs were calculated using two-sided mixed models. Number of individuals: n = 614 (H. annuus); n = 294 (H. argophyllus); n = 209 (H. petiolaris fallax); n = 163 (H. petiolaris petiolaris).
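
The minor allele frequency filter mentioned in this legend can be sketched as follows, treating each haploblock as a single biallelic locus with genotypes 0/0, 0/1 and 1/1. The function names and the example genotype counts are hypothetical and serve only to illustrate the ≥ 3% threshold; they do not reproduce the authors' pipeline.

```python
def minor_allele_frequency(n_00, n_01, n_11):
    """Minor allele frequency from diploid genotype counts at one locus."""
    total_alleles = 2 * (n_00 + n_01 + n_11)
    if total_alleles == 0:
        raise ValueError("at least one genotyped individual is required")
    # Each heterozygote carries one copy of allele 1, each 1/1 carries two.
    freq_allele_1 = (n_01 + 2 * n_11) / total_alleles
    return min(freq_allele_1, 1.0 - freq_allele_1)


def passes_maf_filter(n_00, n_01, n_11, threshold=0.03):
    """True if the locus would be retained at the given MAF threshold."""
    return minor_allele_frequency(n_00, n_01, n_11) >= threshold


# Hypothetical genotype counts (0/0, 0/1, 1/1), not taken from the study.
print(passes_maf_filter(90, 8, 2))   # MAF = 12/200 = 0.06, retained
print(passes_maf_filter(98, 2, 0))   # MAF = 2/200 = 0.01, filtered out
```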

Extended Data Fig. 9 Haploblock GEAs.

a, Heat map of GEAs for individual environmental variables, treating haploblocks as individual loci. Haploblocks were filtered to retain only regions with minor allele frequency ≥ 3%. The population correlation matrix was calculated without variants inside haploblock regions. GEAs were calculated using two-sided XtX statistics. Number of populations: n = 71 (H. annuus); n = 30 (H. argophyllus); n = 23 (H. petiolaris fallax); n = 17 (H. petiolaris petiolaris). b, The proportion of haploblock and SNP loci significantly associated with one or more environmental variables (dB ≥ 10) or phenotypic traits (P ≤ 0.001). *P < 0.05, **P < 0.0005 (two-sided proportion test; exact P values and number of individuals are reported in Source Data).
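
As an illustration of the kind of comparison reported in panel b, the sketch below implements a two-sided test for the difference between two proportions using a pooled z-statistic. This is an assumption about the general form of a proportion test; the legend does not specify the exact variant (for example, whether a continuity correction was applied), and the counts used here are invented.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided pooled z-test for equality of two proportions.

    Returns (z, p_value). A generic textbook test, assumed here for
    illustration; not necessarily the variant used in the study.
    """
    p1 = x1 / n1
    p2 = x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    if se == 0.0:
        raise ValueError("standard error is zero; proportions are degenerate")
    z = (p1 - p2) / se
    # Two-sided P value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Invented counts: 20 of 37 loci in one class versus 100 of 1,000 in another.
z, p = two_proportion_z_test(20, 37, 100, 1000)
print(f"z = {z:.2f}, P = {p:.3g}")
```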

Source data

Extended Data Fig. 10 A 100-Mbp haploblock is associated with early flowering in the texanus ecotype of H. annuus.

a, GWA for flowering in H. annuus (two-sided mixed model associations; n = 612 individuals), using a kinship matrix and PCA covariate including (black dots) or excluding (yellow dots) the haploblock regions. Haploblock regions are highlighted in purple. The purple line represents 5% Bonferroni-corrected significance. Only positions with −log₁₀ P value >2 are plotted. b, Flowering time for individuals with different genotypes at ann13.01. Number of individuals: n = 244 (0/0); n = 168 (0/1); n = 200 (1/1). c, Distribution of ann13.01 haplotypes. d, Distribution of F_ST values for individual SNPs and haploblocks in comparisons between the texanus ecotype of H. annuus and other H. annuus populations. Percentiles are reported for the most highly divergent haploblocks. In b, d, box plots show the median, box edges represent the 25th and 75th percentiles, whiskers represent the maximum and minimum data points within 1.5× interquartile range outside box edges.
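
The 5% Bonferroni-corrected significance line and the −log₁₀ P > 2 plotting filter referred to in panel a can be sketched as below. The number of tests is left as a parameter because it is not stated in this legend; the function names and the P values in the example are hypothetical.

```python
import math


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test P value cutoff for a family-wise error rate of alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def points_to_plot(p_values, min_neg_log10=2.0):
    """Return (index, -log10 P) pairs for positions passing the plot filter."""
    kept = []
    for index, p in enumerate(p_values):
        score = -math.log10(p)  # requires p > 0
        if score > min_neg_log10:
            kept.append((index, score))
    return kept


# Hypothetical example: one million tested variants and four P values.
cutoff = bonferroni_threshold(1_000_000)
print(f"significance line at -log10 P = {-math.log10(cutoff):.2f}")
for index, score in points_to_plot([0.5, 0.02, 0.001, 1e-9]):
    print(index, round(score, 2), "significant" if 10 ** -score <= cutoff else "")
```

Under these assumed inputs only the last P value exceeds the significance line, while the first two are not plotted at all.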

Extended Data Table 1 Positions and frequencies of haploblocks, and experimental support for linked SVs.


Supplementary information

Supplementary Figure 1

This file contains gels source data.

Reporting Summary

Supplementary Table 1

Population information and environmental variables. Phenotypic data, coverage and SRA IDs for individual wild sunflower samples.

Supplementary Table 2

Candidate genes for GWA and GEA analyses.

Supplementary Table 3

Primers, markers and adapters used in this study.

Source data

Source Data Fig. 1

Source Data Fig. 2

Source Data Fig. 3

Source Data Fig. 4

Source Data Fig. 5

Source Data Extended Data Fig. 1

Source Data Extended Data Fig. 2

Source Data Extended Data Fig. 3

Source Data Extended Data Fig. 6

Source Data Extended Data Fig. 9


About this article

Cite this article

Todesco, M., Owens, G.L., Bercovich, N. et al. Massive haplotypes underlie ecotypic differentiation in sunflowers. Nature 584, 602–607 (2020). https://doi.org/10.1038/s41586-020-2467-6


Received: 28 September 2019
Accepted: 16 April 2020
Published: 08 July 2020
Issue Date: 27 August 2020
DOI: https://doi.org/10.1038/s41586-020-2467-6


