Sunflower pan-genome analysis shows that hybridization altered gene content and disease resistance


Domesticated plants and animals often display dramatic responses to selection, but the origins of the genetic diversity underlying these responses remain poorly understood. Despite domestication and improvement bottlenecks, the cultivated sunflower remains highly variable genetically, possibly due to hybridization with wild relatives. To characterize genetic diversity in the sunflower and to quantify contributions from wild relatives, we sequenced 287 cultivated lines, 17 Native American landraces and 189 wild accessions representing 11 compatible wild species. Cultivar sequences failing to map to the sunflower reference were assembled de novo for each genotype to determine the gene repertoire, or ‘pan-genome’, of the cultivated sunflower. Assembled genes were then compared to the wild species to estimate origins. Results indicate that the cultivated sunflower pan-genome comprises 61,205 genes, of which 27% vary across genotypes. Approximately 10% of the cultivated sunflower pan-genome is derived through introgression from wild sunflower species, and 1.5% of genes originated solely through introgression. Gene ontology functional analyses further indicate that genes associated with biotic resistance are over-represented among introgressed regions, an observation consistent with breeding records. Analyses of allelic variation associated with downy mildew resistance provide an example in which such introgressions have contributed to resistance to a globally challenging disease.
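As a rough illustration of how a gene repertoire can be partitioned into shared and variable fractions, the sketch below classifies genes from per-genotype presence sets. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function name `classify_pan_genome`, the toy genotype and gene labels, and the rule that a gene counts as variable when it is absent from at least one genotype are all hypothetical choices made for illustration.

```python
from typing import Dict, Set


def classify_pan_genome(presence: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Partition genes into pan, core and variable sets.

    `presence` maps a genotype name to the set of genes detected in it.
    Assumption (not from the paper): a gene is 'core' only if it is
    present in every genotype; all other genes are 'variable'.
    """
    gene_sets = list(presence.values())
    # Pan-genome: every gene seen in at least one genotype.
    pan = set().union(*gene_sets)
    # Core genome: genes shared by all genotypes (empty if no input).
    core = set.intersection(*gene_sets) if gene_sets else set()
    return {"pan": pan, "core": core, "variable": pan - core}


# Hypothetical toy data: three lines and four genes.
toy = {
    "line_A": {"g1", "g2", "g3"},
    "line_B": {"g1", "g2", "g4"},
    "line_C": {"g1", "g2", "g3", "g4"},
}

result = classify_pan_genome(toy)
variable_fraction = len(result["variable"]) / len(result["pan"])
print(sorted(result["core"]))  # ['g1', 'g2']
print(variable_fraction)       # 0.5
```

In this toy input, two of the four genes are variable. The corresponding figure reported in the abstract is 27% of 61,205 genes; the criteria the study used to call a gene present or absent are not stated here, so the all-or-nothing rule above should be read as a placeholder.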


Fig. 1: Genomic variation in the SAM population based on 675,291 SNPs detected across all lines.
Fig. 2: The cultivated sunflower pan-genome.
Fig. 3: Average number of genes in the cultivated sunflower pan-genome assigned to the wild source.
Fig. 4: Genome-wide association mapping of downy mildew resistance in the SAM population.

Data availability

All raw sequence data are stored in the Sequence Read Archive (SRA) under BioProject PRJNA353001 for cultivars and PRJNA397453 for wild accessions and landraces. Accession numbers for each sample are listed in Supplementary Table 10. The SNP data set in VCF format, pan-genome sequences in FASTA format and genome scan statistics in BED format can be downloaded from




Acknowledgements

We thank the Genome Quebec Innovation Centre and UBC’s Biodiversity Research Centre for conducting the sequencing, M. Heffernan for the development of the statistical pipeline for GWAS, K. Rashid for providing downy mildew isolates, W. Cheung for project coordination and assistance with experimental work, and SAP SE for computing resources. Funding was provided by Genome Canada and Genome BC (LSARP2014-223SUN), the NSF Plant Genome Program (DBI-0820451 and DBI-1444522) and the International Consortium for Sunflower Genomic Resources.

Author information




J.R.M., M.T., G.J.B., C.J.G., D.P.E., K.L.O., S.Y., B.T.M. and N.C.K. contributed to DNA sample collection and data production. N.B. and M.T. mapped downy mildew resistance. S.H., J.O. and E.Z. conducted the alignments and variant calling. S.H. performed the genome scans, pan-genome analyses, introgression analyses and GWAS. J.S.L. conducted the expression analyses. G.L.O. archived the data. J.E.B., I.C., L.G. and R.R.M. optimized the SNP data set for GWAS. L.H.R., J.M.B., N.B.L., S.M., T.K. and D.Z.H.S. designed the experiments and coordinated the project. S.H., N.B., J.M.B. and L.H.R. wrote the manuscript.

Corresponding author

Correspondence to Sariel Hübner.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Methods, Supplementary References, Supplementary Tables 1–11 and legends for Supplementary Figures 1–17.

Reporting Summary

Supplementary Figures

Supplementary Figures 1–17.

About this article

Cite this article

Hübner, S., Bercovich, N., Todesco, M. et al. Sunflower pan-genome analysis shows that hybridization altered gene content and disease resistance. Nature Plants 5, 54–62 (2019).
