Article | Published:

Sunflower pan-genome analysis shows that hybridization altered gene content and disease resistance


Domesticated plants and animals often display dramatic responses to selection, but the origins of the genetic diversity underlying these responses remain poorly understood. Despite domestication and improvement bottlenecks, the cultivated sunflower remains highly variable genetically, possibly due to hybridization with wild relatives. To characterize genetic diversity in the sunflower and to quantify contributions from wild relatives, we sequenced 287 cultivated lines, 17 Native American landraces and 189 wild accessions representing 11 compatible wild species. Cultivar sequences failing to map to the sunflower reference were assembled de novo for each genotype to determine the gene repertoire, or ‘pan-genome’, of the cultivated sunflower. Assembled genes were then compared to the wild species to estimate origins. Results indicate that the cultivated sunflower pan-genome comprises 61,205 genes, of which 27% vary across genotypes. Approximately 10% of the cultivated sunflower pan-genome is derived through introgression from wild sunflower species, and 1.5% of genes originated solely through introgression. Gene ontology functional analyses further indicate that genes associated with biotic resistance are over-represented among introgressed regions, an observation consistent with breeding records. Analyses of allelic variation associated with downy mildew resistance provide an example in which such introgressions have contributed to resistance to a globally challenging disease.
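As an illustration only, the idea of a gene repertoire in which some genes "vary across genotypes" can be sketched with a toy presence/absence matrix. The gene names, genotype calls and helper function below are hypothetical and are not taken from the study. The approximate counts at the end are simple arithmetic on the rounded percentages quoted in the abstract, so they are rough figures rather than reported results.

```python
# Toy presence/absence matrix: keys are genes, values are calls per genotype
# (1 = gene detected, 0 = gene absent). All names and values are invented.
presence = {
    "gene_A": [1, 1, 1, 1],
    "gene_B": [1, 0, 1, 1],
    "gene_C": [1, 1, 1, 1],
    "gene_D": [0, 0, 1, 0],
}


def classify(matrix):
    """Split genes into core (present in every genotype) and variable."""
    core = sorted(g for g, calls in matrix.items() if all(calls))
    variable = sorted(g for g, calls in matrix.items() if not all(calls))
    return core, variable


core, variable = classify(presence)
variable_fraction = len(variable) / len(presence)

# Rough gene counts implied by the rounded percentages in the abstract.
# Integer arithmetic avoids floating-point rounding surprises.
PAN_GENOME_GENES = 61_205
approx_variable = PAN_GENOME_GENES * 27 // 100             # about 27% variable
approx_introgressed = PAN_GENOME_GENES * 10 // 100         # about 10% introgressed
approx_introgression_only = PAN_GENOME_GENES * 15 // 1000  # about 1.5%

print(core, variable, variable_fraction)
print(approx_variable, approx_introgressed, approx_introgression_only)
```

In this sketch a gene counts as variable as soon as it is missing from one genotype. A real analysis would also need a rule for deciding presence from read coverage, which is not described in this excerpt.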

Data availability

All raw sequence data are stored in the Sequence Read Archive (SRA) under BioProject PRJNA353001 for cultivars and PRJNA397453 for wild accessions and landraces. Accession numbers for each sample are listed in Supplementary Table 10. The SNP data set in VCF format, pan-genome sequences in FASTA format and genome scan statistics in BED format can be downloaded from

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Acknowledgements

We thank the Genome Quebec Innovation Centre and UBC’s Biodiversity Research Centre for conducting the sequencing, M. Heffernan for the development of the statistical pipeline for GWAS, K. Rashid for providing downy mildew isolates, W. Cheung for project coordination and assistance with experimental work, and SAP SE for computing resources. Funding was provided by Genome Canada and Genome BC (LSARP2014-223SUN), the NSF Plant Genome Program (DBI-0820451 and DBI-1444522) and the International Consortium for Sunflower Genomic Resources.

Author information

J.R.M., M.T., G.J.B., C.J.G., D.P.E., K.L.O., S.Y., B.T.M. and N.C.K. contributed to DNA sample collection and data production. N.B. and M.T. mapped downy mildew resistance. S.H., J.O. and E.Z. conducted the alignments and variant calling. S.H. performed the genome scans, pan-genome analyses, introgression analyses and GWAS. J.S.L. conducted the expression analyses. G.L.O. archived the data. J.E.B., I.C., L.G. and R.R.M. optimized the SNP data set for GWAS. L.H.R., J.M.B., N.B.L., S.M., T.K. and D.Z.H.S. designed the experiments and coordinated the project. S.H., N.B., J.M.B. and L.H.R. wrote the manuscript.

Competing interests

The authors declare no competing interests.

Correspondence to Sariel Hübner.

Supplementary information

Supplementary Information

Supplementary Methods, Supplementary References, Supplementary Tables 1–11 and legends for Supplementary Figures 1–17.

Reporting Summary

Supplementary Figures

Supplementary Figures 1–17.

Fig. 1: Genomic variation in the SAM population based on 675,291 SNPs detected across all lines.
Fig. 2: The cultivated sunflower pan-genome.
Fig. 3: Average number of genes in the cultivated sunflower pan-genome assigned to the wild source.
Fig. 4: Genome-wide association mapping of downy mildew resistance in the SAM population.