The sunflower genome provides insights into oil metabolism, flowering and Asterid evolution

Journal name: Nature
Volume: 546
Pages: 148–152
DOI: 10.1038/nature22380

The domesticated sunflower, Helianthus annuus L., is a global oil crop that has promise for climate change adaptation, because it can maintain stable yields across a wide variety of environmental conditions, including drought (ref. 1). Even greater resilience is achievable through the mining of resistance alleles from compatible wild sunflower relatives (refs 2, 3), including numerous extremophile species (ref. 4). Here we report a high-quality reference for the sunflower genome (3.6 gigabases), together with extensive transcriptomic data from vegetative and floral organs. The genome mostly consists of highly similar, related sequences (ref. 5) and required single-molecule real-time sequencing technologies for successful assembly. Genome analyses enabled the reconstruction of the evolutionary history of the Asterids, further establishing the existence of a whole-genome triplication at the base of the Asterids II clade (ref. 6) and a sunflower-specific whole-genome duplication around 29 million years ago (ref. 7). An integrative approach combining quantitative genetics, expression and diversity data permitted development of comprehensive gene networks for two major breeding traits, flowering time and oil metabolism, and revealed new candidate genes in these networks. We found that the genomic architecture of flowering time has been shaped by the most recent whole-genome duplication, which suggests that ancient paralogues can remain in the same regulatory networks for tens of millions of years. This genome represents a cornerstone for future research programs aiming to exploit genetic diversity to improve biotic and abiotic stress resistance and oil production, while also considering agricultural constraints and human nutritional needs (refs 8, 9).

At a glance

Figures

  1. Figure 1: The sunflower genome assembly allows integration of diversity, genetics and expression data.

    a, Circular representation of the pseudomolecules. b, Density of two families of long retrotransposon terminal repeats (5.25–9.5 kb in blue and 9.5–12.25 kb in red) and genes (purple). c, SNP density in 80 lines of the domesticated sunflower. d, Locations of genes mapping to oil metabolic pathways. e, QTLs for seven oil-related traits, whereby the colour (from light to dark) indicates the trait: palmitate, linoleate, oil content, oleate, phytosterol, stearate and tocopherol. f, Regions associated with flowering time in domesticated sunflowers. g, Location of homologues of A. thaliana flowering genes. h, Expression of organ-specific genes (from outside to inside tracks: pollen, stamen, pistil, disc floret ovary, ray floret ovary, disc floret corolla, bract, ray floret ligule, leaf, stem and root).

  2. Figure 2: Sunflower evolutionary history.

    a, Evolutionary scenario of the Asterids (sunflower, artichoke, lettuce and coffee) from the AEKs of 21 (post-WGT-γ) and 7 (pre-WGT-γ) protochromosomes. The modern genomes are illustrated at the bottom with the different colours reflecting the origin from the seven ancestral chromosomes from the n = 7 AEK (top). Polyploidization events are shown with coloured dots (duplications) and stars (triplications), along with the shuffling events (fusions and fissions). The time scale is shown on the left (million years). b, Ks distributions. Left y axis, sunflower paralogues (black); right y axis, coffee paralogues (orange), artichoke paralogues (blue) and sunflower–coffee orthologues (purple). Polyploidization (WGT-1, WGD-2 and WGT-γ) and speciation (sunflower–coffee) events are referenced on the x axis. c, Dot plots of paralogues in sunflower, artichoke and coffee genomes illustrating, respectively, WGD-2 (1–2 chromosomal relationships in red circles), WGT-1 (1–3 relationships in blue circles) and WGT-γ (1–3 relationships in brown circles) events.

  3. Extended Data Fig. 1: Age distribution of transposons in the sunflower.

    The x axis represents the age of insertions in millions of years, the y axis is the density of insertions at a given time point. Top, the age distribution of each superfamily of subclass I of the Class II transposons (the terminal inverted repeat transposons). Bottom, the age distribution of LTR-RT superfamilies.

  4. Extended Data Fig. 2: The density of LTR-RTs in 1 Mb bins per chromosome.

    The scale represents a fraction, where 1.0 is 100% of a given bin.

  5. Extended Data Fig. 3: Comparison of grape–sunflower–artichoke–coffee–lettuce genomes.

    Top, dot plots of orthologues between the grape genome (y axis, as a representative of the n = 21 post-γ ancestor) and, from left to right, the sunflower (1–6 chromosomal relationships inherited from WGT-1 and WGD-2), artichoke (1–3 chromosomal relationships deriving from WGT-1), coffee (1–1 chromosomal relationships illustrating the absence of a coffee-specific WGD, despite WGT-1) genomes and the lettuce genetic map (1–3 chromosomal relationships deriving from WGT-1). Bottom, dot plots of orthologues between the sunflower genome (y axis, n = 17 chromosomes) and artichoke (x axis, n = 17 chromosomes) and lettuce (x axis, n = 9 chromosomes) genomes with 1–1 chromosomal relationships.

  6. Extended Data Fig. 4: Organ-specific expression in the sunflower transcriptome.

    a, Histogram of the specificity index Tau in expressed genes. b, Box plot distribution of the specificity index Tau in 11 different organs. The different organs are represented with the following colours: Ray floret ovary, dark brown; disc floret corolla, orange; ray floret ligule, yellow; bract, bright green; stem, dark green; pistil, bright blue; roots, dark blue; leaves, light green; disc floret ovary (seeds), red; stamens, magenta; pollen, light blue. c, Violin plot of the specificity index Tau for transcription factors (TFs, magenta) and long non-coding RNA (lncRNA, light blue). d, Cumulative bar plot showing the organ distribution of specific genes (left), transcription factors (middle) and lncRNA (right). Colours are the same as in b.

  7. Extended Data Fig. 5: Integrative analysis of flowering time.

    a, Flowering time network in the sunflower. Flowering time genes of A. thaliana and their interactions are drawn in green. Sunflower genes and orthology relationships with A. thaliana genes are shown in orange. b, Genomic architecture of flowering time in the domesticated sunflower. Outer ring, location of genomic regions associated with flowering time. Inner ring, links between ohnologues of a sunflower-specific whole-genome duplication (WGD-2), limited to genes located in regions associated with flowering time. Links between ohnologues of WGD-2 that are both located in regions associated with flowering time are drawn in red; other links are drawn in grey. c, Pathway of the integration of flowering signals in the meristem (simplified pathway adapted from ref. 20). The bright orange backgrounds indicate genes for which at least one sunflower orthologue was located in a region associated with flowering time. Gene names in bold italics indicate genes for which we identified additional in-paralogues compared to a previous study using more limited genomic data (ref. 21). Simple arrows represent positive regulation and other arrows negative regulation. Curved lines between genes represent protein–protein complexes.

  8. Extended Data Fig. 6: Integrative analysis of oil metabolism.

    a, Whole-metabolic network (3,821 reactions and 475 pathways). Genes are coloured by expression levels in developing seeds. b, Co-expression network of oil metabolic pathway. Genes that co-localize with QTLs are coloured in orange. c, Sub-network with genes from b co-localizing with QTLs. Node size is proportional to Fst between lines cultivated for oil production and other domesticated lines. Genes with an Fst in the top 5% are coloured in dark orange. d, Mapping of candidate genes (orange genes from c) on the pathways of diacylglycerol and triacylglycerol biosynthesis. e, Mapping of candidate genes on the pathway of linoleate biosynthesis. f, Tree of a gene cluster including a candidate gene of the PAP2 superfamily, involved in the synthesis of fatty acid precursors (d). Athal, Arabidopsis thaliana; Brapa, Brassica rapa; Ccard, Cynara cardunculus; Hvulg, Hordeum vulgare; Osati, Oryza sativa; Ptrich, Populus trichocarpa.
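
The specificity index Tau shown in Extended Data Fig. 4 follows the definition of Yanai et al. (ref. 47): for a gene measured in n organs, each expression value is divided by the gene's maximum across organs, and Tau is the sum of (1 − normalized value) divided by (n − 1). The sketch below is a minimal illustration of that formula only. The function name `tau_index` and the example values are hypothetical, and any normalization or transformation applied to the expression data before computing Tau in this study is not stated in this section, so none is assumed here.

```python
def tau_index(expression):
    """Tau specificity index for one gene across organs (Yanai et al. 2005).

    `expression` is a sequence of non-negative expression values, one per
    organ. Returns a float between 0 (uniform expression) and 1 (expression
    restricted to a single organ), or None if the gene is not expressed.
    """
    n = len(expression)
    if n < 2:
        raise ValueError("Tau needs expression values from at least two organs")
    peak = max(expression)
    if peak <= 0:
        return None  # gene not expressed in any organ; Tau is undefined
    # Normalize each value by the maximum, then average the shortfall from 1.
    return sum(1 - value / peak for value in expression) / (n - 1)


# Hypothetical values: a gene expressed in a single organ out of four.
organ_specific = tau_index([0, 0, 0, 10])
# Hypothetical values: a gene expressed equally in every organ.
housekeeping = tau_index([5, 5, 5, 5])
```

With this definition, a gene expressed in only one organ scores 1 and a uniformly expressed gene scores 0, which is why the histogram and box plots in panels a and b are bounded by those two values.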

Tables

  1. Extended Data Table 1: Link between the genomic architecture of flowering time and the most recent whole-genome duplication experienced by the sunflower

References

  1. Kane, N. C. & Rieseberg, L. H. Selective sweeps reveal candidate genes for adaptation to drought and salt tolerance in common sunflower, Helianthus annuus. Genetics 175, 1823–1834 (2007)
  2. Zamir, D. Improving plant breeding with exotic genetic libraries. Nat. Rev. Genet. 2, 983–989 (2001)
  3. Fernández-Martínez, J., Melero-Vara, J., Muñoz-Ruz, J., Ruso, J. & Domínguez, J. Selection of wild and cultivated sunflower for resistance to a new broomrape race that overcomes resistance of the Or5 gene. Crop Sci. 40, 550–555 (2000)
  4. Seiler, G. J. Wild annual Helianthus anomalus and H. deserticola for improving oil content and quality in sunflower. Ind. Crops Prod. 25, 95–100 (2007)
  5. Staton, S. E. et al. The sunflower (Helianthus annuus L.) genome reflects a recent history of biased accumulation of transposable elements. Plant J. 72, 142–153 (2012)
  6. Barker, M. S. et al. Most Compositae (Asteraceae) are descendants of a paleohexaploid and all share a paleotetraploid ancestor with the Calyceraceae. Am. J. Bot. 103, 1203–1211 (2016)
  7. Barker, M. S. et al. Multiple paleopolyploidizations during the evolution of the Compositae reveal parallel patterns of duplicate gene retention after millions of years. Mol. Biol. Evol. 25, 2445–2455 (2008)
  8. Challinor, A. J., Ewert, F., Arnold, S., Simelton, E. & Fraser, E. Crops and climate change: progress, trends, and challenges in simulating impacts and informing adaptation. J. Exp. Bot. 60, 2775–2789 (2009)
  9. Lobell, D. B. et al. Prioritizing climate change adaptation needs for food security in 2030. Science 319, 607–610 (2008)
  10. Rieseberg, L. H., Van Fossen, C. & Desrochers, A. M. Hybrid speciation accompanied by genomic reorganization in wild sunflowers. Nature 375, 313–316 (1995)
  11. Vandenbrink, J. P., Brown, E. A., Harmer, S. L. & Blackman, B. K. Turning heads: the biology of solar tracking in sunflower. Plant Sci. 224, 20–26 (2014)
  12. Tähtiharju, S. et al. Evolution and diversification of the CYC/TB1 gene family in Asteraceae—a comparative study in Gerbera (Mutisieae) and sunflower (Heliantheae). Mol. Biol. Evol. 29, 1155–1166 (2012)
  13. Kane, N. C. et al. Progress towards a reference genome for sunflower. Botany 89, 429–437 (2011)
  14. Vitte, C. & Bennetzen, J. L. Analysis of retrotransposon structural diversity uncovers properties and propensities in angiosperm genome evolution. Proc. Natl Acad. Sci. USA 103, 17638–17643 (2006)
  15. Truco, M. J. et al. An ultra-high-density, transcript-based, genetic map of lettuce. G3 (Bethesda) 3, 617–631 (2013)
  16. Scaglione, D. et al. The genome sequence of the outbreeding globe artichoke constructed de novo incorporating a phase-aware low-pass sequencing strategy of F1 progeny. Sci. Rep. 6, 19427 (2016)
  17. Denoeud, F. et al. The coffee genome provides insight into the convergent evolution of caffeine biosynthesis. Science 345, 1181–1184 (2014)
  18. Jaillon, O. et al. The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature 449, 463–467 (2007)
  19. Salse, J. Ancestors of modern plant crops. Curr. Opin. Plant Biol. 30, 134–142 (2016)
  20. Bouché, F., Lobet, G., Tocquin, P. & Périlleux, C. FLOR-ID: an interactive database of flowering-time gene networks in Arabidopsis thaliana. Nucleic Acids Res. 44 (D1), D1167–D1171 (2016)
  21. Blackman, B. K. et al. Contributions of flowering time genes to sunflower domestication and improvement. Genetics 187, 271–287 (2011)
  22. Baute, G. J., Kane, N. C., Grassa, C. J., Lai, Z. & Rieseberg, L. H. Genome scans reveal candidate domestication and improvement genes in cultivated sunflower, as well as post-domestication introgression with wild relatives. New Phytol. 206, 830–838 (2015)
  23. Chapman, M. A. & Burke, J. M. Evidence of selection on fatty acid biosynthetic genes during the evolution of cultivated sunflower. Theor. Appl. Genet. 125, 897–907 (2012)
  24. Merah, O. et al. Genetic analysis of phytosterol content in sunflower seeds. Theor. Appl. Genet. 125, 1589–1601 (2012)
  25. Haddadi, P. et al. Genetic dissection of tocopherol and phytosterol in recombinant inbred lines of sunflower through quantitative trait locus analysis and the candidate gene approach. Mol. Breed. 29, 717–729 (2012)
  26. Carman, G. M. & Han, G.-S. Roles of phosphatidate phosphatase enzymes in lipid metabolism. Trends Biochem. Sci. 31, 694–699 (2006)
  27. Deng, X. D., Cai, J. J. & Fei, X. W. Involvement of phosphatidate phosphatase in the biosynthesis of triacylglycerols in Chlamydomonas reinhardtii. J. Zhejiang Univ. Sci. B 14, 1121–1131 (2013)
  28. Bolger, M. E. et al. Plant genome sequencing — applications for crop improvement. Curr. Opin. Biotechnol. 26, 31–37 (2014)
  29. Kang, Y. J. et al. Translational genomics for plant breeding with the genome sequence explosion. Plant Biotechnol. J. 14, 1057–1069 (2016)
  30. Curtin, S. J. et al. Validating genome-wide association candidates controlling quantitative variation in nodulation. Plant Physiol. 173, 921–931 (2017)
  31. Mayjonade, B. et al. Extraction of high-molecular-weight genomic DNA for long-read sequencing of single molecules. Biotechniques 61, 203–205 (2016)
  32. Berlin, K. et al. Assembling large genomes with single-molecule sequencing and locality-sensitive hashing. Nat. Biotechnol. 33, 623–630 (2015)
  33. Chin, C.-S. et al. Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data. Nat. Methods 10, 563–569 (2013)
  34. Foissac, S. et al. Genome annotation in plants and fungi: EuGene as a model platform. Curr. Bioinform. 3, 87–97 (2008)
  35. Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015)
  36. Lamesch, P. et al. The Arabidopsis information resource (TAIR): improved gene annotation and new tools. Nucleic Acids Res. 40, D1202–D1210 (2012)
  37. Axtell, M. J. ShortStack: comprehensive annotation and quantification of small RNA genes. RNA 19, 740–751 (2013)
  38. Formey, D. et al. The small RNA diversity from Medicago truncatula roots under biotic interactions evidences the environmental plasticity of the miRNAome. Genome Biol. 15, 457 (2014)
  39. Kozomara, A. & Griffiths-Jones, S. miRBase: annotating high confidence microRNAs using deep sequencing data. Nucleic Acids Res. 42, D68–D73 (2014)
  40. Ellinghaus, D., Kurtz, S. & Willhoeft, U. LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons. BMC Bioinformatics 9, 18 (2008)
  41. Steinbiss, S., Willhoeft, U., Gremme, G. & Kurtz, S. Fine-grained annotation and classification of de novo predicted LTR retrotransposons. Nucleic Acids Res. 37, 7002–7013 (2009)
  42. Gremme, G., Steinbiss, S. & Kurtz, S. GenomeTools: a comprehensive software library for efficient processing of structured genome annotations. IEEE/ACM Trans. Comput. Biol. Bioinform. 10, 645–656 (2013)
  43. Yang, Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, 1586–1591 (2007)
  44. Strasburg, J. L. & Rieseberg, L. H. Molecular demographic history of the annual sunflowers Helianthus annuus and H. petiolaris—large effective population sizes and rates of long-term gene flow. Evolution 62, 1936–1950 (2008)
  45. Salse, J., Abrouk, M., Murat, F., Quraishi, U. M. & Feuillet, C. Improved criteria and comparative genomics tool provide new insights into grass paleogenomics. Brief. Bioinform. 10, 619–630 (2009)
  46. Pritchard, J. K., Stephens, M. & Donnelly, P. Inference of population structure using multilocus genotype data. Genetics 155, 945–959 (2000)
  47. Yanai, I. et al. Genome-wide midrange transcription profiles reveal expression level relationships in human tissue specification. Bioinformatics 21, 650–659 (2005)
  48. Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2010)
  49. Koboldt, D. C. et al. VarScan: variant detection in massively parallel sequencing of individual and pooled samples. Bioinformatics 25, 2283–2285 (2009)
  50. Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010)
  51. Karp, P. D., Paley, S. & Romero, P. The pathway tools software. Bioinformatics 18 (Suppl 1), S225–S232 (2002)
  52. Hudson, R. R., Slatkin, M. & Maddison, W. P. Estimation of levels of gene flow from DNA sequence data. Genetics 132, 583–589 (1992)
  53. De Mita, S. & Siol, M. EggLib: processing, analysis and simulation tools for population genetics and genomics. BMC Genet. 13, 27 (2012)
  54. Ebrahimi, A. et al. QTL mapping of seed-quality traits in sunflower recombinant inbred lines under different water regimes. Genome 51, 599–615 (2008)
  55. Pérez-Vich, B. et al. Molecular basis of the high-palmitic acid trait in sunflower seed oil. Mol. Breed. 36, 43 (2016)
  56. Premnath, A., Narayana, M., Ramakrishnan, C., Kuppusamy, S. & Chockalingam, V. Mapping quantitative trait loci controlling oil content, oleic acid and linoleic acid content in sunflower (Helianthus annuus L.). Mol. Breed. 36, 106 (2016)

Author information

  1. These authors contributed equally to this work.

    • Hélène Badouin,
    • Jérôme Gouzy &
    • Christopher J. Grassa
  2. These authors jointly supervised this work.

    • Stéphane Muños,
    • Patrick Vincourt,
    • Loren H. Rieseberg &
    • Nicolas B. Langlade

Affiliations

  1. LIPM, Université de Toulouse, INRA, CNRS, Castanet-Tolosan, France

    • Hélène Badouin,
    • Jérôme Gouzy,
    • Christopher J. Grassa,
    • Ludovic Cottret,
    • Sébastien Carrère,
    • Baptiste Mayjonade,
    • Ludovic Legrand,
    • Nicolas Blanchet,
    • Marie-Claude Boniface,
    • Olivier Catrice,
    • Ghislain Fievet,
    • Yannick Lippi,
    • Lolita Lorenzon,
    • Gwenola Marage,
    • Gwenaëlle Marchand,
    • Prune Pegot-Espagnet,
    • Nicolas Pouilly,
    • Erika Sallet,
    • Justine Thomas,
    • Didier Varès,
    • Brigitte Mangin,
    • Stéphane Muños,
    • Patrick Vincourt &
    • Nicolas B. Langlade
  2. Department of Botany and Biodiversity Research Centre, University of British Columbia, Vancouver, British Columbia, Canada

    • Christopher J. Grassa,
    • S. Evan Staton,
    • Gregory L. Owens,
    • Navdeep Gill,
    • Nolan C. Kane,
    • Sariel Hubner,
    • Nadia Chaidir,
    • Matthew King,
    • Evan Morien,
    • Thuy Nguyen,
    • Frances Raftis &
    • Loren H. Rieseberg
  3. INRA/UBP UMR 1095 GDEC (Genetics, Diversity and Ecophysiology of Cereals), Clermont Ferrand 63100, France

    • Florent Murat,
    • Felicity Vear &
    • Jérôme Salse
  4. Institute of Plant Sciences Paris-Saclay (IPS2), CNRS, INRA, University of Paris-Saclay, 91405 Orsay, France

    • Christine Lelandais-Brière &
    • Martin Crespi
  5. Institute of Plant Sciences Paris-Saclay (IPS2), CNRS, INRA, University of Paris-Diderot, Sorbonne Paris-Cité, 91405 Orsay, France

    • Christine Lelandais-Brière &
    • Martin Crespi
  6. Department of Ecology and Evolutionary Biology, University of Colorado, Boulder, Colorado 80309-0334, USA

    • Nolan C. Kane
  7. Department of Plant Biology, Miller Plant Sciences, University of Georgia, Athens, Georgia 30602, USA

    • John E. Bowers &
    • John M. Burke
  8. Department of Biotechnology, Tel-Hai Academic College, Upper Galilee 12210, Israel

    • Sariel Hubner
  9. MIGAL - Galilee Research Institute, PO box 831, Kiryat Shmona 11016, Israel

    • Sariel Hubner
  10. INRA, Centre National de Ressources Génomiques Végétales, F-31326 Castanet-Tolosan, France

    • Arnaud Bellec,
    • Hélène Bergès,
    • Nicolas Helmstetter &
    • Sonia Vautrin
  11. INRA, US 1279 EPGV/CEA/CNG, Evry, France

    • Aurélie Bérard,
    • Dominique Brunel,
    • Marie-Christine Le Paslier &
    • Elodie Marquand
  12. Dow AgroSciences LLC, Indianapolis, Indiana 46268, USA

    • Nadia Chaidir
  13. Biogemma, 31700 Mondonville, France

    • Clotilde Claudel
  14. INRA, GeT-PlaGe, Genotoul, Castanet-Tolosan, France

    • Cécile Donnadieu &
    • Céline Vandecasteele
  15. INRA, UMR1388 Génétique, Physiologie et Systèmes d’Elevage, F-31326 Castanet-Tolosan, France

    • Thomas Faraut
  16. DuPont Pioneer, Johnston, Iowa 50131, USA

    • Matthew King
  17. Department of Plant Sciences, University of California, Davis, California 95616, USA

    • Steven J. Knapp
  18. Department of Biology, Indiana University, Bloomington, Indiana 47405, USA

    • Zhao Lai &
    • Loren H. Rieseberg
  19. Center for Genomics and Bioinformatics, Indiana University, Bloomington, Indiana 47405, USA

    • Zhao Lai
  20. Department of Biological Sciences, University of Memphis, Memphis, Tennessee 38152, USA

    • Jennifer R. Mandel
  21. TERRES INOVIA, UMR Arche INRA/ENSAT F-31320 Castanet-Tolosan, France

    • Emmanuelle Bret-Mestries
  22. Department of Horticulture, University of Georgia, Athens, Georgia 30602, USA

    • Savithri Nambeesan
  23. Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK

    • Thuy Nguyen
  24. MIAT, Université de Toulouse, INRA, Castanet-Tolosan, France

    • Thomas Schiex

Contributions

F.M., S.E.S., L.C., C.L.-B. and G.L.O. contributed equally to this work. M.C., B.Man., J.M.B. and J.S. contributed equally to this work. A.Bel., H.Be. and N.H. prepared BAC libraries. B.May. developed the DNA-extraction protocol for PacBio sequencing. B.May., C.V. and C.D. performed PacBio sequencing. A.Bér., D.B., D.V., E.Ma., E.B.-M., G.Marc., G.Mara., J.R.M., J.T., L.Lo., M.-C.B., M.-C.L.P., N.B., N.B.L., N.P., S.N., S.V., Y.L. and Z.L. contributed to DNA/RNA sample collection and data production. O.C. performed flow cytometry experiments. N.G., T.N. and N.C.K. built the physical map and integrated the physical and genetic maps. C.J.G., S.M., J.E.B. and J.M.B. developed genetic maps. J.G. assembled the XRQ genome. C.J.G. built the XRQ pseudomolecules. C.C. performed quality control of XRQ pseudomolecules. C.J.G., J.E.B., N.C.K., S.H. and M.K. assembled the HA412-HO genome. J.G., E.S. and T.S. annotated protein-coding genes and miRNA (XRQ). S.E.S. annotated the HA412-HO genome. S.C., J.G., F.R., M.K., T.F., C.J.G., J.E.B., N.C.K., N.G., T.N., N.C. and E.Mo. developed bioinformatics resources. L.Le., E.Ma. and G.F. performed bioinformatics analyses. G.L.O. conducted ancestry analyses. P.V. designed the GWAS hybrid panel. B.Man., N.B.L. and P.V. designed the GWAS experiment. F.V. developed the XRQ inbred line. B.Man. and P.P.-E. conducted the GWAS analysis. L.C. conducted metabolism analyses. F.M. and J.S. conducted palaeo-evolution analyses. S.E.S. conducted repeat analyses. C.L.-B. and M.C. conducted small-RNA analyses. H.Ba., S.M. and N.B.L. performed integrated analyses on flowering time and oil metabolism. H.Ba. and N.B.L. performed transcriptomic analysis. H.Ba. performed analysis of sunflower ohnologues. S.J.K. contributed to the genome consortium coordination. N.B.L., L.H.R., P.V., S.M., J.M.B. and J.G. designed experiments and coordinated the project. L.H.R. coordinated the sunflower genome consortium. H.Ba., N.B.L. and L.H.R. wrote the manuscript.

Competing financial interests

The authors declare no competing financial interests.

Reviewer Information

Nature thanks A. Paterson, J. Schmutz and Y. Van de Peer for their contribution to the peer review of this work.

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

PDF files

  1. Supplementary Information (5.6 MB)

    This file contains Supplementary Notes split into 10 sections, including methods, data and discussion (Genome sequencing and assembly, Genome annotation, Paleogenomics and ancestry of the sunflower genome, Transcriptome sequencing and analysis, Resequencing of domesticated lines, Flowering time, Analysis of sunflower ohnologs and Oil metabolism), and Supplementary References.

  2. Supplementary Data 4 (2.6 MB)

    This document contains figures of windowed estimates of the amount and origin of introgression in the genome assemblies of the XRQ and HA412 genotypes (one figure per chromosome).

  3. Supplementary Data 7 (134 KB)

    This file contains sunflower orthologs and in-paralogs of flowering time genes in Arabidopsis thaliana.

Excel files

  1. Supplementary Data 1 (1.6 MB)

    This file contains tables A–K regarding the location and annotation of miRNA, siRNA, phasiRNA and miRNA targets. A, miRNA families. B, Additional miRNA families. C, All Miranda predictions. D, Non-redundant Miranda predictions. E, Target list by miRNA. F, Targets in flowering time QTL. G, All phasiRNA clusters. H, Non-redundant phasiRNA clusters. I, Intersection between phasiRNA clusters and miRNA targets. J, Clusters of mapping of 24-nucleotide sRNA. K, Intersection between genes and 24-nucleotide mapping clusters.

  2. Supplementary Data 2 (268 KB)

    This table describes paralogy relationships in the sunflower genome.

  3. Supplementary Data 3 (1.1 MB)

    This table describes orthology relationships between genes of sunflower and those of grape, artichoke and coffee, respectively, and with the lettuce genetic map.

  4. Supplementary Data 5 (57 KB)

    This file contains tables listing organ-specific transcription factors of the MYB and TCP families in 11 sunflower organs.

  5. Supplementary Data 6 (53 KB)

    This file contains tables of Gene Ontology categories enriched in response to hormones or stress treatments in sunflower roots and leaves.

  6. Supplementary Data 8 (79 KB)

    This table contains a curated list of sunflower genes involved in seed oil metabolism, based on a review of literature.
