The domesticated sunflower, Helianthus annuus L., is a global oil crop that shows promise for climate change adaptation because it can maintain stable yields across a wide variety of environmental conditions, including drought (ref. 1). Even greater resilience is achievable through the mining of resistance alleles from compatible wild sunflower relatives (refs 2, 3), including numerous extremophile species (ref. 4). Here we report a high-quality reference for the sunflower genome (3.6 gigabases), together with extensive transcriptomic data from vegetative and floral organs. The genome mostly consists of highly similar, related sequences (ref. 5) and required single-molecule real-time sequencing technologies for successful assembly. Genome analyses enabled the reconstruction of the evolutionary history of the Asterids, further establishing the existence of a whole-genome triplication at the base of the Asterids II clade (ref. 6) and a sunflower-specific whole-genome duplication around 29 million years ago (ref. 7). An integrative approach combining quantitative genetics, expression and diversity data permitted development of comprehensive gene networks for two major breeding traits, flowering time and oil metabolism, and revealed new candidate genes in these networks. We found that the genomic architecture of flowering time has been shaped by the most recent whole-genome duplication, which suggests that ancient paralogues can remain in the same regulatory networks for tens of millions of years. This genome represents a cornerstone for future research programs aiming to exploit genetic diversity to improve biotic and abiotic stress resistance and oil production, while also considering agricultural constraints and human nutritional needs (refs 8, 9).
- Selective sweeps reveal candidate genes for adaptation to drought and salt tolerance in common sunflower, Helianthus annuus. Genetics 175, 1823–1834 (2007)
- Improving plant breeding with exotic genetic libraries. Nat. Rev. Genet. 2, 983–989 (2001)
- Selection of wild and cultivated sunflower for resistance to a new broomrape race that overcomes resistance of the Or5 gene. Crop Sci. 40, 550–555 (2000)
- Wild annual Helianthus anomalus and H. deserticola for improving oil content and quality in sunflower. Ind. Crops Prod. 25, 95–100 (2007)
- The sunflower (Helianthus annuus L.) genome reflects a recent history of biased accumulation of transposable elements. Plant J. 72, 142–153 (2012)
- Most Compositae (Asteraceae) are descendants of a paleohexaploid and all share a paleotetraploid ancestor with the Calyceraceae. Am. J. Bot. 103, 1203–1211 (2016)
- Multiple paleopolyploidizations during the evolution of the Compositae reveal parallel patterns of duplicate gene retention after millions of years. Mol. Biol. Evol. 25, 2445–2455 (2008)
- Crops and climate change: progress, trends, and challenges in simulating impacts and informing adaptation. J. Exp. Bot. 60, 2775–2789 (2009)
- Prioritizing climate change adaptation needs for food security in 2030. Science 319, 607–610 (2008)
- Hybrid speciation accompanied by genomic reorganization in wild sunflowers. Nature 375, 313–316 (1995)
- Turning heads: the biology of solar tracking in sunflower. Plant Sci. 224, 20–26 (2014)
- Evolution and diversification of the CYC/TB1 gene family in Asteraceae—a comparative study in Gerbera (Mutisieae) and sunflower (Heliantheae). Mol. Biol. Evol. 29, 1155–1166 (2012)
- Progress towards a reference genome for sunflower. Botany 89, 429–437 (2011)
- Analysis of retrotransposon structural diversity uncovers properties and propensities in angiosperm genome evolution. Proc. Natl Acad. Sci. USA 103, 17638–17643 (2006)
- An ultra-high-density, transcript-based, genetic map of lettuce. G3 (Bethesda) 3, 617–631 (2013)
- The genome sequence of the outbreeding globe artichoke constructed de novo incorporating a phase-aware low-pass sequencing strategy of F1 progeny. Sci. Rep. 6, 19427 (2016)
- The coffee genome provides insight into the convergent evolution of caffeine biosynthesis. Science 345, 1181–1184 (2014)
- The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature 449, 463–467 (2007)
- Ancestors of modern plant crops. Curr. Opin. Plant Biol. 30, 134–142 (2016)
- FLOR-ID: an interactive database of flowering-time gene networks in Arabidopsis thaliana. Nucleic Acids Res. 44 (D1), D1167–D1171 (2016)
- Contributions of flowering time genes to sunflower domestication and improvement. Genetics 187, 271–287 (2011)
- Genome scans reveal candidate domestication and improvement genes in cultivated sunflower, as well as post-domestication introgression with wild relatives. New Phytol. 206, 830–838 (2015)
- Evidence of selection on fatty acid biosynthetic genes during the evolution of cultivated sunflower. Theor. Appl. Genet. 125, 897–907 (2012)
- Genetic analysis of phytosterol content in sunflower seeds. Theor. Appl. Genet. 125, 1589–1601 (2012)
- Genetic dissection of tocopherol and phytosterol in recombinant inbred lines of sunflower through quantitative trait locus analysis and the candidate gene approach. Mol. Breed. 29, 717–729 (2012)
- Roles of phosphatidate phosphatase enzymes in lipid metabolism. Trends Biochem. Sci. 31, 694–699 (2006)
- Involvement of phosphatidate phosphatase in the biosynthesis of triacylglycerols in Chlamydomonas reinhardtii. J. Zhejiang Univ. Sci. B 14, 1121–1131 (2013)
- Plant genome sequencing — applications for crop improvement. Curr. Opin. Biotechnol. 26, 31–37 (2014)
- Translational genomics for plant breeding with the genome sequence explosion. Plant Biotechnol. J. 14, 1057–1069 (2016)
- Validating genome-wide association candidates controlling quantitative variation in nodulation. Plant Physiol. 173, 921–931 (2017)
- Extraction of high-molecular-weight genomic DNA for long-read sequencing of single molecules. Biotechniques 61, 203–205 (2016)
- Assembling large genomes with single-molecule sequencing and locality-sensitive hashing. Nat. Biotechnol. 33, 623–630 (2015)
- Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data. Nat. Methods 10, 563–569 (2013)
- Genome annotation in plants and fungi: EuGene as a model platform. Curr. Bioinform. 3, 87–97 (2008)
- BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015)
- The Arabidopsis information resource (TAIR): improved gene annotation and new tools. Nucleic Acids Res. 40, D1202–D1210 (2012)
- ShortStack: comprehensive annotation and quantification of small RNA genes. RNA 19, 740–751 (2013)
- The small RNA diversity from Medicago truncatula roots under biotic interactions evidences the environmental plasticity of the miRNAome. Genome Biol. 15, 457 (2014)
- miRBase: annotating high confidence microRNAs using deep sequencing data. Nucleic Acids Res. 42, D68–D73 (2014)
- LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons. BMC Bioinformatics 9, 18 (2008)
- Fine-grained annotation and classification of de novo predicted LTR retrotransposons. Nucleic Acids Res. 37, 7002–7013 (2009)
- GenomeTools: a comprehensive software library for efficient processing of structured genome annotations. IEEE/ACM Trans. Comput. Biol. Bioinform. 10, 645–656 (2013)
- PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, 1586–1591 (2007)
- Molecular demographic history of the annual sunflowers Helianthus annuus and H. petiolaris—large effective population sizes and rates of long-term gene flow. Evolution 62, 1936–1950 (2008)
- Improved criteria and comparative genomics tool provide new insights into grass paleogenomics. Brief. Bioinform. 10, 619–630 (2009)
- Inference of population structure using multilocus genotype data. Genetics 155, 945–959 (2000)
- Genome-wide midrange transcription profiles reveal expression level relationships in human tissue specification. Bioinformatics 21, 650–659 (2005)
- edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2010)
- VarScan: variant detection in massively parallel sequencing of individual and pooled samples. Bioinformatics 25, 2283–2285 (2009)
- BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010)
- The pathway tools software. Bioinformatics 18 (Suppl 1), S225–S232 (2002)
- Estimation of levels of gene flow from DNA sequence data. Genetics 132, 583–589 (1992)
- EggLib: processing, analysis and simulation tools for population genetics and genomics. BMC Genet. 13, 27 (2012)
- QTL mapping of seed-quality traits in sunflower recombinant inbred lines under different water regimes. Genome 51, 599–615 (2008)
- Molecular basis of the high-palmitic acid trait in sunflower seed oil. Mol. Breed. 36, 43 (2016)
- Mapping quantitative trait loci controlling oil content, oleic acid and linoleic acid content in sunflower (Helianthus annuus L.). Mol. Breed. 36, 106 (2016)
Extended data figures and tables
Extended Data Figures
- Extended Data Figure 1: Age distribution of transposons in the sunflower. (165 KB)
The x axis represents the age of insertions in millions of years; the y axis represents the density of insertions at a given time point. Top, the age distribution of each superfamily of subclass I of the Class II transposons (the terminal inverted repeat transposons). Bottom, the age distribution of LTR-RT superfamilies.
- Extended Data Figure 2: The density of LTR-RTs in 1 Mb bins per chromosome. (271 KB)
The scale represents a fraction, where 1.0 is 100% of a given bin.
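As a rough illustration of the quantity plotted in Extended Data Figure 2, the sketch below computes, for each fixed-size bin of one chromosome, the fraction of bases covered by annotated elements. The function name `binned_density`, the half-open `(start, end)` coordinate convention and the example coordinates are assumptions made for this illustration; they are not taken from the authors' pipeline.

```python
def binned_density(intervals, chrom_length, bin_size=1_000_000):
    """Fraction of each bin covered by the given (start, end) intervals.

    Coordinates are assumed to be 0-based and half-open. Overlapping
    intervals are merged first so that no base is counted twice.
    """
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    covered = [0] * n_bins

    # Merge overlapping or touching intervals.
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])

    # Distribute each merged interval over the bins it spans.
    for start, end in merged:
        end = min(end, chrom_length)
        pos = start
        while pos < end:
            b = pos // bin_size
            piece_end = min((b + 1) * bin_size, end)
            covered[b] += piece_end - pos
            pos = piece_end

    # The last bin may be shorter than bin_size.
    return [
        c / min(bin_size, chrom_length - i * bin_size)
        for i, c in enumerate(covered)
    ]


# Hypothetical annotations on a 2.5 Mb chromosome.
density = binned_density([(0, 500_000), (400_000, 1_500_000)], 2_500_000)
print(density)
```

A value of 1.0 means the whole bin is covered, matching the scale described in the legend.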
- Extended Data Figure 3: Comparison of grape–sunflower–artichoke–coffee–lettuce genomes. (277 KB)
Top, dot plots of orthologues between the grape genome (y axis, as a representative of the n = 21 post-γ ancestor) and, from left to right, the sunflower (1–6 chromosomal relationships inherited from WGT-1 and WGD-2), artichoke (1–3 chromosomal relationships deriving from WGT-1), coffee (1–1 chromosomal relationships illustrating the absence of a coffee-specific WGD, despite WGT-1) genomes and the lettuce genetic map (1–3 chromosomal relationships deriving from WGT-1). Bottom, dot plots of orthologues between the sunflower genome (y axis, n = 17 chromosomes) and artichoke (x axis, n = 17 chromosomes) and lettuce (x axis, n = 9 chromosomes) genomes with 1–1 chromosomal relationships.
- Extended Data Figure 4: Organ-specific expression in the sunflower transcriptome. (177 KB)
a, Histogram of the specificity index Tau in expressed genes. b, Box plot distribution of the specificity index Tau in 11 different organs. The different organs are represented with the following colours: ray floret ovary, dark brown; disc floret corolla, orange; ray floret ligule, yellow; bract, bright green; stem, dark green; pistil, bright blue; roots, dark blue; leaves, light green; disc floret ovary (seeds), red; stamens, magenta; pollen, light blue. c, Violin plot of the specificity index Tau for transcription factors (TFs, magenta) and long non-coding RNAs (lncRNAs, light blue). d, Cumulative bar plot showing the organ distribution of specific genes (left), transcription factors (middle) and lncRNAs (right). Colours are the same as in b.
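The specificity index Tau used in this figure is commonly defined, for a gene with expression values x_1 ... x_n across n organs, as the sum of (1 - x_i / x_max) divided by (n - 1). It is 0 for uniform expression and 1 for expression in a single organ. The sketch below implements that common definition; the function name `tau_index` and the example values are hypothetical, and any normalisation or log transformation the authors applied before computing Tau is not reproduced here.

```python
def tau_index(expression):
    """Tissue-specificity index Tau for one gene.

    `expression` holds one non-negative expression value per organ.
    Returns None when the gene is not expressed in any organ, because
    Tau is undefined in that case.
    """
    n = len(expression)
    if n < 2:
        raise ValueError("Tau needs expression values for at least two organs")
    if any(x < 0 for x in expression):
        raise ValueError("expression values must be non-negative")
    x_max = max(expression)
    if x_max == 0:
        return None
    return sum(1 - x / x_max for x in expression) / (n - 1)


# Hypothetical profiles over 11 organs.
print(tau_index([4.0] * 11))           # uniform expression
print(tau_index([0.0] * 10 + [12.0]))  # expressed in a single organ
```

Values near 1 correspond to the organ-specific genes counted in panel d.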
- Extended Data Figure 5: Integrative analysis of flowering time. (373 KB)
a, Flowering time network in the sunflower. Flowering time genes of A. thaliana and their interactions are drawn in green. Sunflower genes and orthology relationships with A. thaliana genes are shown in orange. b, Genomic architecture of flowering time in the domesticated sunflower. Outer ring, location of genomic regions associated with flowering time. Inner ring, links between ohnologues of a sunflower-specific whole-genome duplication (WGD-2), limited to genes located in regions associated with flowering time. Links between ohnologues of WGD-2 that are both located in regions associated with flowering time are drawn in red; other links are drawn in grey. c, Pathway of the integration of flowering signals in the meristem (simplified pathway adapted from ref. 20). The bright orange backgrounds indicate genes for which at least one sunflower orthologue was located in a region associated with flowering time. Gene names in bold italics indicate genes for which we identified additional in-paralogues compared to a previous study using more limited genomic data (ref. 21). Simple arrows represent positive regulation and other arrows negative regulation. Curved lines between genes represent protein–protein complexes.
- Extended Data Figure 6: Integrative analysis of oil metabolism. (387 KB)
a, Whole-metabolic network (3,821 reactions and 475 pathways). Genes are coloured by expression levels in developing seeds. b, Co-expression network of oil metabolic pathway. Genes that co-localize with QTLs are coloured in orange. c, Sub-network with genes from b co-localizing with QTLs. Node size is proportional to Fst between lines cultivated for oil production and other domesticated lines. Genes with an Fst in the top 5% are coloured in dark orange. d, Mapping of candidate genes (orange genes from c) on the pathways of diacylglycerol and triacylglycerol biosynthesis. e, Mapping of candidate genes on the pathway of linoleate biosynthesis. f, Tree of a gene cluster including a candidate gene of the PAP2 superfamily, involved in the synthesis of fatty acid precursors (d). Athal, Arabidopsis thaliana; Brapa, Brassica rapa; Ccard, Cynara cardunculus; Hvulg, Hordeum vulgare; Osati, Oryza sativa; Ptrich, Populus trichocarpa.
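Panel c of Extended Data Figure 6 sizes nodes by Fst between two groups of lines. As a hedged illustration of what such a statistic measures, the sketch below implements one widely used sequence-based estimator, Fst = 1 - Hw / Hb, where Hw is the mean number of pairwise differences within populations and Hb the mean number between populations. Whether the authors used exactly this estimator is not stated in this section; the function names and the toy sequences are assumptions for the example.

```python
from itertools import combinations, product


def mean_pairwise_diff(pairs):
    """Mean number of differing sites over an iterable of sequence pairs."""
    pairs = list(pairs)
    total = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return total / len(pairs)


def fst_from_sequences(pop1, pop2):
    """Fst = 1 - Hw / Hb for two lists of aligned, equal-length sequences.

    Returns None when no differences exist between the populations,
    because the ratio is undefined in that case.
    """
    if len(pop1) < 2 or len(pop2) < 2:
        raise ValueError("each population needs at least two sequences")
    # Hw: average of the two within-population diversities.
    h_within = (
        mean_pairwise_diff(combinations(pop1, 2))
        + mean_pairwise_diff(combinations(pop2, 2))
    ) / 2
    # Hb: diversity between sequences drawn from different populations.
    h_between = mean_pairwise_diff(product(pop1, pop2))
    if h_between == 0:
        return None
    return 1 - h_within / h_between


# Toy alignments standing in for oil lines and other domesticated lines.
oil_lines = ["AA", "AT"]
other_lines = ["TT", "TA"]
print(fst_from_sequences(oil_lines, other_lines))
```

With small samples this estimator can be negative, which is usually read as no detectable differentiation.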
Extended Data Tables
- Supplementary Information (5.6 MB)
This contains Supplementary Notes split into 10 sections, including methods, data and discussion (Genome Sequencing and Assembly, Genome Annotation, Paleogenomics and ancestry of the sunflower genome, Transcriptomes sequencing and analysis, Resequencing of domesticated lines, Flowering time, Analysis of sunflower ohnologs and oil metabolism) and Supplementary References.
- Supplementary Data 4 (2.6 MB)
This document contains figures of window-based estimates of the amount and origin of introgression in the genome assemblies of the XRQ and Ha412 genotypes (one figure per chromosome).
- Supplementary Data 7 (134 KB)
This file contains sunflower orthologs and in-paralogs of flowering time genes in Arabidopsis thaliana.
- Supplementary Data 1 (1.6 MB)
This file contains tables A–K regarding the location and annotation of miRNAs, siRNAs, phasiRNAs and miRNA targets. A, miRNA families. B, Additional miRNA families. C, All Miranda predictions. D, Non-redundant Miranda predictions. E, Target list by miRNA. F, Targets in flowering time QTL. G, All phasiRNA clusters. H, Non-redundant phasiRNA clusters. I, Intersection between phasiRNA clusters and miRNA targets. J, Mapping clusters of 24-nucleotide sRNA. K, Intersection between genes and 24-nucleotide mapping clusters.
- Supplementary Data 2 (268 KB)
This table describes paralogy relationships in the sunflower genome.
- Supplementary Data 3 (1.1 MB)
This table describes orthology relationships between sunflower genes and those of grape, artichoke and coffee, and with the lettuce genetic map.
- Supplementary Data 5 (57 KB)
This file contains lists of organ-specific transcription factors of the MYB and TCP families in 11 sunflower organs.
- Supplementary Data 6 (53 KB)
This file contains tables of Gene Ontology categories enriched in response to hormones or stress treatments in sunflower roots and leaves.
- Supplementary Data 8 (79 KB)
This table contains a curated list of sunflower genes involved in seed oil metabolism, based on a review of literature.