We sequenced and assembled the draft genome of Theobroma cacao, an economically important tropical-fruit tree crop that is the source of chocolate. This assembly corresponds to 76% of the estimated genome size and contains almost all previously described genes, with 82% of these genes anchored on the ten T. cacao chromosomes. Analysis of this sequence information highlighted specific expansion of some gene families during evolution, for example, flavonoid-related genes. The sequence also provides a major source of candidate genes for T. cacao improvement. Based on the inferred paleohistory of the T. cacao genome, we propose an evolutionary scenario whereby the ten T. cacao chromosomes were shaped from an ancestor through eleven chromosome fusions.
Supplementary Text and Figures: Supplementary Note, Supplementary Tables 1–19 and Supplementary Figures 1–18