Abstract
The global supply of vanilla extract is primarily sourced from the cured beans of the tropical orchid species Vanilla planifolia. Vanilla plants were collected from Mesoamerica, clonally propagated and globally distributed as part of the early spice trade. Today, the global food and beverage industry depends on descendants of these original plants that have not generally benefited from genetic improvement. As a result, vanilla growers and processors struggle to meet global demand for vanilla extract and are challenged by inefficient and unsustainable production practices. Here, we report a chromosome-scale, phased V. planifolia genome, which reveals sequence variants for genes that may impact the vanillin pathway and therefore influence bean quality. Resequencing of related vanilla species, including the minor commercial species Vanilla × tahitensis, identified genes that could impact productivity and post-harvest losses through pod dehiscence, flower anatomy and disease resistance. The vanilla genome reported in this study may enable accelerated breeding of vanilla to improve high-value traits.
Data availability
Publicly available datasets were reanalysed using the complete Daphna genome and included SRX286672 (unspecified V. planifolia accession; ref. 33), data from Daphna (PRJNA253813; ref. 29) and GBS data from PRJNA507246 (ref. 18). This whole-genome shotgun project has been deposited at DNA Data Bank of Japan/European Nucleotide Archive/GenBank under the accession JADCNL000000000. New assemblies and sequences generated as part of this study can be found on the National Center for Biotechnology Information website under BioProject IDs PRJNA633886 (for haplotype A; genomic short reads for Guy 1 (SRR12628847), Painter (SRR12628845), Hawaii (SRR12628846), Daphna (SRR12628848), King (SRR12628844), Haapape (SRR12628843) and Sheila (SRR12628842), as well as genomic ONT long reads for Daphna (SRR12628849)) and PRJNA668740 (for haplotype B).
Code availability
All of the software and code used in this study are publicly available.
References
Childers, N. F. Vanilla Culture in Puerto Rico (US Department of Agriculture, 1948).
Medina, J. D. L. C., Jiménes, G. C. R. & García, H. S. Vanilla: Post-Harvest Operations (Food and Agriculture Organization of the United Nations, 2009).
Vanilla Beans and Extract Market Worth US$ 4.3 Bn by 2025 (Acumen Research and Consulting, 2019).
Correll, D. S. Vanilla—its botany, history, cultivation and economic import. Econ. Bot. 7, 291–358 (1953).
Ecott, T. Vanilla: Travels in Search of the Luscious Substance (Penguin UK, 2005).
Chambers, A. H. Advances in Plant Breeding Strategies: Industrial and Food Crops Ch. 18 (Springer, 2019).
Chambers, A. H., Moon, P., Edmond, V. & Bassil, E. Vanilla Cultivation in Southern Florida (EDIS, 2019).
Sasikumar, B. Vanilla breeding—a review. Agric. Rev. 31, 139–144 (2010).
Lepers-Andrzejewski, S., Causse, S., Caromel, B., Wong, M. & Dron, M. Genetic linkage map and diversity analysis of Tahitian vanilla (Vanilla × tahitensis, Orchidaceae). Crop Sci. 52, 795–806 (2012).
Yang, H. L. et al. A re-evaluation of the final step of vanillin biosynthesis in the orchid Vanilla planifolia. Phytochemistry 139, 33–46 (2017).
Dong, Y. & Wang, Y. Z. Seed shattering: from models to crops. Front. Plant Sci. 6, 476 (2015).
Lapeyre-Montes, F., Conejero, G., Verdeil, J.-L. & Odoux, E. in Vanilla (Medicinal and Aromatic Plants—Industrial Profiles) (eds Odoux, E. & Grisoni, M.) Ch. 10 (CRC Press, 2010).
Soto-Arenas, M. & Cameron, K. in Genera Orchidacearum Vol. 3 (eds Pridgeon, A. M. et al.) 321–334 (Oxford Univ. Press, 2003).
Gigant, R. L. et al. in Microsatellite Markers Ch. 4, 73–93 (IntechOpen, 2016).
National Academies of Sciences, Engineering, and Medicine A Review of the Citrus Greening Research and Development Efforts Supported by the Citrus Research and Development Foundation: Fighting a Ravaging Disease (National Academies Press, 2018).
Ploetz, R. C. Fusarium wilt of banana. Phytopathology 105, 1512–1521 (2015).
Delassus, M. La lutte contre la fusariose du vanillier par les méthodes génétiques. Agron. Trop. 18, 245–246 (1963).
Hu, Y. et al. Genomics-based diversity analysis of vanilla species using a Vanilla planifolia draft genome and genotyping-by-sequencing. Sci. Rep. 9, 3416 (2019).
Brown, S. C. et al. DNA remodeling by strict partial endoreplication in orchids, an original process in the plant kingdom. Genome Biol. Evol. 9, 1051–1071 (2017).
Bory, S. et al. Natural polyploidy in Vanilla planifolia (Orchidaceae). Genome 51, 816–826 (2008).
Lepers-Andrzejewski, S., Siljak-Yakovlev, S., Brown, S. C., Wong, M. & Dron, M. Diversity and dynamics of plant genome size: an example of polysomaty from a cytogenetic study of Tahitian vanilla (Vanilla × tahitensis, Orchidaceae). Am. J. Bot. 98, 986–997 (2011).
Cai, J. et al. The genome sequence of the orchid Phalaenopsis equestris. Nat. Genet. 47, 65–72 (2015).
Zhang, G. Q. et al. The Dendrobium catenatum Lindl. genome sequence provides insights into polysaccharide synthase, floral development and adaptive evolution. Sci. Rep. 6, 19029 (2016).
Zhang, G. Q. et al. The Apostasia genome and the evolution of orchids. Nature 549, 379–383 (2017).
Wang, W. et al. The Spirodela polyrhiza genome reveals insights into its neotenous reduction fast growth and aquatic lifestyle. Nat. Commun. 5, 3311 (2014).
Ming, R. et al. The pineapple genome and the evolution of CAM photosynthesis. Nat. Genet. 47, 1435–1442 (2015).
Lubinsky, P. et al. Neotropical roots of a Polynesian spice: the hybrid origin of Tahitian vanilla, Vanilla tahitensis (Orchidaceae). Am. J. Bot. 95, 1040–1047 (2008).
Gallage, N. J. et al. The intracellular localization of the vanillin biosynthetic machinery in pods of Vanilla planifolia. Plant Cell Physiol. 59, 304–318 (2018).
Rao, X. et al. A deep transcriptomic analysis of pod development in the vanilla orchid (Vanilla planifolia). BMC Genomics 15, 964 (2014).
Gallage, N. J. & Møller, B. L. in Biotechnology of Natural Products Ch. 1, 3–24 (Springer, 2018).
Widiez, T. et al. Functional characterization of two new members of the caffeoyl CoA O-methyltransferase-like gene family from Vanilla planifolia reveals a new class of plastid-localized O-methyltransferases. Plant Mol. Biol. 76, 475–488 (2011).
Fock-Bastide, I. et al. Expression profiles of key phenylpropanoid genes during Vanilla planifolia pod development reveal a positive correlation between PAL gene expression and vanillin biosynthesis. Plant Physiol. Biochem. 74, 304–314 (2014).
Gallage, N. J. et al. Vanillin formation from ferulic acid in Vanilla planifolia is catalysed by a single enzyme. Nat. Commun. 5, 4037 (2014).
Odoux, E. & Brillouet, J.-M. Anatomy, histochemistry and biochemistry of glucovanillin, oleoresin and mucilage accumulation sites in green mature vanilla pod (Vanilla planifolia; Orchidaceae): a comprehensive and critical reexamination. Fruits 64, 221–241 (2009).
Zhang, M. P. et al. Preparation of megabase-sized DNA from a variety of organisms using the nuclei method for advanced genomics research. Nat. Protoc. 7, 467–478 (2012).
Datema, E. et al. The megabase-sized fungal genome of Rhizoctonia solani assembled from nanopore reads only. Preprint at bioRxiv https://doi.org/10.1101/084772 (2016).
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
Li, H. Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences. Bioinformatics 32, 2103–2110 (2016).
Vaser, R., Sovic, I., Nagarajan, N. & Sikic, M. Fast and accurate de novo genome assembly from long uncorrected reads. Genome Res. 27, 737–746 (2017).
Lee, Y. G. et al. Constructing a reference genome in a single lab: the possibility to use Oxford nanopore technology. Plants 8, 270 (2019).
Michael, T. P. et al. High contiguity Arabidopsis thaliana genome assembly with a single nanopore flow cell. Nat. Commun. 9, 541 (2018).
Giordano, F. et al. De novo yeast genome assemblies from MinION, PacBio and MiSeq platforms. Sci. Rep. 7, 3935 (2017).
Liao, Y. C. et al. Completing circular bacterial genomes with assembly complexity by using a sampling strategy from a single MinION run with barcoding. Front. Microbiol. 10, 2068 (2019).
Roach, M. J., Schmidt, S. A. & Borneman, A. R. Purge Haplotigs: allelic contig reassignment for third-gen diploid genome assemblies. BMC Bioinformatics 19, 460 (2018).
Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at http://arxiv.org/abs/1303.3997 (2013).
Faust, G. G. & Hall, I. M. SAMBLASTER: fast duplicate marking and structural variant read extraction. Bioinformatics 30, 2503–2505 (2014).
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Kronenberg, Z. N. et al. FALCON-Phase: integrating PacBio and Hi-C data for phased diploid genomes. Preprint at bioRxiv https://doi.org/10.1101/327064 (2018).
Ghurye, J., Pop, M., Koren, S., Bickhart, D. & Chin, C. S. Scaffolding of long read assemblies using long range contact information. BMC Genomics 18, 527 (2017).
Burton, J. N. et al. Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions. Nat. Biotechnol. 31, 1119–1125 (2013).
Durand, N. C. et al. Juicebox provides a visualization system for Hi-C contact maps with unlimited zoom. Cell Syst. 3, 99–101 (2016).
Ranallo-Benavidez, T. R., Jaron, K. S. & Schatz, M. C. GenomeScope 2.0 and Smudgeplots: reference-free profiling of polyploid genomes. Preprint at bioRxiv https://doi.org/10.1101/747568 (2019).
Price, A. L., Jones, N. C. & Pevzner, P. A. De novo identification of repeat families in large genomes. Bioinformatics 21, I351–I358 (2005).
Tarailo-Graovac, M. & Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr. Protoc. Bioinformatics 25, 4.10.1–4.10.14 (2009).
Stanke, M. & Waack, S. Gene prediction with a hidden Markov model and a new intron submodel. Bioinformatics 19, ii215–ii225 (2003).
Guigo, R., Knudsen, S., Drake, N. & Smith, T. Prediction of gene structure. J. Mol. Biol. 226, 141–157 (1992).
Korf, I. Gene finding in novel genomes. BMC Bioinformatics 5, 59 (2004).
Majoros, W. H., Pertea, M. & Salzberg, S. L. TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders. Bioinformatics 20, 2878–2879 (2004).
Pertea, M., Kim, D., Pertea, G. M., Leek, J. T. & Salzberg, S. L. Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. Nat. Protoc. 11, 1650–1667 (2016).
Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 37, 907–915 (2019).
Haas, B. J. et al. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res. 31, 5654–5666 (2003).
Sato, S. et al. The tomato genome sequence provides insights into fleshy fruit evolution. Nature 485, 635–641 (2012).
Osuna-Cruz, C. M. et al. PRGdb 3.0: a comprehensive platform for prediction and analysis of plant disease resistance genes. Nucleic Acids Res. 46, D1197–D1201 (2018).
Frazee, A. C. et al. Ballgown bridges the gap between transcriptome assembly and expression analysis. Nat. Biotechnol. 33, 243–246 (2015).
Kumar, S., Stecher, G., Li, M., Knyaz, C. & Tamura, K. MEGA X: molecular evolutionary genetics analysis across computing platforms. Mol. Biol. Evol. 35, 1547–1549 (2018).
Whelan, S. & Goldman, N. A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Mol. Biol. Evol. 18, 691–699 (2001).
Aronesty, E. ea-utils (fastqmcf) (2011); https://expressionanalysis.github.io/ea-utils/
Kim, D., Landmead, B. & Salzberg, S. L. HISAT: a fast spliced aligner with low memory requirements. Nat. Methods 12, 357–360 (2015).
Garrison, E. & Marth, G. Haplotype-based variant detection from short-read sequencing. Preprint at http://arxiv.org/abs/1207.3907 (2012).
Cingolani, P. et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly 6, 80–92 (2012).
Goodstein, D. M. et al. Phytozome: a comparative platform for green plant genomics. Nucleic Acids Res. 40, D1178–D1186 (2012).
Suyama, M., Torrents, D. & Bork, P. PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments. Nucleic Acids Res. 34, W609–W612 (2006).
Yang, Z. H. PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, 1586–1591 (2007).
Dixon, R. A. in Handbook of Vanilla Science and Technology Ch. 24 (Wiley, 2018).
Acknowledgements
Funding for this research was provided by Elo Life Systems and by funds provided to A.H.C. from the University of Florida Dean for Research.
Author information
Contributions
F.K., T. Huang and A.H.C. conceived of the project, designed the work and interpreted the data. M.B., H.T. and T. Hasing contributed to data acquisition, analysis and interpretation. All authors wrote, revised and approved the final manuscript and have agreed to be personally accountable for the accuracy and integrity of this work.
Ethics declarations
Competing interests
The authors declare no competing interests.
Supplementary information
Supplementary Information
Supplementary methods, notes, Tables 1–9, 11 and 12, and Figs. 1–8.
Supplementary Table 10
1,458 gene pairs derived from WGD events previously inferred as an orchid-wide WGD event from structural comparison of V. planifolia haplotype A versus itself.
Cite this article
Hasing, T., Tang, H., Brym, M. et al. A phased Vanilla planifolia genome enables genetic improvement of flavour and production. Nat Food 1, 811–819 (2020). https://doi.org/10.1038/s43016-020-00197-2
DOI: https://doi.org/10.1038/s43016-020-00197-2