A phased Vanilla planifolia genome enables genetic improvement of flavour and production


The global supply of vanilla extract is primarily sourced from the cured beans of the tropical orchid species Vanilla planifolia. Vanilla plants were collected from Mesoamerica, clonally propagated and globally distributed as part of the early spice trade. Today, the global food and beverage industry depends on descendants of these original plants that have not generally benefited from genetic improvement. As a result, vanilla growers and processors struggle to meet global demand for vanilla extract and are challenged by inefficient and unsustainable production practices. Here, we report a chromosome-scale, phased V. planifolia genome, which reveals sequence variants for genes that may impact the vanillin pathway and therefore influence bean quality. Resequencing of related vanilla species, including the minor commercial species Vanilla × tahitensis, identified genes that could impact productivity and post-harvest losses through pod dehiscence, flower anatomy and disease resistance. The vanilla genome reported in this study may enable accelerated breeding of vanilla to improve high-value traits.

Fig. 1: The vanilla plant, flowers and beans.
Fig. 2: A chromosome-level, fully phased genome was assembled for V. planifolia cultivar Daphna.
Fig. 3: Phylogenetic tree of Vanilla and other selected taxa.
Fig. 4: Karyotype of the vanilla genome illustrating the pan-orchid αO genome duplication.
Fig. 5: V. planifolia cultivar Daphna comparative genomics analyses.
Fig. 6: Principal component analysis plot from resequencing of accessions in this study and from previously reported GBS data.
Fig. 7: The proposed vanillin biosynthesis pathway, with new insights from the Daphna genome.

Data availability

Publicly available datasets were reanalysed using the complete Daphna genome and included SRX286672 (unspecified V. planifolia accession; ref. 33), data from Daphna (PRJNA253813; ref. 29) and GBS data from PRJNA507246 (ref. 18). This whole-genome shotgun project has been deposited at DNA Data Bank of Japan/European Nucleotide Archive/GenBank under the accession JADCNL000000000. New assemblies and sequences generated as part of this study can be found on the National Center for Biotechnology Information website under BioProject IDs PRJNA633886 (haplotype A) and PRJNA668740 (haplotype B). PRJNA633886 also holds the genomic short reads for Guy 1 (SRR12628847), Painter (SRR12628845), Hawaii (SRR12628846), Daphna (SRR12628848), King (SRR12628844), Haapape (SRR12628843) and Sheila (SRR12628842), as well as the genomic ONT long reads for Daphna (SRR12628849).
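
For readers who want to retrieve the raw reads, the run accessions listed above can be collected into a small lookup table. The following is a minimal sketch, not part of the published methods: it assumes the NCBI SRA Toolkit is installed and that its `prefetch` command is the chosen download route, and the names `SHORT_READ_RUNS`, `LONG_READ_RUNS` and `download_commands` are hypothetical. The script only builds command strings and does not execute them.

```python
# Run accessions copied from the Data availability statement.
# Short-read and long-read runs are kept in separate tables because
# Daphna appears in both.
SHORT_READ_RUNS = {
    "Guy 1": "SRR12628847",
    "Painter": "SRR12628845",
    "Hawaii": "SRR12628846",
    "Daphna": "SRR12628848",
    "King": "SRR12628844",
    "Haapape": "SRR12628843",
    "Sheila": "SRR12628842",
}
LONG_READ_RUNS = {"Daphna": "SRR12628849"}  # ONT long reads


def download_commands(runs):
    """Return one 'prefetch <accession>' string per run; nothing is executed.

    Assumption: the SRA Toolkit 'prefetch' tool is the download method.
    """
    return [f"prefetch {accession}" for accession in runs.values()]


if __name__ == "__main__":
    commands = download_commands(SHORT_READ_RUNS) + download_commands(LONG_READ_RUNS)
    for command in commands:
        print(command)
```

Keeping the two read types in separate tables avoids silently overwriting the Daphna short-read run with the Daphna long-read run.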

Code availability

All software and code used in this study are publicly available.


  1. Childers, N. F. Vanilla Culture in Puerto Rico (US Department of Agriculture, 1948).

  2. Medina, J. D. L. C., Jiménes, G. C. R. & García, H. S. Vanilla: Post-Harvest Operations (Food and Agriculture Organization of the United Nations, 2009).

  3. Vanilla Beans and Extract Market Worth US$4.3Bn by 2025 (Acumen Research and Consulting, 2019).

  4. Correll, D. S. Vanilla—its botany, history, cultivation and economic import. Econ. Bot. 7, 291–358 (1953).

  5. Ecott, T. Vanilla: Travels in Search of the Luscious Substance (Penguin UK, 2005).

  6. Chambers, A. H. Advances in Plant Breeding Strategies: Industrial and Food Crops Ch. 18 (Springer, 2019).

  7. Chambers, A. H., Moon, P., Edmond, V. & Bassil, E. Vanilla Cultivation in Southern Florida (EDIS, 2019).

  8. Sasikumar, B. Vanilla breeding—a review. Agric. Rev. 31, 139–144 (2010).

  9. Lepers-Andrzejewski, S., Causse, S., Caromel, B., Wong, M. & Dron, M. Genetic linkage map and diversity analysis of Tahitian vanilla (Vanilla × tahitensis, Orchidaceae). Crop Sci. 52, 795–806 (2012).

  10. Yang, H. L. et al. A re-evaluation of the final step of vanillin biosynthesis in the orchid Vanilla planifolia. Phytochemistry 139, 33–46 (2017).

  11. Dong, Y. & Wang, Y. Z. Seed shattering: from models to crops. Front. Plant Sci. 6, 476 (2015).

  12. Lapeyre-Montes, F., Conejero, G., Verdeil, J.-L. & Odoux, E. in Vanilla (Medicinal and Aromatic Plants—Industrial Profiles) (eds Odoux, E. & Grisoni, M.) Ch. 10 (CRC Press, 2010).

  13. Soto-Arenas, M. & Cameron, K. in Genera Orchidacearum Vol. 3 (eds Pridgeon, A. M. et al.) 321–334 (Oxford Univ. Press, 2003).

  14. Gigant, R. L. et al. in Microsatellite Markers Ch. 4, 73–93 (IntechOpen, 2016).

  15. National Academies of Sciences, Engineering, and Medicine A Review of the Citrus Greening Research and Development Efforts Supported by the Citrus Research and Development Foundation: Fighting a Ravaging Disease (National Academies Press, 2018).

  16. Ploetz, R. C. Fusarium wilt of banana. Phytopathology 105, 1512–1521 (2015).

  17. Delassus, M. La lutte contre la fusariose du vanillier par les méthodes génétiques. Agron. Trop. 18, 245–246 (1963).

  18. Hu, Y. et al. Genomics-based diversity analysis of vanilla species using a Vanilla planifolia draft genome and genotyping-by-sequencing. Sci. Rep. 9, 3416 (2019).

  19. Brown, S. C. et al. DNA remodeling by strict partial endoreplication in orchids, an original process in the plant kingdom. Genome Biol. Evol. 9, 1051–1071 (2017).

  20. Bory, S. et al. Natural polyploidy in Vanilla planifolia (Orchidaceae). Genome 51, 816–826 (2008).

  21. Lepers-Andrzejewski, S., Siljak-Yakovlev, S., Brown, S. C., Wong, M. & Dron, M. Diversity and dynamics of plant genome size: an example of polysomaty from a cytogenetic study of Tahitian vanilla (Vanilla × tahitensis, Orchidaceae). Am. J. Bot. 98, 986–997 (2011).

  22. Cai, J. et al. The genome sequence of the orchid Phalaenopsis equestris. Nat. Genet. 47, 65–72 (2015).

  23. Zhang, G. Q. et al. The Dendrobium catenatum Lindl. genome sequence provides insights into polysaccharide synthase, floral development and adaptive evolution. Sci. Rep. 6, 19029 (2016).

  24. Zhang, G. Q. et al. The Apostasia genome and the evolution of orchids. Nature 549, 379–383 (2017).

  25. Wang, W. et al. The Spirodela polyrhiza genome reveals insights into its neotenous reduction fast growth and aquatic lifestyle. Nat. Commun. 5, 3311 (2014).

  26. Ming, R. et al. The pineapple genome and the evolution of CAM photosynthesis. Nat. Genet. 47, 1435–1442 (2015).

  27. Lubinsky, P. et al. Neotropical roots of a Polynesian spice: the hybrid origin of Tahitian vanilla, Vanilla tahitensis (Orchidaceae). Am. J. Bot. 95, 1040–1047 (2008).

  28. Gallage, N. J. et al. The intracellular localization of the vanillin biosynthetic machinery in pods of Vanilla planifolia. Plant Cell Physiol. 59, 304–318 (2018).

  29. Rao, X. et al. A deep transcriptomic analysis of pod development in the vanilla orchid (Vanilla planifolia). BMC Genomics 15, 964 (2014).

  30. Gallage, N. J. & Møller, B. L. in Biotechnology of Natural Products Ch. 1, 3–24 (Springer, 2018).

  31. Widiez, T. et al. Functional characterization of two new members of the caffeoyl CoA O-methyltransferase-like gene family from Vanilla planifolia reveals a new class of plastid-localized O-methyltransferases. Plant Mol. Biol. 76, 475–488 (2011).

  32. Fock-Bastide, I. et al. Expression profiles of key phenylpropanoid genes during Vanilla planifolia pod development reveal a positive correlation between PAL gene expression and vanillin biosynthesis. Plant Physiol. Biochem. 74, 304–314 (2014).

  33. Gallage, N. J. et al. Vanillin formation from ferulic acid in Vanilla planifolia is catalysed by a single enzyme. Nat. Commun. 5, 4037 (2014).

  34. Odoux, E. & Brillouet, J.-M. Anatomy, histochemistry and biochemistry of glucovanillin, oleoresin and mucilage accumulation sites in green mature vanilla pod (Vanilla planifolia; Orchidaceae): a comprehensive and critical reexamination. Fruits 64, 221–241 (2009).

  35. Zhang, M. P. et al. Preparation of megabase-sized DNA from a variety of organisms using the nuclei method for advanced genomics research. Nat. Protoc. 7, 467–478 (2012).

  36. Datema, E. et al. The megabase-sized fungal genome of Rhizoctonia solani assembled from nanopore reads only. Preprint at bioRxiv (2016).

  37. Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).

  38. Li, H. Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences. Bioinformatics 32, 2103–2110 (2016).

  39. Vaser, R., Sovic, I., Nagarajan, N. & Sikic, M. Fast and accurate de novo genome assembly from long uncorrected reads. Genome Res. 27, 737–746 (2017).

  40. Lee, Y. G. et al. Constructing a reference genome in a single lab: the possibility to use Oxford nanopore technology. Plants 8, 270 (2019).

  41. Michael, T. P. et al. High contiguity Arabidopsis thaliana genome assembly with a single nanopore flow cell. Nat. Commun. 9, 541 (2018).

  42. Giordano, F. et al. De novo yeast genome assemblies from MinION, PacBio and MiSeq platforms. Sci. Rep. 7, 3935 (2017).

  43. Liao, Y. C. et al. Completing circular bacterial genomes with assembly complexity by using a sampling strategy from a single MinION run with barcoding. Front. Microbiol. 10, 2068 (2019).

  44. Roach, M. J., Schmidt, S. A. & Borneman, A. R. Purge Haplotigs: allelic contig reassignment for third-gen diploid genome assemblies. BMC Bioinformatics 19, 460 (2018).

  45. Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at (2013).

  46. Faust, G. G. & Hall, I. M. SAMBLASTER: fast duplicate marking and structural variant read extraction. Bioinformatics 30, 2503–2505 (2014).

  47. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).

  48. Kronenberg, Z. N. et al. FALCON-Phase: integrating PacBio and Hi-C data for phased diploid genomes. Preprint at bioRxiv (2018).

  49. Ghurye, J., Pop, M., Koren, S., Bickhart, D. & Chin, C. S. Scaffolding of long read assemblies using long range contact information. BMC Genomics 18, 527 (2017).

  50. Burton, J. N. et al. Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions. Nat. Biotechnol. 31, 1119–1125 (2013).

  51. Durand, N. C. et al. Juicebox provides a visualization system for Hi-C contact maps with unlimited zoom. Cell Syst. 3, 99–101 (2016).

  52. Ranallo-Benavidez, T. R., Jaron, K. S. & Schatz, M. C. GenomeScope 2.0 and Smudgeplots: reference-free profiling of polyploid genomes. Preprint at bioRxiv (2019).

  53. Price, A. L., Jones, N. C. & Pevzner, P. A. De novo identification of repeat families in large genomes. Bioinformatics 21, I351–I358 (2005).

  54. Tarailo-Graovac, M. & Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr. Protoc. Bioinformatics 25, 4.10.1–4.10.14 (2009).

  55. Stanke, M. & Waack, S. Gene prediction with a hidden Markov model and a new intron submodel. Bioinformatics 19, ii215–ii225 (2003).

  56. Guigo, R., Knudsen, S., Drake, N. & Smith, T. Prediction of gene structure. J. Mol. Biol. 226, 141–157 (1992).

  57. Korf, I. Gene finding in novel genomes. BMC Bioinformatics 5, 59 (2004).

  58. Majoros, W. H., Pertea, M. & Salzberg, S. L. TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders. Bioinformatics 20, 2878–2879 (2004).

  59. Pertea, M., Kim, D., Pertea, G. M., Leek, J. T. & Salzberg, S. L. Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. Nat. Protoc. 11, 1650–1667 (2016).

  60. Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 37, 907–915 (2019).

  61. Haas, B. J. et al. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res. 31, 5654–5666 (2003).

  62. Sato, S. et al. The tomato genome sequence provides insights into fleshy fruit evolution. Nature 485, 635–641 (2012).

  63. Osuna-Cruz, C. M. et al. PRGdb 3.0: a comprehensive platform for prediction and analysis of plant disease resistance genes. Nucleic Acids Res. 46, D1197–D1201 (2018).

  64. Frazee, A. C. et al. Ballgown bridges the gap between transcriptome assembly and expression analysis. Nat. Biotechnol. 33, 243–246 (2015).

  65. Kumar, S., Stecher, G., Li, M., Knyaz, C. & Tamura, K. MEGA X: molecular evolutionary genetics analysis across computing platforms. Mol. Biol. Evol. 35, 1547–1549 (2018).

  66. Whelan, S. & Goldman, N. A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Mol. Biol. Evol. 18, 691–699 (2001).

  67. Aronesty, E. ea-utils (fastqmcf) (2011).

  68. Kim, D., Landmead, B. & Salzberg, S. L. HISAT: a fast spliced aligner with low memory requirements. Nat. Methods 12, 357–360 (2015).

  69. Garrison, E. & Marth, G. Haplotype-based variant detection from short-read sequencing. Preprint at (2012).

  70. Cingolani, P. et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly 6, 80–92 (2012).

  71. Goodstein, D. M. et al. Phytozome: a comparative platform for green plant genomics. Nucleic Acids Res. 40, D1178–D1186 (2012).

  72. Suyama, M., Torrents, D. & Bork, P. PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments. Nucleic Acids Res. 34, W609–W612 (2006).

  73. Yang, Z. H. PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, 1586–1591 (2007).

  74. Dixon, R. A. in Handbook of Vanilla Science and Technology Ch. 24 (Wiley, 2018).


Funding for this research was provided by Elo Life Systems and by funds provided to A.H.C. from the University of Florida Dean for Research.

Author information

F.K., T. Huang and A.H.C. conceived of the project, designed the work and interpreted the data. M.B., H.T. and T. Hasing contributed to data acquisition, analysis and interpretation. All authors wrote, revised and approved the final manuscript and have agreed to be personally accountable for the accuracy and integrity of this work.

Corresponding authors

Correspondence to Tengfang Huang or Alan H. Chambers.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary methods, notes, Tables 1–9, 11 and 12, and Figs. 1–8.

Supplementary Table 10

1,458 gene pairs derived from WGD events previously inferred as an orchid-wide WGD event from structural comparison of V. planifolia haplotype A versus itself.

About this article

Cite this article

Hasing, T., Tang, H., Brym, M. et al. A phased Vanilla planifolia genome enables genetic improvement of flavour and production. Nat Food 1, 811–819 (2020).
