The genome sequence of the orchid Phalaenopsis equestris

Journal name:
Nature Genetics


Orchidaceae, renowned for its spectacular flowers and other reproductive and ecological adaptations, is one of the most diverse plant families. Here we present the genome sequence of the tropical epiphytic orchid Phalaenopsis equestris, a frequently used parent species for orchid breeding. P. equestris is the first plant with crassulacean acid metabolism (CAM) for which the genome has been sequenced. Our assembled genome contains 29,431 predicted protein-coding genes. We find that contigs likely to be underassembled, owing to heterozygosity, are enriched for genes that might be involved in self-incompatibility pathways. We find evidence for an orchid-specific paleopolyploidy event that preceded the radiation of most orchid clades, and our results suggest that gene duplication might have contributed to the evolution of CAM photosynthesis in P. equestris. Finally, we find expanded and diversified families of MADS-box C/D-class, B-class AP3 and AGL6-class genes, which might contribute to the highly specialized morphology of orchid flowers.

Figures


    Figure 1: Evolution of P. equestris.

    (a) Comparison of the number of gene families in orchid (P. equestris), rice (O. sativa), grapevine (V. vinifera) and A. thaliana. (b) Phylogenetic tree and gene family expansion and contraction. The phylogenetic tree was constructed from a concatenated alignment of 72 single-copy gene families from 11 green plant species. Gene family expansions are indicated in orange and gene family contractions in gray; the pie charts show their proportions among total changes in the same colors, and the blue portions of the pie charts represent conserved gene families. Inferred divergence dates (in millions of years) are shown in blue at each node. MRCA, most recent common ancestor.
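The Figure 1b tree is built from a concatenated alignment of 72 single-copy gene families from 11 species. As a rough illustration of what "concatenated" means here, the Python sketch below joins per-family alignments into one supermatrix row per species. It assumes each family has already been aligned and is supplied as a dictionary from species name to aligned sequence (one sequence per species, since the families are single-copy). The function name, species codes and sequences are hypothetical, and padding a missing species with gaps is a common convention assumed here, not a step stated in the legend.

```python
from typing import Dict, List

# One aligned gene family: species name -> aligned sequence (all the same length).
Alignment = Dict[str, str]


def concatenate_alignments(families: List[Alignment], species: List[str]) -> Dict[str, str]:
    """Join per-family alignments, in a fixed family order, into one row per species."""
    rows: Dict[str, List[str]] = {sp: [] for sp in species}
    for family in families:
        lengths = {len(seq) for seq in family.values()}
        if len(lengths) != 1:
            raise ValueError("each family alignment must be non-empty with equal-length sequences")
        width = lengths.pop()
        for sp in species:
            # Assumption: a species absent from a family is padded with gap characters.
            rows[sp].append(family.get(sp, "-" * width))
    return {sp: "".join(parts) for sp, parts in rows.items()}


# Hypothetical two-family example with three species codes.
fam1 = {"Pequ": "ATG-C", "Osat": "ATGAC"}
fam2 = {"Pequ": "GG", "Osat": "GA", "Vvin": "GG"}
matrix = concatenate_alignments([fam1, fam2], ["Pequ", "Osat", "Vvin"])
print(matrix)  # {'Pequ': 'ATG-CGG', 'Osat': 'ATGACGA', 'Vvin': '-----GG'}
```

Keeping the family order fixed for every species is what makes each supermatrix column homologous across rows; the resulting matrix would then be passed to a tree-building program.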

    Figure 2: Dating the paleopolyploidy event in P. equestris.

    (a) Absolute age distribution obtained by phylogenomic dating of P. equestris homeologs. The solid black line represents the kernel density estimate of the dated homeologs, and the vertical dashed black line marks its peak, used as the whole-genome duplication (WGD) age estimate. Gray lines represent the density estimates for the 1,000 bootstrap replicates, and the vertical dotted black lines mark the corresponding 90% confidence interval for the WGD age estimate. Dots show the original raw distribution of dated homeologs. The mode used as the consensus WGD age estimate is 75.57 million years ago, with lower and upper 90% confidence interval boundaries at 71.50 and 80.73 million years ago, respectively. The y axis represents the percentage of gene pairs. (b) Phylogenetic tree of the angiosperms. A wave of WGD events (indicated by colored bars) appears to be associated with the Cretaceous-Paleogene extinction event ~66 million years ago (ref. 27). The orchid-specific WGD is indicated by the unfilled bar.
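The Figure 2a legend describes taking the peak of a kernel density estimate (KDE) over homeolog ages as the WGD age, with a 90% confidence interval from 1,000 bootstrap replicates. A minimal, standard-library Python sketch of that idea follows. It assumes a Gaussian kernel, Silverman's rule-of-thumb bandwidth, a grid search for the peak and a percentile bootstrap interval; these are illustrative choices, not necessarily the settings used for the figure, and the toy ages are hypothetical.

```python
import math
import random
import statistics
from typing import List, Tuple


def kde_mode(ages: List[float], grid_step: float = 0.1) -> float:
    """Return the age (Mya) at which a Gaussian KDE of `ages` peaks, via grid search."""
    lo, hi = min(ages), max(ages)
    if lo == hi:  # degenerate sample: all ages identical
        return lo
    # Assumption: Silverman's rule-of-thumb bandwidth.
    h = 1.06 * statistics.stdev(ages) * len(ages) ** (-1 / 5)
    best_x, best_density = lo, -1.0
    for i in range(int((hi - lo) / grid_step) + 1):
        x = lo + i * grid_step
        # Unnormalized density: constant factors do not move the peak.
        density = sum(math.exp(-0.5 * ((x - a) / h) ** 2) for a in ages)
        if density > best_density:
            best_x, best_density = x, density
    return best_x


def bootstrap_wgd_age(
    ages: List[float], n_boot: int = 1000, level: float = 0.90, seed: int = 1
) -> Tuple[float, float, float]:
    """Return (peak estimate, lower, upper) from a percentile bootstrap of the KDE peak."""
    rng = random.Random(seed)
    # Resample gene pairs with replacement and re-estimate the peak each time.
    modes = sorted(kde_mode(rng.choices(ages, k=len(ages))) for _ in range(n_boot))
    k = round((1 - level) / 2 * n_boot)  # replicates trimmed from each tail
    return kde_mode(ages), modes[k], modes[n_boot - 1 - k]


# Hypothetical toy ages (Mya); for this symmetric sample the peak estimate is 75.0.
toy_ages = [70.0, 72.0, 74.0, 75.0, 75.0, 76.0, 78.0, 80.0]
estimate, lower, upper = bootstrap_wgd_age(toy_ages)
print(f"WGD age ~{estimate:.2f} Mya (90% CI {lower:.2f}-{upper:.2f})")
```

The bandwidth rule matters in practice: a narrower kernel can split one WGD peak into several, while a wider one can merge a WGD peak with the background of small-scale duplicates.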

    Figure 3: Overview of crassulacean acid metabolism (CAM) pathway evolution.

    Overview of the CAM pathway. A star marks each component whose gene family underwent gene duplication or loss. CA, carbonic anhydrase; CC, Calvin cycle; PEP, phosphoenolpyruvic acid; PEPC, phosphoenolpyruvate carboxylase; PPCK, phosphoenolpyruvate carboxykinase; MDH, malate dehydrogenase; ME, malic enzyme; PPDK, pyruvate phosphate dikinase.
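As a reading aid for Figure 3, the Python sketch below encodes a simplified day/night outline of CAM carbon flow using the enzyme abbreviations from the legend, then follows the carbon taken up from the air at night until it enters the Calvin cycle during the day. The reaction list is an assumed textbook outline of the malic enzyme route (cofactors, regulation and specific transporters omitted), not a reproduction of the figure, and it does not encode which gene families were duplicated or lost. The names `Step`, `CAM_STEPS` and `trace_carbon` are hypothetical.

```python
from typing import List, NamedTuple


class Step(NamedTuple):
    phase: str       # "night" or "day"
    substrate: str
    catalyst: str    # enzyme abbreviation from the legend, or a transport step
    product: str     # product that carries the carbon taken up from the air
    byproduct: str = ""


# Simplified, assumed outline; cofactors, regulation and transporters are left out.
CAM_STEPS: List[Step] = [
    Step("night", "CO2 (air)", "CA", "HCO3-"),
    Step("night", "HCO3-", "PEPC", "oxaloacetate"),  # HCO3- carboxylates PEP
    Step("night", "oxaloacetate", "MDH", "malate"),
    Step("night", "malate", "vacuolar import", "malic acid (vacuole)"),
    Step("day", "malic acid (vacuole)", "vacuolar export", "malate"),
    Step("day", "malate", "ME", "CO2 (internal)", byproduct="pyruvate"),
    Step("day", "pyruvate", "PPDK", "PEP"),  # converts pyruvate back to PEP; off the carbon route
    Step("day", "CO2 (internal)", "CC", "carbohydrate"),
]


def trace_carbon(steps: List[Step], start: str = "CO2 (air)") -> List[str]:
    """Walk the steps in order, following the metabolite that carries the fixed carbon."""
    current, route = start, []
    for step in steps:
        if step.substrate == current:
            route.append(f"{step.phase}: {step.catalyst}")
            current = step.product
    return route


for hop in trace_carbon(CAM_STEPS):
    print(hop)
```

The trace makes the temporal separation explicit: carbon is fixed by PEPC and stored as malic acid at night, and only after ME releases it as CO2 during the day does it enter the Calvin cycle.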


Change history

Corrected online 09 January 2015
In the version of this article initially published, the divergence time estimates in Figure 1b were incorrect, and the genome size estimate for Phalaenopsis equestris was incorrectly stated as 1.6 × 10⁶ instead of the correct 1.6 × 10⁹. Finally, Figure 3 incorrectly showed maleic acid in the vacuole reaction, which should have been malic acid. The errors have been corrected in the HTML and PDF versions of the article.
Corrected online 06 February 2015
In the version of this article initially published, the legend for Figure 1b referred to red arrows indicating the inferred divergence dates. No arrows are depicted in the figure, so this sentence has been removed from the figure legend in the HTML and PDF versions of the article.


References

  1. Darwin, C. On the Various Contrivances by Which British and Foreign Orchids are Fertilised by Insects (Cambridge University Press, 2011).
  2. Schiestl, F.P. et al. The chemistry of sexual deception in an orchid-wasp pollination system. Science 302, 437–438 (2003).
  3. Cozzolino, S. & Widmer, A. Orchid diversity: an evolutionary consequence of deception? Trends Ecol. Evol. 20, 487–494 (2005).
  4. Silvera, K., Santiago, L.S., Cushman, J.C. & Winter, K. Crassulacean acid metabolism and epiphytism linked to adaptive radiations in the Orchidaceae. Plant Physiol. 149, 1838–1847 (2009).
  5. Lin, S. et al. Nuclear DNA contents of Phalaenopsis sp. and Doritis pulcherrima. J. Am. Soc. Hortic. Sci. 126, 195–199 (2001).
  6. Leitch, I.J. et al. Genome size diversity in orchids: consequences and evolution. Ann. Bot. 104, 469–481 (2009).
  7. Tsai, W.C. et al. OrchidBase 2.0: comprehensive collection of Orchidaceae floral transcriptomes. Plant Cell Physiol. 54, e7 (2013).
  8. Fu, C.H. et al. OrchidBase: a collection of sequences of the transcriptome derived from orchids. Plant Cell Physiol. 52, 238–243 (2011).
  9. Parra, G., Bradnam, K. & Korf, I. CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics 23, 1061–1067 (2007).
  10. Maere, S. et al. Modeling gene and genome duplications in eukaryotes. Proc. Natl. Acad. Sci. USA 102, 5454–5459 (2005).
  11. Jaillon, O. et al. The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature 449, 463–467 (2007).
  12. Paterson, A.H. et al. The Sorghum bicolor genome and the diversification of grasses. Nature 457, 551–556 (2009).
  13. Ramírez, S.R., Gravendeel, B., Singer, R.B., Marshall, C.R. & Pierce, N.E. Dating the origin of the Orchidaceae from a fossil orchid with its pollinator. Nature 448, 1042–1045 (2007).
  14. Hunter, S. et al. InterPro: the integrative protein signature database. Nucleic Acids Res. 37, D211–D215 (2009).
  15. Proost, S. et al. PLAZA: a comparative genomics resource to study gene and genome evolution in plants. Plant Cell 21, 3718–3731 (2009).
  16. Chen, S. et al. De novo analysis of transcriptome dynamics in the migratory locust during the development of phase traits. PLoS ONE 5, e15633 (2010).
  17. O'Donoghue, E.M., Somerfield, S.D. & Heyes, J.A. Organization of cell walls in Sandersonia aurantiaca floral tissue. J. Exp. Bot. 53, 513–523 (2002).
  18. Lüttge, U. Vascular Plants as Epiphytes: Evolution and Ecophysiology (Springer-Verlag, 1989).
  19. Bosch, M., Poulter, N.S., Vatovec, S. & Franklin-Tong, V.E. Initiation of programmed cell death in self-incompatibility: role for cytoskeleton modifications and several caspase-like activities. Mol. Plant 1, 879–887 (2008).
  20. Dixit, R. & Nasrallah, J.B. Recognizing self in the self-incompatibility response. Plant Physiol. 125, 105–108 (2001).
  21. Castillo-Davis, C.I., Mekhedov, S.L., Hartl, D.L., Koonin, E.V. & Kondrashov, F.A. Selection for short introns in highly expressed genes. Nat. Genet. 31, 415–418 (2002).
  22. Schnable, J.C., Springer, N.M. & Freeling, M. Differentiation of the maize subgenomes by genome dominance and both ancient and ongoing gene loss. Proc. Natl. Acad. Sci. USA 108, 4069–4074 (2011).
  23. Bell, C.D., Soltis, D.E. & Soltis, P.S. The age and diversification of the angiosperms re-revisited. Am. J. Bot. 97, 1296–1303 (2010).
  24. Cui, L. et al. Widespread genome duplications throughout the history of flowering plants. Genome Res. 16, 738–749 (2006).
  25. Proost, S., Pattyn, P., Gerats, T. & Van de Peer, Y. Journey through the past: 150 million years of plant genome evolution. Plant J. 66, 58–65 (2011).
  26. Soltis, D.E. et al. Polyploidy and angiosperm diversification. Am. J. Bot. 96, 336–348 (2009).
  27. Vanneste, K., Baele, G., Maere, S. & Van de Peer, Y. Analysis of 41 plant genomes supports a wave of successful genome duplications in association with the Cretaceous-Paleogene boundary. Genome Res. 24, 1334–1347 (2014).
  28. Vanneste, K., Van de Peer, Y. & Maere, S. Inference of genome duplications from age distributions revisited. Mol. Biol. Evol. 30, 177–190 (2013).
  29. Kim, C. et al. Comparative analysis of Miscanthus and Saccharum reveals a shared whole-genome duplication but different evolutionary fates. Plant Cell 26, 2420–2429 (2014).
  30. Jiao, Y., Li, J., Tang, H. & Paterson, A.H. Integrated syntenic and phylogenomic analyses reveal an ancient genome duplication in monocots. Plant Cell 26, 2792–2802 (2014).
  31. Wood, T.E. et al. The frequency of polyploid speciation in vascular plants. Proc. Natl. Acad. Sci. USA 106, 13875–13879 (2009).
  32. Soltis, D.E., Visger, C.J. & Soltis, P.S. The polyploidy revolution then…and now: Stebbins revisited. Am. J. Bot. 101, 1057–1078 (2014).
  33. Whitten, W.M. et al. Molecular phylogenetics of Maxillaria and related genera (Orchidaceae: Cymbidieae) based on combined molecular data sets. Am. J. Bot. 94, 1860–1889 (2007).
  34. Whitten, W.M., Williams, N.H. & Chase, M.W. Subtribal and generic relationships of Maxillarieae (Orchidaceae) with emphasis on Stanhopeinae: combined molecular evidence. Am. J. Bot. 87, 1842–1856 (2000).
  35. Douzery, E.J. et al. Molecular phylogenetics of Diseae (Orchidaceae): a contribution from nuclear ribosomal ITS sequences. Am. J. Bot. 86, 887–899 (1999).
  36. Vanneste, K., Maere, S. & Van de Peer, Y. Tangled up in two: a burst of genome duplications at the end of the Cretaceous and the consequences for plant evolution. Phil. Trans. R. Soc. Lond. B 369, 20130353 (2014).
  37. Campbell, C.S., Judd, W.S. & Kellogg, E.A. Plant Systematics: A Phylogenetic Approach (Sinauer Associates, 1999).
  38. Van de Peer, Y., Maere, S. & Meyer, A. The evolutionary significance of ancient genome duplications. Nat. Rev. Genet. 10, 725–732 (2009).
  39. Lüttge, U. Ecophysiology of crassulacean acid metabolism (CAM). Ann. Bot. 93, 629–652 (2004).
  40. Benzing, D.H. Vascular epiphytism: taxonomic participation and adaptive diversity. Ann. Mo. Bot. Gard. 74, 183–204 (1987).
  41. Gravendeel, B., Smithson, A., Slik, F.J. & Schuiteman, A. Epiphytism and pollinator specialization: drivers for orchid diversity? Phil. Trans. R. Soc. Lond. B 359, 1523–1535 (2004).
  42. Pospišilová, J. Vascular plants as epiphytes. Evolution and ecophysiology. Biol. Plant. 33, 500 (1991).
  43. Holtum, J.A., Winter, K., Weeks, M.A. & Sexton, T.R. Crassulacean acid metabolism in the ZZ plant, Zamioculcas zamiifolia (Araceae). Am. J. Bot. 94, 1670–1676 (2007).
  44. Silvera, K. et al. Evolution along the crassulacean acid metabolism continuum. Funct. Plant Biol. 37, 995–1010 (2010).
  45. Ruelens, P. et al. FLOWERING LOCUS C in monocots and the tandem origin of angiosperm-specific MADS-box genes. Nat. Commun. 4, 2280 (2013).
  46. Tapia-López, R. et al. An AGAMOUS-related MADS-box gene, XAL1 (AGL12), regulates root meristem cell proliferation and flowering transition in Arabidopsis. Plant Physiol. 146, 1182–1192 (2008).
  47. Pan, Z.J. et al. The duplicated B-class MADS-box genes display dualistic characters in orchid floral organ identity and growth. Plant Cell Physiol. 52, 1515–1531 (2011).
  48. Tsai, W.C. et al. Interactions of B-class complex proteins involved in tepal development in Phalaenopsis orchid. Plant Cell Physiol. 49, 814–824 (2008).
  49. Tsai, W.C., Kuoh, C.S., Chuang, M.H., Chen, W.H. & Chen, H.H. Four DEF-like MADS box genes displayed distinct floral morphogenetic roles in Phalaenopsis orchid. Plant Cell Physiol. 45, 831–844 (2004).
  50. Pan, Z.J. et al. Flower development of Phalaenopsis orchid involves functionally divergent SEPALLATA-like genes. New Phytol. 202, 1024–1042 (2014).
  51. Hsiao, Y.Y. et al. Transcriptomic analysis of floral organs from Phalaenopsis orchid by using oligonucleotide microarray. Gene 518, 91–100 (2013).
  52. Jiao, Y. et al. Ancestral polyploidy in seed plants and angiosperms. Nature 473, 97–100 (2011).
  53. Seok, H.Y. et al. Rice ternary MADS protein complexes containing class B MADS heterodimer. Biochem. Biophys. Res. Commun. 401, 598–604 (2010).
  54. Favaro, R. et al. Ovule-specific MADS-box proteins have conserved protein-protein interactions in monocot and dicot plants. Mol. Genet. Genomics 268, 152–159 (2002).
  55. Parenicová, L. et al. Molecular and phylogenetic analyses of the complete MADS-box transcription factor family in Arabidopsis: new openings to the MADS world. Plant Cell 15, 1538–1551 (2003).
  56. Masiero, S., Colombo, L., Grini, P.E., Schnittger, A. & Kater, M.M. The emerging importance of type I MADS box transcription factors for plant reproduction. Plant Cell 23, 865–872 (2011).
  57. Leseberg, C.H., Li, A., Kang, H., Duvall, M. & Mao, L. Genome-wide analysis of the MADS-box gene family in Populus trichocarpa. Gene 378, 84–94 (2006).
  58. Murray, M.G. & Thompson, W.F. Rapid isolation of high molecular weight plant DNA. Nucleic Acids Res. 8, 4321–4325 (1980).
  59. Zhang, G. et al. Genome sequence of foxtail millet (Setaria italica) provides insights into grass evolution and biofuel potential. Nat. Biotechnol. 30, 549–554 (2012).
  60. Li, R. et al. De novo assembly of human genomes with massively parallel short read sequencing. Genome Res. 20, 265–272 (2010).
  61. Hsu, C.-C. et al. An overview of the Phalaenopsis orchid genome through BAC end sequence analysis. BMC Plant Biol. 11, 3 (2011).
  62. Kent, W.J. BLAT—the BLAST-like alignment tool. Genome Res. 12, 656–664 (2002).
  63. Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27, 573–580 (1999).
  64. Tarailo-Graovac, M. & Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr. Protoc. Bioinformatics Chapter 4, Unit 4.10 (2009).
  65. Jurka, J. et al. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet. Genome Res. 110, 462–467 (2005).
  66. Xu, Z. & Wang, H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res. 35, W265–W268 (2007).
  67. Edgar, R.C. & Myers, E.W. PILER: identification and classification of genomic repeats. Bioinformatics 21 (suppl. 1), i152–i158 (2005).
  68. Price, A.L., Jones, N.C. & Pevzner, P.A. De novo identification of repeat families in large genomes. Bioinformatics 21 (suppl. 1), i351–i358 (2005).
  69. McCarthy, E.M. & McDonald, J.F. LTR_STRUC: a novel search and identification program for LTR retrotransposons. Bioinformatics 19, 362–367 (2003).
  70. Edgar, R.C. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics 5, 113 (2004).
  71. Stanke, M. et al. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res. 34, W435–W439 (2006).
  72. Korf, I. Gene finding in novel genomes. BMC Bioinformatics 5, 59 (2004).
  73. Birney, E., Clamp, M. & Durbin, R. GeneWise and Genomewise. Genome Res. 14, 988–995 (2004).
  74. Trapnell, C., Pachter, L. & Salzberg, S.L. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 25, 1105–1111 (2009).
  75. Trapnell, C. et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat. Biotechnol. 28, 511–515 (2010).
  76. Elsik, C.G. et al. Creating a honey bee consensus gene set. Genome Biol. 8, R13 (2007).
  77. Altschul, S.F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997).
  78. Enright, A.J., Van Dongen, S. & Ouzounis, C.A. An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 30, 1575–1584 (2002).
  79. Proost, S. et al. i-ADHoRe 3.0—fast and sensitive detection of genomic homology in extremely large data sets. Nucleic Acids Res. 40, e11 (2012).
  80. Thompson, J.D., Gibson, T.J. & Higgins, D.G. Multiple sequence alignment using ClustalW and ClustalX. Curr. Protoc. Bioinformatics Chapter 2, Unit 2.3 (2002).
  81. Yang, Z. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. 13, 555–556 (1997).
  82. De Bie, T., Cristianini, N., Demuth, J.P. & Hahn, M.W. CAFE: a computational tool for the study of gene family evolution. Bioinformatics 22, 1269–1271 (2006).
  83. International Brachypodium Initiative. Genome sequencing and analysis of the model grass Brachypodium distachyon. Nature 463, 763–768 (2010).
  84. Tuskan, G.A. et al. The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science 313, 1596–1604 (2006).
  85. Tamura, K. et al. MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol. Biol. Evol. 28, 2731–2739 (2011).
  86. Letunic, I. et al. Recent improvements to the SMART domain-based sequence annotation resource. Nucleic Acids Res. 30, 242–244 (2002).


Author information

  1. These authors contributed equally to this work.

    • Jing Cai,
    • Xin Liu,
    • Kevin Vanneste,
    • Sebastian Proost,
    • Wen-Chieh Tsai &
    • Ke-Wei Liu


  1. Shenzhen Key Laboratory for Orchid Conservation and Utilization, National Orchid Conservation Center of China and Orchid Conservation and Research Center of Shenzhen, Shenzhen, China.

    • Jing Cai,
    • Ke-Wei Liu,
    • Li-Jun Chen,
    • Xin-Ju Xiao,
    • Guo-Qiang Zhang,
    • Meina Wang,
    • Gao-Chang Xie,
    • Guo-Hui Liu,
    • Li-Qiang Li,
    • Lai-Qiang Huang &
    • Zhong-Jian Liu
  2. Center for Biotechnology and BioMedicine, Shenzhen Key Laboratory of Gene & Antibody Therapy, State Key Laboratory of Health Science & Technology (prep) and Division of Life & Health Sciences, Graduate School at Shenzhen, Tsinghua University, Shenzhen, China.

    • Jing Cai,
    • Ke-Wei Liu,
    • Lai-Qiang Huang &
    • Zhong-Jian Liu
  3. School of Life Science, Tsinghua University, Beijing, China.

    • Jing Cai,
    • Ke-Wei Liu &
    • Lai-Qiang Huang
  4. BGI-Shenzhen, Shenzhen, China.

    • Xin Liu,
    • Chao Bian,
    • Zhijun Zheng,
    • Fengming Sun,
    • Weiqing Liu,
    • Xun Xu,
    • Jun-Yi Wang &
    • Jun Wang
  5. Department of Plant Systems Biology, VIB, Ghent, Belgium.

    • Kevin Vanneste,
    • Sebastian Proost,
    • Ying He &
    • Yves Van de Peer
  6. Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium.

    • Kevin Vanneste,
    • Sebastian Proost,
    • Ying He &
    • Yves Van de Peer
  7. Institute of Tropical Plant Sciences, National Cheng Kung University, Tainan, Taiwan.

    • Wen-Chieh Tsai
  8. State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing, China.

    • Qing Xu &
    • Yi-Bo Luo
  9. Department of Life Sciences, National Cheng Kung University, Tainan, Taiwan.

    • Yu-Yun Hsiao,
    • Zhao-Jun Pan,
    • Chia-Chi Hsu,
    • Ya-Ping Yang,
    • Yi-Chin Hsu,
    • Yu-Chen Chuang &
    • Hong-Hwa Chen
  10. Centre de Coopération Internationale en Recherche Agronomique pour le Développement (CIRAD), UMR Amélioration Génétique et Adaptation des Plantes Méditerranéennes et Tropicales (AGAP), Montpellier, France.

    • Anne Dievart &
    • Jean-Francois Dufayard
  11. State Forestry Administration, Beijing, China.

    • Xue-Min Zhao &
    • Rong Du
  12. College of Forestry, South China Agriculture University, Guangzhou, China.

    • Yong-Yu Su,
    • Lai-Qiang Huang &
    • Zhong-Jian Liu
  13. Orchid Research Center, National Cheng Kung University, Tainan, Taiwan.

    • Hong-Hwa Chen
  14. Department of Genetics, Genomics Research Institute, Pretoria, South Africa.

    • Yves Van de Peer


J.C., Z.-J.L., L.-Q.H., J.W., H.-H.C., Y.V.d.P., X.L., S.P., K.V., W.-C.T., Y.-B.L., K.-W.L., X.-M.Z. and R.D. planned and coordinated the project and wrote the manuscript. W.-C.T., Y.-Y.S., Z.-J.P., C.-C.H., Y.-P.Y., Y.-C.H., Y.-C.C., L.-J.C., X.-J.X., G.-Q.Z., M.W., G.-C.X., G.-H.L. and L.-Q.L. collected and grew the plant material. J.C., W.-C.T., K.-W.L., L.-J.C. and Q.X. prepared samples. X.L., C.B., Z.Z., W.L., F.S. and K.-W.L. sequenced and processed the raw data. X.L., C.B., Z.Z., X.X., J.W. and F.S. annotated the genome. C.B., X.L., J.C., Y.H. and S.P. analyzed the gene families. C.B., X.L., S.P., K.V., Y.H. and Z.Z. conducted genome evolution analysis. J.C. conducted CAM analysis. J.C., X.L. and F.S. conducted TE insertion analysis. Z.-J.L. and J.-Y.W. conducted protein kinase analysis. W.-C.T., Y.-Y.H., Z.-J.P., C.-C.H., Y.-P.Y., Y.-C.H., Y.-C.C., A.D. and J.-F.D. conducted the MADS-box gene analysis. X.L., J.C. and K.-W.L. conducted transcriptome sequencing and analysis.

Competing financial interests

The authors declare no competing financial interests.


Supplementary information

PDF files

  1. Supplementary Text and Figures (2,468 KB)

    Supplementary Figures 1–15, Supplementary Tables 1–13, 17 and 21–27, and Supplementary Note.

Excel files

  1. Supplementary Table 14 (9,768 KB)

    Results of the manual check of genes.

  2. Supplementary Table 15 (21 KB)

    GO enrichment analysis of gene family expansion.

  3. Supplementary Table 16 (10 KB)

    GO enrichment analysis of gene family contraction.

  4. Supplementary Table 18 (2,033 KB)

    Reads per kilobase per million mapped reads (RPKM) of all the genes in the four tissues analyzed.

  5. Supplementary Table 19 (37 KB)

    GO enrichment analysis in four tissues.

  6. Supplementary Table 20 (235 KB)

    Allelic genes from heterozygous regions.
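Supplementary Table 18 reports expression as reads per kilobase per million mapped reads (RPKM). For reference, a minimal Python sketch of the standard RPKM normalization follows: read count × 10⁹ / (gene length in bp × total mapped reads). The function name and example numbers are hypothetical, and choices such as which length (gene, exon union or transcript) and which read total were used for the table are not stated here.

```python
def rpkm(gene_reads: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of feature per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    # Dividing by (length / 1e3) and by (total / 1e6) equals multiplying by 1e9.
    return gene_reads * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical example: 500 reads on a 2,000-bp gene in a library of 10 million mapped reads.
print(rpkm(500, 2_000, 10_000_000))  # 25.0
```

Normalizing by both gene length and library size is what allows RPKM values to be compared across genes within a tissue and, roughly, across the four tissues.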

Zip files

  1. Supplementary Data Set (76 KB)

    CAM gene alignments.
