Orchidaceae, renowned for its spectacular flowers and other reproductive and ecological adaptations, is one of the most diverse plant families. Here we present the genome sequence of the tropical epiphytic orchid Phalaenopsis equestris, a frequently used parent species for orchid breeding. P. equestris is the first plant with crassulacean acid metabolism (CAM) for which the genome has been sequenced. Our assembled genome contains 29,431 predicted protein-coding genes. We find that contigs likely to be underassembled, owing to heterozygosity, are enriched for genes that might be involved in self-incompatibility pathways. We find evidence for an orchid-specific paleopolyploidy event that preceded the radiation of most orchid clades, and our results suggest that gene duplication might have contributed to the evolution of CAM photosynthesis in P. equestris. Finally, we find expanded and diversified families of MADS-box C/D-class, B-class AP3 and AGL6-class genes, which might contribute to the highly specialized morphology of orchid flowers.
Supplementary information
- Supplementary Text and Figures: Supplementary Figures 1–15, Supplementary Tables 1–13, 17 and 21–27, and Supplementary Note.
- Supplementary Table 14: Results of the manual check of genes.
- Supplementary Table 15: GO enrichment analysis of gene family expansion.
- Supplementary Table 16: GO enrichment analysis of gene family contraction.
- Supplementary Table 18: Reads per kilobase per million mapped reads (RPKM) of all the genes in the four tissues analyzed.
- Supplementary Table 19: GO enrichment analysis in four tissues.
- Supplementary Table 20: Allelic genes from heterozygous regions.
- Supplementary Data Set: CAM gene alignments.
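Supplementary Table 18 reports expression as RPKM. The sketch below illustrates only the standard definition of that unit: RPKM = reads × 10^9 / (gene length in bp × total mapped reads). It assumes that "gene length" means the summed exonic length of a gene and that, when no library total is supplied, total mapped reads can be approximated by the sum of the per-gene counts. Both are assumptions, as are the function names and gene IDs. This is not the pipeline used to produce the table.

```python
from typing import Dict, Optional


def rpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of gene length per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    # 10^9 = 10^3 (per kilobase) * 10^6 (per million mapped reads)
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def rpkm_table(
    counts: Dict[str, int],
    lengths_bp: Dict[str, int],
    total_mapped_reads: Optional[int] = None,
) -> Dict[str, float]:
    """Compute RPKM for every gene in one tissue (hypothetical helper).

    Assumption: if the library-wide mapped-read total is not given,
    the sum of the per-gene counts is used instead.
    """
    total = total_mapped_reads if total_mapped_reads is not None else sum(counts.values())
    return {gene: rpkm(n, lengths_bp[gene], total) for gene, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical gene: 500 reads on a 2 kb gene in a 10-million-read library
    print(rpkm(500, 2000, 10_000_000))  # 25.0
    print(rpkm_table({"geneA": 100, "geneB": 300}, {"geneA": 1000, "geneB": 3000}))
```

In this example, geneB has three times as many reads as geneA but is also three times as long, so the two genes receive the same RPKM. Normalizing by length in this way lets genes of different sizes be compared within a tissue, and normalizing by library size lets the four tissues be compared with each other.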