The draft genome of the fast-growing non-timber forest species moso bamboo (Phyllostachys heterocycla)

Journal name:
Nature Genetics

Bamboo represents the only major lineage of grasses that is native to forests and is one of the most important non-timber forest products in the world. However, no species in the Bambusoideae subfamily has been sequenced. Here, we report a high-quality draft genome sequence of moso bamboo (P. heterocycla var. pubescens). The 2.05-Gb assembly covers 95% of the genomic region. Gene prediction modeling identified 31,987 genes, most of which are supported by cDNA and deep RNA sequencing data. Analyses of clustered gene families and gene collinearity show that bamboo underwent whole-genome duplication 7–12 million years ago. Identification of gene families that are key in cell wall biosynthesis suggests that the whole-genome duplication event generated more gene duplicates involved in bamboo shoot development. RNA sequencing analysis of bamboo flowering tissues suggests a potential connection between drought-responsive and flowering genes.

At a glance


  1. Figure 1: Assemblies and comparative genomics.

    (a) Comparison of the lengths of assembled scaffolds by the pure SOAPdenovo and Phusion-meta assembly methods. (b) Venn diagram of shared orthologous gene families among five grass genomes. The gene family number is listed in each component. The number of genes within the families is noted in parentheses. (c) Genome duplication in grass genomes. The calculated Ks values of the two-member gene clusters were converted to divergence time, using a substitution rate of 6.5 × 10⁻⁹ mutations per site per year (ref. 34). The y axis shows the percentage of the two-member gene clusters. MYA, million years ago. (d) Evolution of orthologous gene clusters. The black numbers above and below each branch indicate the quantity of expanded (+) or contracted (−) orthologous clusters after the corresponding speciation, respectively. The estimated numbers of clusters in the common ancestors are indicated in the rectangles. The dN/dS ratio of each branch is shown in blue. (e) Divergence time between bamboo and grass species from different subfamilies (mean Ks values are given in Supplementary Table 11). (f) Gene synteny between rice, sorghum and moso bamboo. The collinear region is located on rice chromosome 1 (40,565 to 40,983 kb; MSU RGAP 6.1; ref. 35), sorghum chromosome 3 (71,771 to 72,334 kb; ref. 36) and bamboo scaffold PH01000002 (1,890 to 2,862 kb). Non-hypothetical genes (blue), hypothetical genes (gray), LTR retrotransposons (orange), DNA transposons (purple), miniature inverted-repeat transposable elements (MITEs) (green) and other transposable elements (pink) are represented by boxes. Syntenic loci are connected by gray lines between the genomes.
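
    The Ks-to-time conversion mentioned for panel (c) can be illustrated with a minimal sketch. The caption gives only the substitution rate; the formula used here, T = Ks / (2r), is the conventional molecular-clock relation and is an assumption, as are the function names `ks_to_mya` and `mya_to_ks`.

```python
# Substitution rate quoted in the Figure 1c caption
# (synonymous mutations per site per year).
SUBSTITUTION_RATE = 6.5e-9


def ks_to_mya(ks, rate=SUBSTITUTION_RATE):
    """Convert a synonymous divergence (Ks) to a divergence time in MYA.

    Assumption (not stated in the caption): T = Ks / (2 * rate), because
    both gene copies accumulate substitutions independently after the
    duplication or speciation event.
    """
    if ks < 0:
        raise ValueError("Ks must be non-negative")
    if rate <= 0:
        raise ValueError("rate must be positive")
    years = ks / (2.0 * rate)
    return years / 1e6


def mya_to_ks(mya, rate=SUBSTITUTION_RATE):
    """Inverse conversion: expected Ks for a divergence time given in MYA."""
    if mya < 0:
        raise ValueError("time must be non-negative")
    return 2.0 * rate * mya * 1e6


if __name__ == "__main__":
    # Hypothetical Ks values, chosen only for illustration.
    for ks in (0.05, 0.13, 0.30):
        print(f"Ks = {ks:.2f} -> {ks_to_mya(ks):.1f} MYA")
```

    Under this assumed relation, a Ks of 0.13 corresponds to roughly 10 MYA, which falls inside the 7–12 million year window reported in the abstract for the whole-genome duplication.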

  2. Figure 2: Recent duplication and the expression of bamboo CesA and Csl genes.

    (a) Phylogenetic neighbor-joining tree of the CesA genes. Red branches indicate a recent duplication after speciation. Filled circles indicate the tissues where the gene had high expression. Clades A, B, C, E and G correspond to the phylogenetic tree in Supplementary Figure 12a. The divergence time of the corresponding duplication is shown in blue. The scale bar represents the bootstrap percentage of each branch. (b) Phylogenetic tree of the Csl genes. The clustered CslA, CslC, CslD, CslE and CslF genes were derived from the phylogenetic tree in Supplementary Figure 12b. Filled circles indicate the tissues where the gene had high expression. The divergence time of recent duplications is shown in blue beside the corresponding branch in red. (c) History of recent duplication for the CesA and Csl genes. Each bracket indicates a duplication event of the CesA or Csl genes. Divergence time is shown along a bar ranging from 0 to 50 million years ago. Filled red circles indicate genes highly expressed in the shoot.

  3. Figure 3: Gene expression at flowering time.

    (a) Clustered transcription factor and stress-responsive genes with high expression in panicles. Gene expression was measured by quantified transcription levels (reads per kilobase of exon model per million mapped reads, RPKM; ref. 37) derived from transcriptome analysis. The gene expression levels in the tip of a 20-cm-long shoot (S20), the tip of a 50-cm-long shoot (S50), the rhizome (RH), the root (RT), the panicle at the early stage (P1) and the panicle at the flowering stage (P2) were normalized to the fold change over the expression levels in the leaf (LF) and are indicated by color. The abbreviations indicating the conserved domains encoded by flowering genes are listed in Supplementary Table 16. (b) Predicted pathway in the control of flowering time in bamboo. Blue arrows indicate that the involved genes are more highly expressed in the floral tissues, whereas red double-headed arrows indicate that the genes are not activated. Single dashed arrows represent pathways that were not used during flowering. The double dashed arrow represents stronger connections between drought-responsive and floral meristem identity (FMI) genes.
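
    The expression measure in panel (a) can be sketched as follows. RPKM follows the definition in ref. 37; the pseudocount in the fold-change step, the function names and the example numbers are assumptions for illustration only, since the caption does not say how zero leaf expression was handled.

```python
def rpkm(read_count, exon_length_bp, total_mapped_reads):
    """Reads per kilobase of exon model per million mapped reads.

    RPKM = reads * 1e9 / (exon length in bp * total mapped reads).
    """
    if exon_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    if read_count < 0:
        raise ValueError("read count must be non-negative")
    return read_count * 1e9 / (exon_length_bp * total_mapped_reads)


def fold_change_over_leaf(tissue_rpkm, leaf_rpkm, pseudocount=1.0):
    """Expression in a tissue relative to leaf (LF).

    The pseudocount is an assumption added here to avoid division by
    zero; it is not described in the caption.
    """
    if pseudocount <= 0:
        raise ValueError("pseudocount must be positive")
    return (tissue_rpkm + pseudocount) / (leaf_rpkm + pseudocount)


if __name__ == "__main__":
    # Hypothetical gene: 2,000 bp of exon sequence, 10 million mapped
    # reads in each library.
    panicle = rpkm(500, 2000, 10_000_000)
    leaf = rpkm(100, 2000, 10_000_000)
    print(f"P2 RPKM = {panicle:.1f}, LF RPKM = {leaf:.1f}")
    print(f"fold change over leaf = {fold_change_over_leaf(panicle, leaf):.2f}")
```

    Normalizing by both exon length and library size is what makes values comparable across genes and across the seven tissues listed in the caption.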




  1. Lobovikov, M., Paudel, S., Piazza, M., Ren, H. & Wu, J. World Bamboo Resources: A Thematic Study Prepared in the Framework of the Global Forest Resources Assessment 2005 (Food and Agriculture Organization of the United Nations, Rome, 2007).
  2. Peng, Z. et al. Genome-wide characterization of the biggest grass, bamboo, based on 10,608 putative full-length cDNA sequences. BMC Plant Biol. 10, 116 (2010).
  3. Zhang, Y.J., Ma, P.F. & Li, D.Z. High-throughput sequencing of six bamboo chloroplast genomes: phylogenetic implications for temperate woody bamboos (Poaceae: Bambusoideae). PLoS ONE 6, e20596 (2011).
  4. Gui, Y.J. et al. Insights into the bamboo genome: syntenic relationships to rice and sorghum. J. Integr. Plant Biol. 52, 1008–1015 (2010).
  5. Sungkaew, S., Stapleton, C.M., Salamin, N. & Hodkinson, T.R. Non-monophyly of the woody bamboos (Bambuseae; Poaceae): a multi-gene region phylogenetic analysis of Bambusoideae s.s. J. Plant Res. 122, 95–108 (2009).
  6. Sharma, R.K. et al. Evaluation of rice and sugarcane SSR markers for phylogenetic and genetic diversity analyses in bamboo. Genome 51, 91–103 (2008).
  7. Das, M., Bhattacharya, S. & Pal, A. Generation and characterization of SCARs by cloning and sequencing of RAPD products: a strategy for species-specific marker development in bamboo. Ann. Bot. (Lond.) 95, 835–841 (2005).
  8. Chen, R. et al. Chromosome Atlas of Major Economic Plants Genome in China, Tomus IV—Chromosome Atlas of Various Bamboo Species (Science Press, Beijing, 2003).
  9. Gui, Y. et al. Genome size and sequence composition of moso bamboo: a comparative study. Sci. China C Life Sci. 50, 700–705 (2007).
  10. Li, R. et al. SOAP: short oligonucleotide alignment program. Bioinformatics 24, 713–714 (2008).
  11. Tuskan, G.A. et al. The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science 313, 1596–1604 (2006).
  12. Velasco, R. et al. A high quality draft consensus sequence of the genome of a heterozygous grapevine variety. PLoS ONE 2, e1326 (2007).
  13. Kanehisa, M., Goto, S., Sato, Y., Furumichi, M. & Tanabe, M. KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Res. 40 Database issue, D109–D114 (2012).
  14. Swigonová, Z. et al. Close split of sorghum and maize genome progenitors. Genome Res. 14, 1916–1923 (2004).
  15. Wendel, J.F. Genome evolution in polyploids. Plant Mol. Biol. 42, 225–249 (2000).
  16. Gaut, B.S. Evolutionary dynamics of grass genomes. New Phytol. 154, 15–28 (2002).
  17. Kellogg, E.A. Relationships of cereal crops and other grasses. Proc. Natl. Acad. Sci. USA 95, 2005–2010 (1998).
  18. Goff, S.A. et al. A draft sequence of the rice genome (Oryza sativa L. ssp. japonica). Science 296, 92–100 (2002).
  19. Guyot, R. & Keller, B. Ancestral genome duplication in rice. Genome 47, 610–614 (2004).
  20. Barker, N.P. et al. Phylogeny and subfamilial classification of the grasses (Poaceae). Ann. Mo. Bot. Gard. 88, 373–457 (2001).
  21. Sánchez-Ken, J.G., Clark, L.G., Kellogg, E.A. & Kay, E.E. Reinstatement and emendation of subfamily Micrairoideae (Poaceae). Syst. Bot. 32, 71–80 (2007).
  22. Bouchenak-Khelladi, Y. et al. Large multi-gene phylogenetic trees of the grasses (Poaceae): progress towards complete tribal and generic level sampling. Mol. Phylogenet. Evol. 47, 488–505 (2008).
  23. Cui, K., He, C.Y., Zhang, J.G., Duan, A.G. & Zeng, Y.F. Temporal and spatial profiling of internode elongation-associated protein expression in rapidly growing culms of bamboo. J. Proteome Res. 11, 2492–2507 (2012).
  24. Somerville, C. Cellulose synthesis in higher plants. Annu. Rev. Cell Dev. Biol. 22, 53–78 (2006).
  25. Yin, Y., Huang, J. & Xu, Y. The cellulose synthase superfamily in fully sequenced plants and algae. BMC Plant Biol. 9, 99 (2009).
  26. Schnable, P.S. et al. The B73 maize genome: complexity, diversity, and dynamics. Science 326, 1112–1115 (2009).
  27. Humphreys, J.M. & Chapple, C. Rewriting the lignin roadmap. Curr. Opin. Plant Biol. 5, 224–229 (2002).
  28. Boerjan, W., Ralph, J. & Baucher, M. Lignin biosynthesis. Annu. Rev. Plant Biol. 54, 519–546 (2003).
  29. Hamberger, B. et al. Genome-wide analyses of phenylpropanoid-related genes in Populus trichocarpa, Arabidopsis thaliana, and Oryza sativa: the Populus lignin toolbox and conservation and diversification of angiosperm gene families. Can. J. Bot. 85, 1182–1201 (2007).
  30. Arora, R. et al. MADS-box gene family in rice: genome-wide identification, organization and expression profiling during reproductive development and stress. BMC Genomics 8, 242 (2007).
  31. Ehrenreich, I.M. et al. Candidate gene association mapping of Arabidopsis flowering time. Genetics 183, 325–335 (2009).
  32. Fornara, F., Montaigu, A. & Coupland, G. SnapShot: control of flowering in Arabidopsis. Cell 141, 550.e1–550.e2 (2010).
  33. Putterill, J., Robson, F., Lee, K., Simon, R. & Coupland, G. The CONSTANS gene of Arabidopsis promotes flowering and encodes a protein showing similarities to zinc finger transcription factors. Cell 80, 847–857 (1995).
  34. Gaut, B.S., Morton, B.R., McCaig, B.C. & Clegg, M.T. Substitution rate comparisons between grasses and palms: synonymous rate differences at the nuclear gene Adh parallel rate differences at the plastid gene rbcL. Proc. Natl. Acad. Sci. USA 93, 10274–10279 (1996).
  35. Ouyang, S. et al. The TIGR Rice Genome Annotation Resource: improvements and new features. Nucleic Acids Res. 35, D883–D887 (2007).
  36. Paterson, A.H. et al. The Sorghum bicolor genome and the diversification of grasses. Nature 457, 551–556 (2009).
  37. Mortazavi, A., Williams, B.A., McCue, K., Schaeffer, L. & Wold, B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat. Methods 5, 621–628 (2008).
  38. Peterson, D.G., Tomkins, J.P., Frisch, D.A., Wing, R.A. & Paterson, A.H. Construction of plant bacterial artificial chromosome (BAC) libraries: an illustrated guide. J. Agric. Genomics 5, 34–40 (2000).
  39. Kozarewa, I. et al. Amplification-free Illumina sequencing-library preparation facilitates improved mapping and assembly of (G+C)-biased genomes. Nat. Methods 6, 291–295 (2009).
  40. Mullikin, J.C. & Ning, Z. The Phusion assembler. Genome Res. 13, 81–90 (2003).
  41. Simpson, J.T. et al. ABySS: a parallel assembler for short read sequence data. Genome Res. 19, 1117–1123 (2009).
  42. Boetzer, M., Henkel, C.V., Jansen, H.J., Butler, D. & Pirovano, W. Scaffolding pre-assembled contigs using SSPACE. Bioinformatics 27, 578–579 (2011).
  43. Bonfield, J.K. & Whitwham, A. Gap5—editing the billion fragment sequence assembly. Bioinformatics 26, 1699–1703 (2010).
  44. Djerbi, S., Lindskog, M., Arvestad, L., Sterky, F. & Teeri, T.T. The genome sequence of black cottonwood (Populus trichocarpa) reveals 18 conserved cellulose synthase (CesA) genes. Planta 221, 739–746 (2005).
  45. Suzuki, S., Li, L., Sun, Y.H. & Chiang, V.L. The cellulose synthase gene superfamily and biochemical functions of xylem-specific cellulose synthase–like genes in Populus trichocarpa. Plant Physiol. 142, 1233–1245 (2006).
  46. Hazen, S.P., Scott-Craig, J.S. & Walton, J.D. Cellulose synthase–like genes of rice. Plant Physiol. 128, 336–340 (2002).
  47. Ehlting, J. et al. Global transcript profiling of primary stems from Arabidopsis thaliana identifies candidate genes for missing links in lignin biosynthesis and transcriptional regulators of fiber differentiation. Plant J. 42, 618–640 (2005).
  48. Costa, M.A. et al. Characterization in vitro and in vivo of the putative multigene 4-coumarate:CoA ligase network in Arabidopsis: syringyl lignin and sinapate/sinapyl alcohol derivative formation. Phytochemistry 66, 2072–2091 (2005).
  49. Wang, L., Feng, Z., Wang, X., Wang, X. & Zhang, X. DEGseq: an R package for identifying differentially expressed genes from RNA-seq data. Bioinformatics 26, 136–138 (2010).
  50. Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc., B 57, 289–300 (1995).
  51. Childs, K.L. et al. The TIGR Plant Transcript Assemblies database. Nucleic Acids Res. 35 Database issue, D846–D851 (2007).
  52. Li, L., Stoeckert, C.J. Jr. & Roos, D.S. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 13, 2178–2189 (2003).
  53. Van Bel, M. et al. Dissecting plant genomes with the PLAZA comparative genomics platform. Plant Physiol. 158, 590–600 (2012).
  54. Huelsenbeck, J.P. & Ronquist, F. MRBAYES: Bayesian inference of phylogeny. Bioinformatics 17, 754–755 (2001).
  55. De Bie, T., Cristianini, N., Demuth, J.P. & Hahn, M.W. CAFE: a computational tool for the study of gene family evolution. Bioinformatics 22, 1269–1271 (2006).
  56. Bao, Z. & Eddy, S.R. Automated de novo identification of repeat sequence families in sequenced genomes. Genome Res. 12, 1269–1276 (2002).
  57. Price, A.L., Jones, N.C. & Pevzner, P.A. De novo identification of repeat families in large genomes. Bioinformatics 21 (suppl. 1), i351–i358 (2005).
  58. Ellinghaus, D., Kurtz, S. & Willhoeft, U. LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons. BMC Bioinformatics 9, 18 (2008).
  59. Xu, Z. & Wang, H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res. 35, W265–W268 (2007).


Author information

  1. These authors contributed equally to this work.

    • Zhenhua Peng,
    • Ying Lu,
    • Lubin Li,
    • Qiang Zhao,
    • Qi Feng &
    • Zhimin Gao


  1. Research Institute of Forestry, Chinese Academy of Forestry, Key Laboratory of Tree Breeding and Cultivation, State Forestry Administration, Beijing, China.

    • Zhenhua Peng,
    • Lubin Li,
    • Na Yao,
    • Tao Wang,
    • Kun Miao,
    • Caiyun Zhuang,
    • Xiaolu Cao,
    • Jie Chen,
    • Zhenjing Liu,
    • Zhenhua Liu &
    • Zehui Jiang
  2. National Center for Gene Research, Shanghai Institute of Plant Physiology and Ecology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China.

    • Ying Lu,
    • Qiang Zhao,
    • Qi Feng,
    • Hengyun Lu,
    • Kunyan Liu,
    • Yan Li,
    • Danlin Fan,
    • Yunli Guo,
    • Wenjun Li,
    • Yiqi Lu,
    • Qijun Weng,
    • CongCong Zhou,
    • Lei Zhang,
    • Tao Huang,
    • Yan Zhao,
    • Chuanrang Zhu,
    • Xuehui Huang,
    • Tingting Lu,
    • Zemin Ning &
    • Bin Han
  3. International Center for Bamboo and Rattan, Beijing, China.

    • Zhimin Gao,
    • Tao Hu,
    • Xinge Liu,
    • Xuewen Yang,
    • Wenli Tang,
    • Guanshui Liu,
    • Yingli Liu,
    • Licai Yuan,
    • Benhua Fei &
    • Zehui Jiang


Z.J., Z.P. and B.H. conceived the project and its components, designed the studies and contributed to the original concept of the project. Q.F., D.F., Y.G., W.L., Yiqi Lu, T. Hu, N.Y., C. Zhou and Q.W. performed DNA preparation and genome sequencing. Ying Lu, Y. Li, K.L., T.L. and X.H. performed genome data analysis. Ying Lu and T.L. performed transcriptome (RNA-seq and cDNA) analyses. Z.N., H.L. and Q.Z. developed the de novo assembly pipeline and performed de novo genome assembly. L.Z. performed BAC sequence assembly. L.L., Z.G., X.Y., T.W., K.M., C. Zhuang, X.C., W.T., G.L., Y. Liu, J.C., Zhenjing Liu, L.Y. and Zhenhua Liu collected bamboo samples and performed cytogenetics studies and functional analysis. T. Huang, Y.Z. and C. Zhu provided IT support. B.F. and X.L. coordinated the project. Ying Lu, B.H., Z.P. and Z.J. analyzed the data as a whole and wrote the manuscript.

Competing financial interests

The authors declare no competing financial interests.


Supplementary information

PDF files

  1. Supplementary Text and Figures (3 MB)

    Supplementary Note, Supplementary Figures 1–16 and Supplementary Tables 1–19
