
The nearly complete genome of Ginkgo biloba illuminates gymnosperm evolution

Abstract

Gymnosperms are a unique lineage of plants that currently lack a high-quality reference genome because of their large genome sizes and high repetitive-sequence content. Here, we report a nearly complete genome assembly for Ginkgo biloba with a genome size of 9.87 Gb, a contig N50 of 1.58 Mb and a scaffold N50 of 775 Mb. We accurately annotated 27,832 protein-coding genes, superseding the inaccurate annotation of 41,840 genes in a previous draft genome assembly. We found that expansion of the G. biloba genome, accompanied by a notable extension of introns, was caused mainly by the insertion of long terminal repeat (LTR) retrotransposons rather than by recent whole-genome duplication (WGD) events, in contrast to the findings of a previous report. We also identified candidate genes in the central pair, intraflagellar transport and dynein protein families that are associated with formation of the sperm flagellum, a structure lost in all seed plants except ginkgo and cycads. The new Ginkgo genome provides insights into the evolution of gymnosperm genomes.
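The contig and scaffold N50 values quoted above summarize assembly contiguity. The sketch below illustrates how the N50 statistic is conventionally defined; it is not the authors' code, and the function name `n50` and the toy lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp).

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating covered length.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    # Not reached for non-empty input: the last step always covers the total.
    raise AssertionError("unreachable")


if __name__ == "__main__":
    # Hypothetical contig lengths, not taken from the G. biloba assembly.
    contigs = [80, 70, 50, 40, 30, 20, 10]
    print(n50(contigs))  # prints 70
```

For the toy input the total is 300, and the two longest contigs (80 + 70) first reach half of it, so the N50 is 70. The same function applied to scaffold lengths gives the scaffold N50.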


Fig. 1: Features of 12 pseudochromosomes of G. biloba.
Fig. 2: Analyses of gene family evolution and WGD events.
Fig. 3: Analysis of genes involved in leaf polarity.


Data availability

The G. biloba genome project has been deposited at the National Genomics Data Center under BioProject no. PRJCA001755. Whole-genome sequencing data were deposited in the Genome Sequence Archive database under accession nos. CRA002032 and CRA002041. Source data are provided with this paper.

Code availability

All custom code is available for research purposes from the corresponding authors on request.


Acknowledgements

This work was funded by the Key Forestry Public Welfare Project (grant no. 201504105 to G.W.), the National Key R&D Programme of China (grant no. 2019YFA0707003 to J.R.), the Natural Science Foundation of China (grant no. 31822029 to J.R.) and the Guangdong Basic and Applied Basic Research Foundation (grant no. 2019A1515111150 to B.H.).

Author information


Contributions

F.C., T.Y. and J.R. conceived this research. F.C., T.Y., J.R., G.W., P.C. and H.L. designed the experiments. H.L., G.W., N.H., J.H., B.H. and X.D. collected samples. H.L., N.H. and Y.C. performed DNA/RNA extraction. J.R. and S.W. performed genome assembly. C.A. and H.L. contributed to RNA-seq and the corresponding analysis. X.W., S.W., C.A., A.L. and Z.W. assessed the genome assembly. C.A. characterized the repeat content. X.W., C.A., X.S., H.F. and D.M. performed gene annotation. X.W. conducted gene family classification. X.W. and H.L. conducted synteny analyses. X.W. and C.A. performed analysis of LTR elements. H.L. and X.W. carried out analysis of phenotypic traits. H.L., X.W., G.W. and P.C. wrote most of the manuscript. T.Y., J.R. and F.C. organized and edited the manuscript.

Corresponding authors

Correspondence to Tongming Yin, Jue Ruan or Fuliang Cao.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Plants thanks the anonymous reviewers for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Sequential inversion of a 170-Mb region at the terminus of chromosome 11 revealed by a genetic map.

The black columns represent the linkage groups of the genetic map; the grey columns represent the pseudochromosomes.


Extended Data Fig. 2 Assessment of gene set completeness by BUSCO.

Species are labelled along the x axis. The y axis shows the percentage of BUSCO genes in each species that are covered by complete gene models (pink), covered by fragmented gene models (orange) or not covered (yellow).
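The bars in this figure are simple proportions of the BUSCO groups searched. The sketch below shows that arithmetic under the usual BUSCO convention that "complete" counts single-copy plus duplicated genes; the function name and the example counts are hypothetical and are not values from the figure.

```python
def busco_percentages(complete: int, fragmented: int, missing: int) -> dict:
    """Express BUSCO category counts as percentages of all groups searched.

    'complete' is assumed to already include both single-copy and
    duplicated complete genes, as BUSCO reports them.
    """
    total = complete + fragmented + missing
    if total <= 0:
        raise ValueError("no BUSCO groups counted")
    return {
        "complete": 100.0 * complete / total,
        "fragmented": 100.0 * fragmented / total,
        "missing": 100.0 * missing / total,
    }


if __name__ == "__main__":
    # Hypothetical counts for one species (1,000 BUSCO groups searched).
    print(busco_percentages(complete=900, fragmented=50, missing=50))
```

The three percentages always sum to 100, which is why each species in the figure is shown as a single stacked bar.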

Extended Data Fig. 3 Self-alignment dotplot based on syntenic gene pairs.

One-to-one syntenic blocks are highlighted with red circles. Because a self-alignment is symmetric about the diagonal, blue shading masks the mirrored copy of each alignment that is already shown in the unshaded region.
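The masking in this dotplot reflects a general property of self-alignments: every syntenic pair (x, y) also appears as its mirror (y, x). The following is a minimal sketch of keeping one copy per pair, assuming gene positions are given as integer coordinates; it illustrates the idea only and is not the authors' plotting code (the names `Pair` and `upper_triangle` are hypothetical).

```python
from typing import Iterable, List, Tuple

# A syntenic gene pair in a self-alignment: (position on x axis, position on y axis).
# Positions are hypothetical genome-wide coordinates, e.g. gene rank order.
Pair = Tuple[int, int]


def upper_triangle(pairs: Iterable[Pair]) -> List[Pair]:
    """Keep one copy of each mirrored syntenic pair.

    A pair (x, y) and its mirror (y, x) describe the same homology, so only
    the orientation with x < y (one triangle of the dotplot) is kept, and
    trivial self-hits on the diagonal (x == y) are dropped.
    """
    seen = set()
    kept = []
    for x, y in pairs:
        if x == y:
            continue  # a gene aligned to itself carries no duplication signal
        key = (min(x, y), max(x, y))
        if key not in seen:
            seen.add(key)
            kept.append(key)
    return kept


if __name__ == "__main__":
    hits = [(10, 250), (250, 10), (42, 42), (300, 120), (120, 300), (5, 7)]
    print(upper_triangle(hits))  # [(10, 250), (120, 300), (5, 7)]
```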

Extended Data Fig. 4 WGD event revealed by Bayesian inference of retention rates of plants.

Inference is based on 2,841 gene families. The q value on the x axis is the retention rate. The distribution of q values revealed no Ginkgo-lineage-specific or gymnosperm-specific WGD but confirmed the seed-plant WGD (ζ). As recommended by the developers, the Hamiltonian Monte Carlo sampler was run for 1,000 iterations.

Extended Data Fig. 5 Timing of LTR-RT insertions in different gymnosperms.

Insertion times were calculated from the sequence identity between the two LTRs of each element identified in each genome assembly. The x axis shows insertion time (Mya); the y axis shows the proportion of LTR elements falling in each 4-Myr window.
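LTR insertion times are commonly estimated as T = K / 2r, where K is the corrected divergence between the 5′ and 3′ LTRs of an element and r is the neutral substitution rate. The sketch below assumes a Jukes–Cantor correction (many pipelines use Kimura 2-parameter instead) and takes r as an input, because neither the model nor the rate used for this figure is stated here. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def jc_distance(identity: float) -> float:
    """Jukes-Cantor corrected distance from the identity of two LTRs."""
    p = 1.0 - identity  # observed proportion of differing sites
    if not 0.0 <= p < 0.75:
        raise ValueError("identity must give 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def insertion_time_mya(identity: float, rate_per_site_per_year: float) -> float:
    """Insertion time T = K / (2r), converted to millions of years."""
    k = jc_distance(identity)
    return k / (2.0 * rate_per_site_per_year) / 1e6


def window_ratios(times_mya: Iterable[float], window: float = 4.0) -> Dict[float, float]:
    """Fraction of LTR elements whose insertion time falls in each window.

    Keys are window start times (Mya); values sum to 1.
    """
    times = list(times_mya)
    if not times:
        return {}
    counts = Counter(math.floor(t / window) * window for t in times)
    n = len(times)
    return {start: counts[start] / n for start in sorted(counts)}
```

With a hypothetical rate of 1e-8 substitutions per site per year, two LTRs that are 99% identical date to roughly 0.5 Mya, and `window_ratios` then bins such estimates into the 4-Myr windows plotted on the y axis.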

Supplementary information

Supplementary Information

Supplementary Tables 1–12, 14 and 15 and Figs. 1–8.

Reporting Summary

Supplementary Table 13

Ks values calculated against 25 species for homologous genes in protein categories including radial spoke proteins, central pair proteins, intraflagellar transport proteins, inner dyneins, outer dyneins and dyneins.

Source data

Source Data Fig. 2

Statistical source data for Fig. 2b.

Source Data Extended Data Fig. 1

Statistical source data for Extended Data Fig. 1.


About this article


Cite this article

Liu, H., Wang, X., Wang, G. et al. The nearly complete genome of Ginkgo biloba illuminates gymnosperm evolution. Nat. Plants 7, 748–756 (2021). https://doi.org/10.1038/s41477-021-00933-x


  • DOI: https://doi.org/10.1038/s41477-021-00933-x
