Abstract
Gymnosperms are a unique lineage of plants that currently lack a high-quality reference genome due to their large genome size and high repetitive sequence content. Here, we report a nearly complete genome assembly for Ginkgo biloba with a genome size of 9.87 Gb, an N50 contig size of 1.58 Mb and an N50 scaffold size of 775 Mb. We were able to accurately annotate 27,832 protein-coding genes in total, superseding the inaccurate annotation of 41,840 genes in a previous draft genome assembly. We found that expansion of the G. biloba genome, accompanied by the notable extension of introns, was mainly caused by the insertion of long terminal repeats rather than the recent occurrence of whole-genome duplication events, in contrast to the findings of a previous report. We also identified candidate genes in the central pair, intraflagellar transport and dynein protein families that are associated with the formation of the spermatophore flagellum, which has been lost in all seed plants except ginkgo and cycads. The newly obtained Ginkgo genome provides new insights into the evolution of the gymnosperm genome.
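The contig and scaffold N50 values quoted above follow the usual definition: the length of the sequence at which the running total, taken over sequences sorted from longest to shortest, first reaches half of the total assembly length. A minimal sketch of that calculation is given below; the function name and the toy lengths are illustrative assumptions and are not taken from the assembly itself.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together cover at least half of the total assembled length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy contig lengths (hypothetical, arbitrary units).
toy_contigs = [80, 70, 50, 40, 30, 20]
print(n50(toy_contigs))  # prints 70
```

With the toy data the total is 290, so the threshold of 145 is first reached by the second-longest sequence.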
Data availability
The G. biloba genome project has been deposited at the National Genomics Data Center under BioProject no. PRJCA001755. Whole-genome sequencing data were deposited in the Genome Sequence Archive database under accession nos. CRA002032 and CRA002041. Source data are provided with this paper.
Code availability
All custom code is available for research purposes from the corresponding authors upon request.
Acknowledgements
This work was funded by the Key Forestry Public Welfare Project (grant no. 201504105 to G.W.), the National Key R&D Programme of China (grant no. 2019YFA0707003 to J.R.), the Natural Science Foundation of China (grant no. 31822029 to J.R.) and the Guangdong Basic and Applied Basic Research Foundation (grant no. 2019A1515111150 to B.H.).
Author information
Contributions
F.C., T.Y. and J.R. conceived this research. F.C., T.Y., J.R., G.W., P.C. and H.L. designed the experiments. H.L., G.W., N.H., J.H., B.H. and X.D. collected samples. H.L., N.H. and Y.C. performed DNA/RNA extraction. J.R. and S.W. performed genome assembly. C.A. and H.L. contributed to RNA-seq and the corresponding analysis. X.W., S.W., C.A., A.L. and Z.W. assessed the genome assembly. C.A. characterized the repeat content. X.W., C.A., X.S., H.F. and D.M. performed gene annotation. X.W. conducted gene family classification. X.W. and H.L. conducted synteny analyses. X.W. and C.A. performed analysis of LTR elements. H.L. and X.W. carried out analysis of phenotypic traits. H.L., X.W., G.W. and P.C. wrote most of the manuscript. T.Y., J.R. and F.C. organized and edited the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Peer review information: Nature Plants thanks the anonymous reviewers for their contribution to the peer review of this work.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Sequential inversion of a 170-Mb region at the terminus of chromosome 11 revealed by a genetic map.
The black columns represent the linkage groups of the genetic map; the grey columns represent the pseudochromosomes.
Extended Data Fig. 2 Assessment of gene set completeness by BUSCO.
The species involved are labelled along the x axis. The y axis shows the percentage of complete gene models (pink), fragmented gene models (orange) and missing genes (yellow) in each species.
Extended Data Fig. 3 Self-alignment dotplot based on syntenic gene pairs.
The one-to-one syntenic blocks are highlighted with red circles. The blue shading masks alignments that duplicate those shown in the unshaded region.
Extended Data Fig. 4 WGD event revealed by Bayesian inference of retention rates of plants.
Inference is based on 2,841 gene families. The q value along the x axis represents the retention rate. Based on the distribution of q values, no Ginkgo lineage-specific or gymnosperm-specific WGD was detected, while the seed plant WGD (ζ WGD) was confirmed. The Hamiltonian Monte Carlo sampling was run for 1,000 iterations, as recommended by the developers.
Extended Data Fig. 5 Timing of LTR-RT insertions in different gymnosperms.
Insertion time was calculated based on the sequence identity of the LTRs identified in each genome assembly. The x axis displays the insertion time (Mya), and the y axis represents the proportion of LTR elements in each 4-Myr window.
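One common way to turn LTR sequence identity into an insertion time is to convert the divergence between the two LTRs of an element into a distance K and divide by twice the substitution rate, T = K / (2r). The sketch below illustrates that approach only. The Jukes-Cantor correction, the function names and the substitution rate are assumptions made for illustration; the source does not state which model or rate was used.

```python
import math

# Hypothetical substitution rate (per site per year); a placeholder
# for illustration, not a value reported in the source.
ASSUMED_RATE = 2.2e-9


def jukes_cantor_distance(identity):
    """Convert pairwise identity (0-1) into a Jukes-Cantor distance."""
    p = 1.0 - identity  # proportion of differing sites
    if not 0.0 <= p < 0.75:
        raise ValueError("identity must be in the range (0.25, 1.0]")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def insertion_time_mya(identity, rate=ASSUMED_RATE):
    """Estimate insertion time in Mya as T = K / (2r)."""
    k = jukes_cantor_distance(identity)
    return k / (2.0 * rate) / 1e6


def window_proportions(times_mya, window=4.0):
    """Proportion of elements falling in each window of `window` Myr."""
    counts = {}
    for t in times_mya:
        start = int(t // window) * window
        counts[start] = counts.get(start, 0) + 1
    total = len(times_mya)
    return {start: n / total for start, n in sorted(counts.items())}


# Toy identities for four hypothetical LTR pairs.
times = [insertion_time_mya(i) for i in (0.99, 0.97, 0.95, 0.90)]
print(window_proportions(times))
```

The factor of two reflects that both LTRs of an element accumulate substitutions independently after insertion; a different substitution model or rate would shift the absolute times without changing the binning step.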
Supplementary information
Supplementary Information
Supplementary Tables 1–12, 14 and 15 and Figs. 1–8.
Supplementary Table 13
Ks values calculated against 25 species for homologous genes in protein categories including radial spoke proteins, central pair proteins, intraflagellar transport proteins, inner dyneins, outer dyneins and dyneins.
Source data
Source Data Fig. 2
Statistical source data for Fig. 2b.
Source Data Extended Data Fig. 1
Statistical source data for Extended Data Fig. 1.
About this article
Cite this article
Liu, H., Wang, X., Wang, G. et al. The nearly complete genome of Ginkgo biloba illuminates gymnosperm evolution. Nat. Plants 7, 748–756 (2021). https://doi.org/10.1038/s41477-021-00933-x