Abstract
The model plant Physcomitrium patens has played a pivotal role in advancing our understanding of plant evolution and development. However, the current genome assembly contains numerous regions that remain unfinished or erroneous. To address these issues, we generated a new assembly using Oxford Nanopore reads and Hi-C mapping. The assembly incorporates telomeric and centromeric regions, making it a near telomere-to-telomere genome; the exception is a region on chromosome 1 that is not fully assembled owing to its highly repetitive nature. This near telomere-to-telomere genome resolves the chromosome number at 26 and provides a gap-free genome assembly as well as updated gene models to aid future studies using this model organism.
Data availability
The genome assembly and de novo annotations have been deposited in Figshare at https://doi.org/10.6084/m9.figshare.22975925.v2. The Illumina reads (genomic sequencing and Hi-C) and Nanopore reads generated in this study were deposited into the NCBI SRA with the BioProject ID PRJNA742485. The V5 genome assembly was submitted to NCBI with the WGS accession ABEU00000000 under BioProject ID PRJNA13064. The updated gene model lookup table and merged annotation are available for download from https://peatmoss.plantcode.cup.uni-freiburg.de/ppatens_db/downloads.php. The ChIP-seq data generated in this study are available through NGDC (https://ngdc.cncb.ac.cn) with accession PRJCA016808. The V6 genome and annotation data have been submitted to Phytozome (https://phytozome-next.jgi.doe.gov/) and will be made available in their upcoming release.
References
Cove, D. The moss Physcomitrella patens. Annu. Rev. Genet. 39, 339–358 (2005).
Engel, P. The induction of biochemical and morphological mutants in the moss Physcomitrella patens. Am. J. Bot. 55, 438–446 (1968).
Frank, W., Ratnadewi, D. & Reski, R. Physcomitrella patens is highly tolerant against drought, salt and osmotic stress. Planta 220, 384–394 (2005).
Schaefer, D. A new moss genetics: targeted mutagenesis in Physcomitrella patens. Annu. Rev. Plant Biol. 53, 477–501 (2002).
Xu, B. et al. Contribution of NAC transcription factors to plant adaptation to land. Science 343, 1505–1508 (2014).
Rensing, S. A., Goffinet, B., Meyberg, R., Wu, S. & Bezanilla, M. The moss Physcomitrium (Physcomitrella) patens: a model organism for non-seed plants. Plant Cell 32, 1361–1376 (2020).
Vidali, L. & Bezanilla, M. Physcomitrella patens: a model for tip cell growth and differentiation. Curr. Opin. Plant Biol. 15, 625–631 (2012).
Ishikawa, M. et al. Physcomitrella STEMIN transcription factor induces stem cell formation with epigenetic reprogramming. Nat. Plants 5, 681–690 (2019).
Reski, R., Bae, H. & Simonsen, H. T. Physcomitrella patens, a versatile synthetic biology chassis. Plant Cell Rep. 37, 1409–1417 (2018).
Rensing, S. et al. The Physcomitrella genome reveals evolutionary insights into the conquest of land by plants. Science 319, 64–69 (2008).
The Arabidopsis Genome Initiative. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408, 796–815 (2000).
Yu, J. et al. A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science 296, 79–92 (2002).
Merchant, S. et al. The Chlamydomonas genome reveals the evolution of key animal and plant functions. Science 318, 245–250 (2007).
Lang, D. et al. The Physcomitrella patens chromosome-scale assembly reveals moss genome structure and evolution. Plant J. 93, 515–533 (2018).
Zimmer, A. D. et al. Reannotation and extended community resources for the genome of the non-seed plant Physcomitrella patens provide insights into the evolution of plant gene structures and functions. BMC Genomics 14, 498 (2013).
Nurk, S. et al. The complete sequence of a human genome. Science 376, 44–53 (2022).
Song, J. et al. Two gap-free reference genomes and a global view of the centromere architecture in rice. Mol. Plant 14, 1757–1767 (2021).
Li, K. et al. Gapless indica rice genome reveals synergistic contributions of active transposable elements and segmental duplications to rice genome evolution. Mol. Plant 14, 1745–1756 (2021).
Han, X. et al. Two haplotype-resolved, gap-free genome assemblies of Actinidia latifolia and Actinidia chinensis shed light on regulation mechanisms of vitamin C and sucrose metabolism in kiwifruit. Mol. Plant 16, 452–470 (2023).
Yue, J. et al. Telomere-to-telomere and gap-free reference genome assembly of the kiwifruit Actinidia chinensis. Hortic. Res. 10, uhac264 (2023).
Deng, Y. et al. A telomere-to-telomere gap-free reference genome of watermelon and its mutation library provide important resources for gene discovery and breeding. Mol. Plant 15, 1268–1284 (2022).
Payne, Z. L. et al. A gap-free genome assembly of Chlamydomonas reinhardtii and detection of translocations induced by CRISPR-mediated mutagenesis. Plant Commun. 4, 100493 (2023).
Hu, J. et al. An efficient error correction and accurate assembly tool for noisy long reads. Preprint at bioRxiv https://doi.org/10.1101/2023.03.09.531669 (2023).
Kolmogorov, M., Yuan, J., Lin, Y. & Pevzner, P. Assembly of long, error-prone reads using repeat graphs. Nat. Biotechnol. 37, 540–546 (2019).
Podlevsky, J. D. et al. The telomerase database. Nucleic Acids Res. 36, D339–D343 (2008).
Ou, S., Chen, J. & Jiang, N. Assessing genome assembly quality using the LTR assembly index (LAI). Nucleic Acids Res. 46, e126 (2018).
Kurtz, S. et al. Versatile and open software for comparing large genomes. Genome Biol. 5, R12 (2004).
Alonge, M. et al. RaGOO: fast and accurate reference-guided scaffolding of draft genomes. Genome Biol. 20, 1 (2019).
Goel, M., Sun, H., Jiao, W. & Schneeberger, K. SyRI: finding genomic rearrangements and local sequence differences from whole-genome assemblies. Genome Biol. 20, 277 (2019).
Garrison, E. & Marth, G. Haplotype-based variant detection from short-read sequencing. Preprint at https://doi.org/10.48550/arXiv.1207.3907 (2012).
Haas, F. B. et al. Single nucleotide polymorphism charting of P. patens reveals accumulation of somatic mutations during in vitro culture on the scale of natural variation by selfing. Front. Plant Sci. 11, 813 (2020).
Zhou, Y. & Song, B.-L. An urgent call on revisions to current genome annotation strategies. Sci. China Life Sci. 66, 1942–1943 (2023).
Parry, G. The plant nuclear envelope and regulation of gene expression. J. Exp. Bot. 66, 1673–1685 (2015).
Imaizumi, T. et al. Cryptochrome light signals control development to suppress auxin sensitivity in the moss Physcomitrella patens. Plant Cell 14, 373–386 (2002).
Prigge, M. J. et al. Physcomitrella patens auxin-resistant mutants affect conserved elements of an auxin-signaling pathway. Curr. Biol. 20, 1907–1912 (2010).
Bryan, V. S. Cytotaxonomic studies in the Ephemeraceae and Funariaceae. Bryologist 60, 103–126 (1957).
Reski, R., Faust, M. & Wang, X. Genome analysis of the moss Physcomitrella patens (Hedw.) B.S.G. Mol. Gen. Genet. 244, 352–359 (1994).
Neumann, P. et al. Plant centromeric retrotransposons: a structural and cytogenetic perspective. Mob. DNA 2, 4 (2011).
Zhang, R.-G. et al. TEsorter: an accurate and fast method to classify LTR-retrotransposons in plant genomes. Hortic. Res. 9, uhac017 (2022).
Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc. Natl Acad. Sci. USA 117, 9451–9457 (2020).
Carey, S. B. et al. Gene-rich UV sex chromosomes harbor conserved regulators of sexual development. Sci. Adv. 7, eabh2488 (2021).
McClintock, B. The stability of broken ends of chromosomes in Zea mays. Genetics 26, 234–282 (1941).
Slijepcevic, P. & Bryant, P. E. Chromosome healing, telomere capture and mechanisms of radiation-induced chromosome breakage. Int. J. Radiat. Biol. 73, 1–13 (1998).
Kurzhals, R. L. et al. Chromosome healing is promoted by the telomere cap component Hiphop in Drosophila. Genetics 207, 949–959 (2017).
Lieberman-Aiden, E. et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289–293 (2009).
Fortin, J.-P. & Hansen, K. D. Reconstructing A/B compartments as revealed by Hi-C using long-range correlations in epigenetic data. Genome Biol. 16, 180 (2015).
Nothjunge, S. et al. DNA methylation signatures follow preformed chromatin compartments in cardiac myocytes. Nat. Commun. 8, 1667 (2017).
Bannister, A. J. & Kouzarides, T. Regulation of chromatin by histone modifications. Cell Res. 21, 381–395 (2011).
Bian, Q. et al. Histone H3K9 methylation promotes formation of genome compartments in Caenorhabditis elegans via chromosome compaction and perinuclear anchoring. Proc. Natl Acad. Sci. USA 117, 11459–11470 (2020).
Yung, W.-S. et al. Histone modifications and chromatin remodelling in plants in response to salt stress. Physiol. Plant. 173, 1495–1513 (2021).
Widiez, T. et al. The chromatin landscape of the moss Physcomitrella patens and its dynamics during development and drought stress. Plant J. 79, 67–81 (2014).
Ashton, N. W. & Cove, D. J. The isolation and preliminary characterisation of auxotrophic and analogue resistant mutants of the moss, Physcomitrella patens. Mol. Gen. Genet. 154, 87–95 (1977).
Schlink, K. & Reski, R. Preparing high-quality DNA from moss (Physcomitrella patens). Plant Mol. Biol. Report. 20, 423–423 (2002).
Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, 884–890 (2018).
Vurture, G. W. et al. GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics 33, 2202–2204 (2017).
Marçais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764–770 (2011).
Danecek, P. et al. Twelve years of SAMtools and BCFtools. Gigascience 10, giab008 (2021).
Ensembl/treebest. Ensembl. https://github.com/Ensembl/treebest (2016).
Chen, Y. et al. Efficient assembly of nanopore reads via highly accurate and intact error correction. Nat. Commun. 12, 60 (2021).
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
Langmead, B. & Salzberg, S. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Servant, N. et al. HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biol. 16, 259 (2015).
Durand, N. C. et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst. 3, 95–98 (2016).
Dudchenko, O. et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science 356, 92–95 (2017).
Boratyn, G. M. et al. BLAST: a more efficient report with usability improvements. Nucleic Acids Res. 41, 29–33 (2013).
Wick, R., Schultz, M., Zobel, J. & Holt, K. Bandage: interactive visualization of de novo genome assemblies. Bioinformatics 31, 3350–3352 (2015).
Li, H., Feng, X. & Chu, C. The design and construction of reference pangenome graphs with minigraph. Genome Biol. 21, 265 (2020).
Vaser, R. et al. Fast and accurate de novo genome assembly from long uncorrected reads. Genome Res. 27, 737–746 (2017).
Aury, J.-M. & Istace, B. Hapo-G, haplotype-aware polishing of genome assemblies with accurate reads. NAR Genom. Bioinform. 3, lqab034 (2021).
Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 37, 907–915 (2019).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Li, H. et al. The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Bankevich, A. et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 19, 455–477 (2012).
Pruitt, K., Tatusova, T. & Maglott, D. NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 33, 501–504 (2005).
Beier, S., Tappu, R. & Huson, D. H. in Functional Metagenomics: Tools and Applications (eds Charles, T. C. et al.) 65–74 (Springer Cham, 2017).
Rhie, A. et al. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol. 21, 245 (2020).
Zimin, A. V. & Salzberg, S. L. The genome polishing tool POLCA makes fast and accurate corrections in genome assemblies. PLoS Comput. Biol. 16, e1007981 (2020).
Zimin, A. et al. The MaSuRCA genome assembler. Bioinformatics 29, 2669–2677 (2013).
Davey, J., Davis, S., Mottram, J. & Ashton, P. Tapestry: validate and edit small eukaryotic genome assemblies with long reads. Preprint at bioRxiv https://doi.org/10.1101/2020.04.24.059402 (2020).
Simão, F. A., Waterhouse, R., Ioannidis, P., Kriventseva, E. & Zdobnov, E. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015).
Tarailo-Graovac, M. & Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr. Protoc. Bioinform. 5, 4–10 (2004).
Jurka, J. et al. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet. Genome Res. 110, 462–467 (2005).
Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27, 573–580 (1999).
Edgar, R. & Myers, E. PILER: identification and classification of genomic repeats. Bioinformatics 21, 152–158 (2005).
Price, A., Jones, N. C. & Pevzner, P. De novo identification of repeat families in large genomes. Bioinformatics 21, 351–358 (2005).
Xu, Z. & Wang, H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res. 35, 265–268 (2007).
Ellinghaus, D., Kurtz, S. & Willhoeft, U. LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons. BMC Bioinform. 9, 18 (2008).
Yang, Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, 1586–1591 (2007).
Rensing, S. et al. An ancient genome duplication contributed to the abundance of metabolic genes in the moss Physcomitrella patens. BMC Evol. Biol. 7, 130 (2007).
Sun, P. et al. WGDI: a user-friendly toolkit for evolutionary analyses of whole-genome duplications and ancestral karyotypes. Mol. Plant 15, 1841–1851 (2022).
Filippova, D., Patro, R., Duggal, G. & Kingsford, C. Identification of alternative topological domains in chromatin. Algorithms Mol. Biol. 9, 14 (2014).
Lopez-Delisle, L. et al. pyGenomeTracks: reproducible plots for multivariate genomic data sets. Bioinformatics 37, 422–423 (2021).
Ramírez, F. et al. High-resolution TADs reveal DNA sequences underlying genome organization in flies. Nat. Commun. 9, 189 (2018).
Paulsen, J., Ali, T. M. & Collas, P. Computational 3D genome modeling using Chrom3D. Nat. Protoc. 13, 1137–1152 (2018).
Pettersen, E. et al. UCSF Chimera—a visualization system for exploratory research and analysis. J. Comput. Chem. 25, 1605–1612 (2004).
Haas, B. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol. 9, R7 (2008).
Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol. 33, 290–295 (2015).
Stanke, M. et al. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res. 34, 435–439 (2006).
Korf, I. Gene finding in novel genomes. BMC Bioinform. 5, 59 (2004).
Lomsadze, A., Ter-Hovhannisyan, V., Chernoff, Y. & Borodovsky, M. Gene identification in novel eukaryotic genomes by self-training algorithm. Nucleic Acids Res. 33, 6494–6506 (2005).
Keilwagen, J., Hartung, F. & Grau, J. in Gene Prediction: Methods and Protocols (ed. Kollmar, M.) 161–177 (Humana, 2019).
Boeckmann, B. et al. The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res. 31, 365–370 (2003).
Kanehisa, M., Furumichi, M., Tanabe, M., Sato, Y. & Morishima, K. KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Res. 45, 353–361 (2017).
Aramaki, T. et al. KofamKOALA: KEGG ortholog assignment based on profile HMM and adaptive score threshold. Bioinformatics 36, 2251–2252 (2020).
Huerta-Cepas, J. et al. eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic Acids Res. 47, 309–314 (2019).
Mitchell, A. et al. InterPro in 2019: improving coverage, classification and access to protein sequence annotations. Nucleic Acids Res. 47, 351–360 (2019).
El-Gebali, S. et al. The Pfam protein families database in 2019. Nucleic Acids Res. 47, 427–432 (2019).
Lu, S. et al. CDD/SPARCLE: the conserved domain database in 2020. Nucleic Acids Res. 48, 265–268 (2020).
Törönen, P., Medlar, A. & Holm, L. PANNZER2: a rapid functional annotation web server. Nucleic Acids Res. 46, 84–88 (2018).
Conesa, A. et al. Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics 21, 3674–3676 (2005).
Chan, P. & Lowe, T. tRNAscan-SE: searching for tRNA genes in genomic sequences. In Gene Prediction: Methods and Protocols Vol. 1962 (ed. Kollmar, M.) 1–14 (Humana, 2019).
Nawrocki, E. P. & Eddy, S. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics 29, 2933–2935 (2013).
Shumate, A. & Salzberg, S. L. Liftoff: accurate mapping of gene annotations. Bioinformatics 37, 1639–1643 (2021).
Wu, T. D. et al. in Statistical Genomics: Methods and Protocols (eds Mathé, E. & Davis, S.) 283–334 (Humana, 2016).
Gremme, G., Steinbiss, S. & Kurtz, S. GenomeTools: a comprehensive software library for efficient processing of structured genome annotations. IEEE/ACM Trans. Comput. Biol. Bioinform. 10, 645–656 (2013).
Pertea, G. & Pertea, M. GFF utilities: GffRead and GffCompare. F1000Res. 9, 304 (2020).
Quinlan, A. & Hall, I. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
Li, G. et al. ChIA-PET tool for comprehensive chromatin interaction analysis with paired-end tag sequencing. Genome Biol. 11, R22 (2010).
Tamura, K. et al. MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol. Biol. Evol. 28, 2731–2739 (2011).
Vollger, M. R. et al. StainedGlass: interactive visualization of massive tandem repeat structures with identity heatmaps. Bioinformatics 38, 2049–2051 (2022).
Liu, Y. & Vidali, L. Efficient polyethylene glycol (PEG) mediated transformation of the moss Physcomitrella patens. J. Vis. Exp. 50, e2560 (2011).
Gendrel, A.-V. et al. Profiling histone modification patterns in plants using genomic tiling microarrays. Nat. Methods 2, 213–218 (2005).
Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
Feng, J. et al. Identifying ChIP-seq enrichment using MACS. Nat. Protoc. 7, 1728–1740 (2012).
Gel, B. & Serra, E. karyoploteR: an R/Bioconductor package to plot customizable genomes displaying arbitrary data. Bioinformatics 33, 3088–3090 (2017).
Acknowledgements
We thank C. Chen from Huazhong Agricultural University and C. Yu from the Agricultural Genomics Institute at Shenzhen for their advice on chromosome analysis. We also thank H. Chen at Tsinghua University for providing assistance with P. patens genetics. This work was supported by grants from the National Key Research and Development Program of China (no. 2019YFA0906200 to J. Yan), the National Natural Science Foundation of China (nos. 31725002, 32150025 and 32030004 to J.D.), the Bureau of International Cooperation, Chinese Academy of Sciences (no. 172644KYSB20180022 to J.D.), the Shenzhen Science and Technology Program (no. KQTD20180413181837372 to J.D.), the Science Technology and Innovation Commission of Shenzhen Municipality of China (no. ZDSYS20200811142605017 to J. Yan) and the Shenzhen Outstanding Talents Training Fund to J.D. J. Yan acknowledges funding from the Innovation Program of the Chinese Academy of Agricultural Sciences and the Elite Young Scientists Program of CAAS. The gene annotation was carried out in the framework of MAdLand (http://madland.science, DFG priority program 2237). S.A.R. is grateful for funding from the DFG (RE 1697/15–1, 20–1).
Contributions
J. Yan, Y.M. and J.D. conceived the study. J. Yan and J.D. managed the major scientific objectives. G.B. designed the T2T genome assembly, evaluation and data analysis. J. Yao generated the plant materials. H.W. helped with the assembly and annotation. S.Z., M.Z., Y.S. and X.H. collected the sequenced samples. J. Yao, Y.J., Y.M. and J.D. designed and performed ChIP. F.B.H., D.V., M.P. and S.A.R. combined and quality-checked the gene annotations. J. Yan and G.B. led the article preparation, together with J. Yao, H.W. and J.D. All authors read and approved the final article.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Plants thanks the anonymous reviewers for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 The process of graph-based gap-filling for the remaining 16 gaps.
a, Karyotype maps of the fifteen chromosomes that carry the remaining 16 gaps, which are marked with numerical values. The centromere is drawn at the position of the chromosome constriction. b, Upstream and downstream sequences of breakpoints (or gaps) aligned with the corresponding graphs. The Hi-C heatmaps locate the gaps (indicated by the intersection of green lines) at a resolution of 1 kb. The upstream sequences (blue band) and downstream sequences (green band) of each gap are aligned with the graphs. On the corresponding path, the precise locations of the 14 intervals where gaps occur on the graph are highlighted (blue and green bands). The gray region between the two bands represents the sequence that requires filling. Overlap between the two bands indicates sequence redundancy, which requires trimming and merging. Gap number 11 is caused by tandem repeats, so the corresponding copy number had to be estimated before the gap was repaired. c, Two gaps occur in the complex region. The upstream and downstream intervals of the two gaps are labeled using four distinct colors. The area is characterized by short repetitive sequences; to determine the precise paths for gap-filling, nanopore reads were mapped onto this region, and long reads that traverse the repetitive structure were extracted to build the path.
Extended Data Fig. 2 Genome assembly validation achieved by analyzing sequencing coverage and depth in relation to the 26 chromosomes in P. patens.
The coverage (0-100%) and depth information of Illumina and ONT sequencing reads on 26 chromosomes are illustrated in the left and right images, respectively. The statistical analysis was performed using a nonoverlapping window of 50 kbp. Except for the repetitive region adjacent to the centromere of Chr01, which was deliberately omitted from the secondary mapping findings, the ONT reads demonstrated comprehensive coverage of all other chromosomal regions. Furthermore, the sequencing depth of the multicopy rRNA region was markedly elevated, exceeding the typical chromosome sequencing depth of 66x, which further supports the notion of the presence of several copies of rRNA.
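The windowed statistics described in this legend can be illustrated with a minimal sketch. It assumes per-base depths for one chromosome are already available as a list of integers (for example, parsed from the output of a tool such as `samtools depth -a`); the function name `window_stats` and its output layout are hypothetical and are not taken from the authors' pipeline.

```python
from typing import List, Tuple


def window_stats(depths: List[int], window: int = 50_000) -> List[Tuple[int, float, float]]:
    """Summarise per-base depths in non-overlapping windows.

    Returns one (window_start, mean_depth, percent_covered) tuple per window,
    where percent_covered is the share of positions with depth > 0.
    The final window may be shorter than `window`.
    """
    stats = []
    for start in range(0, len(depths), window):
        chunk = depths[start:start + window]
        mean_depth = sum(chunk) / len(chunk)
        covered = sum(1 for d in chunk if d > 0)
        stats.append((start, mean_depth, 100.0 * covered / len(chunk)))
    return stats


if __name__ == "__main__":
    # Toy example with a 2 bp window instead of 50 kbp.
    for row in window_stats([2, 2, 0, 0, 4, 4], window=2):
        print(row)
```

Windows whose mean depth sits far above the chromosome-wide typical value would correspond to collapsed or multicopy regions such as the rRNA cluster mentioned above, whereas windows with low coverage would flag unresolved sequence.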
Extended Data Fig. 3 Taxonomy distribution obtained by analyzing the assembled unmapped short reads against the NR database using MEGAN6.
The tree represents the taxonomic classification of the matched sequences at the class level, with node size indicating the number of matched sequences. The word cloud displays the sequence matching results at the phylum level, with larger words indicating a greater number of matched sequences.
Extended Data Fig. 4 A fundamental overview of the main content presented in this work.
a, Distribution of k-mer frequencies (k = 17–23) in the P. patens genome. b, Radar chart comparing the quality of the V6 genome with that of its predecessor, V3, on the basis of six indicators. c, A concise diagram illustrating the process of V6 assembly. d, Results of SyRI analysis showing genome sequence collinearity and structural variants. To capture the genuine differences between the two genome versions, the V3 sequence was split into contigs at runs of N bases. Then, using RaGOO software, 26 pseudochromosomes were created to align with V6.
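The contig-splitting step mentioned for panel d can be sketched as follows. This is an illustrative example only: the function name `split_at_gaps`, the contig naming scheme and the `min_gap` parameter are assumptions and do not come from the article, which does not state how the split was performed.

```python
import re
from typing import List, Tuple


def split_at_gaps(name: str, seq: str, min_gap: int = 1) -> List[Tuple[str, str]]:
    """Split a scaffold into contigs at runs of at least `min_gap` N bases.

    Returns (contig_name, contig_sequence) pairs; the N runs themselves
    and any empty pieces (e.g. from leading or trailing Ns) are dropped.
    """
    gap_pattern = re.compile(r"[Nn]{%d,}" % min_gap)
    pieces = [piece for piece in gap_pattern.split(seq) if piece]
    return [(f"{name}_contig{i}", piece) for i, piece in enumerate(pieces, start=1)]


if __name__ == "__main__":
    for contig_name, contig_seq in split_at_gaps("chr1", "ACGTNNNNGGCCNTT"):
        print(contig_name, contig_seq)
```

Raising `min_gap` would keep isolated ambiguous bases inside a contig and break the sequence only at longer runs that represent true scaffold gaps.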
Extended Data Fig. 5 The neighbor-joining cladogram tree of five P. patens accessions built by SNPs derived from Haas et al.31.
The genome sequencing material used in this study is denoted on the tree by an arrow. Bootstrap values from 100 replicates are shown at the nodes.
Extended Data Fig. 6 Assembly accuracy validation for Chr25 in V6 by read mapping.
Comparison of the collinearity of Chr25 between the V3 and V6 versions. The top section shows the two V3 pseudochromosomes that are merged in V6, with their boundary marked by a solid black line. The position of the breakpoint in V6 is indicated by a dashed black line. The middle section shows the mapping results of nanopore reads (above 10 kbp). The bottom section gives a detailed view of the 5 kbp interval encompassing the breakpoint.
Extended Data Fig. 7 Whole genome-wide Hi-C heatmap.
Hi-C interactions among 26 chromosomes at a 500 kbp resolution.
Supplementary information
Supplementary Information
Supplementary Figs. 1–30 and Note.
Supplementary Data 1
Supplementary Tables 1–23.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Bi, G., Zhao, S., Yao, J. et al. Near telomere-to-telomere genome of the model plant Physcomitrium patens. Nat. Plants 10, 327–343 (2024). https://doi.org/10.1038/s41477-023-01614-7
DOI: https://doi.org/10.1038/s41477-023-01614-7