First telomere-to-telomere gapless assembly of the rice blast fungus Pyricularia oryzae

Li, Zhigang; Yang, Jun; Ji, Xiaobei; Liu, Jintao; Yin, Changfa; Bhadauria, Vijai; Zhao, Wensheng; Peng, You-Liang

doi:10.1038/s41597-024-03209-z

Data Descriptor
Open access
Published: 13 April 2024

Scientific Data volume 11, Article number: 380 (2024)

Abstract

Rice blast, caused by Pyricularia oryzae (syn. Magnaporthe oryzae), is one of the most destructive diseases of rice worldwide. A complete genome assembly is fundamental to identifying genetic variation and to understanding how the pathogen overcomes host resistance. Here, we report a gapless genome assembly of P. oryzae strain P131 generated from PacBio, Illumina and high-throughput chromatin conformation capture (Hi-C) sequencing data. The assembly contains seven complete chromosomes (43,237,743 bp) and a circular mitochondrial genome (34,866 bp). Approximately 14.31% of the assembly consists of repeat sequences, a substantially larger fraction than in the previous assembly version. The assembly reaches 99.9% completeness in the BUSCO evaluation. A total of 14,982 protein-coding genes were predicted. In summary, we assembled the first telomere-to-telomere gapless genome of P. oryzae, which will be a valuable resource for future research on genome evolution and host adaptation.

Background & Summary

Pyricularia oryzae (syn. Magnaporthe oryzae), an ascomycete fungal pathogen, causes rice blast, one of the most destructive diseases of rice worldwide^1,2. The pathogen is an important and long-established model species for understanding fungal-plant interactions^3,4. Previously, we sequenced and assembled the first genomes of field strains (P131 and Y34) and performed a comparative analysis between laboratory and field strains, which showed that translocation of transposable elements (TEs), gain or loss of isolate-specific genes, and gene family expansion are key factors shaping the genomic plasticity and adaptability of P. oryzae⁵. Although these assemblies improved our understanding of the genome characteristics of P. oryzae, the genomes of both strains were highly fragmented into more than one thousand scaffolds, because Sanger (2-fold) and 454 (18-fold) sequencing technologies were used in that study. More recently, over 50 genomes of different P. oryzae strains have become available in public genome databases. These genomes were sequenced on second-generation platforms (e.g., Illumina sequencers) and/or third-generation platforms [e.g., Pacific Biosciences (PacBio)], which facilitated genetic studies of genomic changes and pathogenicity variation within P. oryzae^6,7,8. However, most of these assemblies remain fragmented and contain many unplaced contigs and/or gaps owing to repetitive DNA elements in the P. oryzae genome, which hinders the dissection of the molecular mechanisms of adaptive evolution. Given the importance of assembly completeness for genomic analysis, we re-assembled the genome of P. oryzae strain P131 by combining Illumina and PacBio sequencing with high-throughput chromatin conformation capture (Hi-C) mapping, producing the first telomere-to-telomere gapless assembly of the P. oryzae genome.

A total of 10.03 Gb of PacBio long-read sequencing data (~250x genome coverage) and 4.44 Gb of Illumina short-read sequencing data were generated (Table 1). A Hi-C library was prepared and sequenced, yielding 5.57 Gb of sequencing data (~140x genome coverage). The long reads were de novo assembled and corrected, and the short reads were used to polish the assembly. Redundant genomic contigs and mitochondrial contigs were then removed. The Hi-C sequencing data were used to anchor and refine the remaining contigs. The mitochondrial genome was assembled independently with the Mitochondrial Long-read Iterative Assembly (MLIA) pipeline⁹. A final round of polishing was then applied to the complete genome. The final assembly (Fig. 2) comprises seven gapless chromosomes (43,237,743 bp with a contig N50 of 7.05 Mb; Fig. 1a) and a circular mitochondrial genome (34,866 bp; Fig. 1b). The new assembly represents a substantial improvement over the previous version GCA_000292605.1^5,10 (1,823 assembled contigs and contig N50 = 12.3 kb; see Table 2 and Fig. 3).
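The coverage and N50 figures quoted above follow from two simple definitions, sketched here in Python for illustration. The function names and the toy contig lengths are hypothetical; only the data volumes and the assembly size come from the text. Dividing by the final 43.2 Mb assembly gives roughly 232x and 129x, whereas a nominal genome size near 40 Mb (an assumption, not stated in the text) would reproduce the rounded ~250x and ~140x values.

```python
def fold_coverage(total_bases: float, genome_size: float) -> float:
    """Sequencing depth: total sequenced bases divided by genome size."""
    return total_bases / genome_size


def n50(lengths: list[int]) -> int:
    """Length L such that sequences of length >= L together cover
    at least half of the total assembly length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


GENOME_SIZE = 43_237_743  # nuclear assembly size in bp, from the text

pacbio_depth = fold_coverage(10.03e9, GENOME_SIZE)  # 10.03 Gb of PacBio data
hic_depth = fold_coverage(5.57e9, GENOME_SIZE)      # 5.57 Gb of Hi-C data

# Toy contig lengths (hypothetical), only to show how N50 is derived.
toy_n50 = n50([8, 7, 6, 5, 4])

print(f"PacBio depth: {pacbio_depth:.0f}x, Hi-C depth: {hic_depth:.0f}x")
print(f"toy N50: {toy_n50}")
```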

Table 1 Summary of sequencing raw data of P. oryzae strain P131.

Table 2 Summary of the genome assembly.

The nuclear genome was annotated with the Braker2 pipeline¹¹, and the mitochondrial genome was annotated with MFannot¹² using genetic code 4. In total, the nuclear genome is predicted to contain 14,968 genes (including 20,797 transcripts), and the mitochondrial genome carries 14 conserved protein-coding genes (Table 3). A total of 99.9% of the BUSCOs were mapped onto the P131 genome assembly. Approximately 14.31% of the genome consists of repeat sequences, most of which are TEs, a substantially larger fraction than in the previous version (Table 4).

Table 3 Detailed summary of assembled chromosomes.

Table 4 Classification of repeat sequences.

The telomere repeat sequence (TRS) (TTAGGG)_n was present at both ends of chromosomes 2, 4, 5, 6, and 7 and at one end of chromosomes 1 and 3 in our assembly. We then compared the TRSs in published near-complete genome assemblies of P. oryzae strains with those in the assembly generated in this study. Interestingly, the absence of a TRS at a minority of chromosome ends and variability among telomeres were widely observed in P. oryzae, which may play subtle roles in pathogenic adaptation^13,14,15. In summary, we assembled the first telomere-to-telomere gapless genome of P. oryzae, which can be instrumental in understanding genome evolution and host adaptation in the rice blast fungus.
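As an illustration of how TRS presence at chromosome ends can be checked, the following Python sketch scans the two ends of a sequence for tandem copies of the repeat. It is not the procedure used in this study; the function name, the 200 bp window and the minimum of three tandem copies are hypothetical choices. On the forward strand, the repeat is expected as CCCTAA at the left end and as TTAGGG at the right end.

```python
import re


def telomere_ends(seq: str, window: int = 200, min_repeats: int = 3) -> tuple[bool, bool]:
    """Report whether tandem telomere repeats occur near each sequence end.

    window and min_repeats are illustrative thresholds, not values from the paper.
    """
    seq = seq.upper()
    # Left end of the forward strand carries the reverse complement, CCCTAA.
    left = re.search(r"(?:CCCTAA){%d,}" % min_repeats, seq[:window]) is not None
    # Right end of the forward strand carries TTAGGG.
    right = re.search(r"(?:TTAGGG){%d,}" % min_repeats, seq[-window:]) is not None
    return left, right


# Toy sequences (hypothetical), standing in for chromosome FASTA records.
both_ends = "CCCTAA" * 5 + "ACGT" * 100 + "TTAGGG" * 5
one_end = "ACGT" * 100 + "TTAGGG" * 4

print(telomere_ends(both_ends))
print(telomere_ends(one_end))
```

A sequence with repeats at only one end, like the second toy example, corresponds to the situation reported for chromosomes 1 and 3.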

Methods

Sampling and DNA extraction

P. oryzae strain P131 was grown and maintained on oatmeal tomato agar (OTA) plates¹⁶. Conidia were harvested from 7-day-old OTA cultures grown at 25 °C under constant fluorescent light. Hyphae were collected from 2-day-old cultures in complete medium shaken at 150 rpm at 25 °C. Genomic DNA for sequencing was extracted from vegetative mycelia using the cetyltrimethylammonium bromide (CTAB) protocol¹⁷.

Illumina, PacBio and Hi-C sequencing

Genome sequencing was conducted on Pacific Biosciences Sequel (PacBio, Menlo Park, CA) at CapitalBio Technology Co., Ltd (Beijing, China). Qualified genomic DNA was fragmented with G-tubes (Covaris, Woburn, MA, USA) and end-repaired to prepare SMRTbell DNA template libraries (fragments of >10 kb were selected). Library quality was assessed with the Qubit dsDNA HS Assay Kit (Thermo Fisher Scientific, Waltham, MA, USA, Q33230). The average fragment size was estimated on a Bioanalyzer 2100 (Agilent, Santa Clara, CA). SMRT sequencing was performed on the Pacific Biosciences RSII sequencer (PacBio, Menlo Park, CA) according to standard protocols using the P4-C2 chemistry. A total of 10.03 Gb of PacBio sequencing data with a subread N50 of 14.5 kb were generated. In addition, an Illumina HiSeq X Ten sequencer was used for paired-end genome sequencing, yielding 4.44 Gb of sequencing data (150 bp paired-end reads) at CapitalBio Technology Co., Ltd (Beijing, China).

The Hi-C library was prepared from cross-linked chromatin of fungal mycelia by Novogene Co., Ltd (Beijing, China). In brief, the tissue was ground and then cross-linked with 4% formaldehyde solution. After cross-linking and cell lysis, nuclei were digested with the 4-cutter restriction enzyme DpnII. Subsequently, ligated DNA was purified and fragmented to an average size of 300 bp. The constructed Hi-C library was sequenced on an Illumina NovaSeq 6000, generating 5.57 Gb of paired-end sequencing data (150 bp read length). Hi-C contact maps were generated from the raw data with Juicer (v1.6)¹⁸, followed by manual correction with Juicebox (v2.13.07)¹⁹.

RNA sequencing and analysis

Total RNA was extracted from conidia and hyphae with the Trizol reagent (Invitrogen, Carlsbad, CA, USA, 15596026), and mRNA was then enriched with the RNeasy Pure mRNA Bead Kit (Qiagen, Germany). High-throughput cDNA libraries were prepared according to the Illumina whole transcriptome library preparation protocol and sequenced on the Illumina GA platform by BGI Genomics (Shenzhen, China)²⁰. Quality control was performed with FastQC (v0.11; https://github.com/s-andrews/FastQC). RNA-Seq data were mapped to the P. oryzae genome with HISAT2 (v2.2.1)²¹, and SAMtools (v1.12)²² was used to evaluate the read alignments.

Genome assembly

The de novo long-read assembler Canu v2.1.1²³ (parameters: genomeSize=44m corOutCoverage=200 corMinCoverage=2 minReadLength=4000 minOverlapLength=800 correctedErrorRate=0.050) was used to assemble the PacBio reads into draft contigs, which were then corrected with GCpp v1.9 (https://github.com/PacificBiosciences/gcpp; parameters: --algorithm=arrow -x 5 -X 200 -q 40) using the PacBio long reads. Polishing was performed with Pilon v1.23²⁴ (parameters: --changes --vcf) using the Illumina short reads. Contigs were considered redundant if they aligned concordantly (identity >99%) with another contig; the redundant contigs, along with mitochondrial contigs, were removed, leaving a total of 13 contigs. The Hi-C sequencing data were used to anchor all 13 contigs with Juicer v1.6¹⁸, resulting in 7 scaffolds, which were further refined with Juicebox v2.13.07¹⁹. Gaps within the scaffolds were filled using LR_Gapcloser²⁵. We then manually checked whether aligned long reads bridged the gaps, or whether overlapping contig ends (>20 kb in length with 99.9% sequence identity) existed. The mitochondrial genome was assembled independently with the Mitochondrial Long-read Iterative Assembly (MLIA) pipeline⁹. Final polishing of the complete genome was performed again with Pilon v1.23²⁴.
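To make the Canu parameter set easier to reuse, the sketch below assembles an equivalent command line in Python without running it. The parameter values are those listed above; the assembly prefix, output directory and read file name are hypothetical placeholders, and the `-p`, `-d` and `-pacbio` options reflect the Canu 2.x command-line layout as we understand it, so they should be checked against the installed version.

```python
import shlex

# Parameter values copied from the Methods text.
CANU_PARAMS = {
    "genomeSize": "44m",
    "corOutCoverage": "200",
    "corMinCoverage": "2",
    "minReadLength": "4000",
    "minOverlapLength": "800",
    "correctedErrorRate": "0.050",
}


def canu_command(reads: str, prefix: str = "P131", out_dir: str = "canu_out") -> list[str]:
    """Build a Canu argument list; prefix and out_dir are placeholder names."""
    cmd = ["canu", "-p", prefix, "-d", out_dir]
    cmd += [f"{key}={value}" for key, value in CANU_PARAMS.items()]
    cmd += ["-pacbio", reads]
    return cmd


# The read file name is hypothetical.
cmd = canu_command("p131_subreads.fastq.gz")
print(shlex.join(cmd))
```

Keeping the parameters in a dictionary makes it straightforward to log the exact settings alongside the assembly output.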

Gene model and function annotations

Repetitive sequences of P. oryzae strain P131 were first identified de novo with RepeatModeler (v2.0.1)²⁶ and masked with RepeatMasker (v4.1.1)²⁷ (parameters: -e rmblast -pa 30 -xsmall -nolow -norna -gff -a). The nuclear genome was annotated with the Braker2 pipeline¹¹ (parameters: --softmasking --gff3 --fungus --gth2traingenes --prg=gth), combining three lines of evidence: ab initio prediction, homologous proteins, and RNA-Seq data. AUGUSTUS v3.4.0²⁸ and GeneMark-EP+²⁹ were used as the ab initio prediction tools in the pipeline. All proteins of the genus Pyricularia in the UniRef100 database³⁰ were collected, and a 100% non-redundant protein dataset was built with cd-hit³¹ (parameters: -c 1.00 -aS 1.00 -aL 1.00 -n 5 -M 20000) and used as the protein-based training evidence. GenomeThreader v1.7.3³² was used as the alignment tool. Previously used RNA-Seq data²⁰ (i.e., SRR15170638³³, SRR15170637³⁴ and SRR15170636³⁵) were aligned with HISAT2 (v2.2.1)²¹ (parameters: -t --dta). The mitochondrial genome was annotated with MFannot¹² using genetic code 4.

Data Records

The raw genomic sequencing data used and/or analyzed during the current study are available in the NCBI Sequence Read Archive (accession numbers SRR24890910³⁶, SRR24890911³⁷ and SRR24890912³⁸). The assembled genome was deposited at NCBI under the same BioProject as the previous P. oryzae strain P131 assembly (accession number: GCA_000292605.2³⁹; BioProject ID: PRJNA82693; BioSample ID: SAMN31867770). The accession numbers of the chromosome sequences Chr1 to Chr7 are CP114135 to CP114141, respectively, and the accession number of the mitochondrial genome sequence is CP114142.

Technical Validation

DNA sample quality

DNA quality was assessed using Qubit (Thermo Fisher Scientific, Waltham, MA) and Nanodrop (Thermo Fisher Scientific, Waltham, MA).

Sequencing data assessment

The short-read data were assessed with fastp v0.23⁴⁰. The genomic short reads had a GC content of 49.75%, with Q20 and Q30 percentages of 97.1% and 92.06%, respectively. The Hi-C sequencing data had a GC content of 50.5%, with Q20 and Q30 percentages of 97.67% and 93.64%, respectively.
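For readers unfamiliar with these metrics, a Q20 or Q30 percentage is the share of base calls whose Phred quality score is at least 20 or 30. The Python sketch below illustrates that definition on toy quality strings. It is not how fastp is implemented, the function name and toy data are hypothetical, and Phred+33 encoding is assumed.

```python
from typing import Iterable


def q_fraction(quality_strings: Iterable[str], threshold: int, offset: int = 33) -> float:
    """Fraction of bases with Phred score >= threshold (Phred+33 assumed)."""
    total = 0
    passing = 0
    for qual in quality_strings:
        for ch in qual:
            total += 1
            if ord(ch) - offset >= threshold:
                passing += 1
    return passing / total if total else 0.0


# Toy FASTQ quality lines (hypothetical): 'I' is Q40, '5' is Q20, '!' is Q0.
quals = ["IIII", "5555", "!!"]

q20 = q_fraction(quals, 20)
q30 = q_fraction(quals, 30)
print(f"Q20: {q20:.1%}, Q30: {q30:.1%}")
```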

Evaluation of the genome assembly

The quality of the genome assembly was evaluated with the Benchmarking Universal Single-Copy Orthologs (BUSCO) tool, using the “fungi_odb10” lineage as the reference dataset. The results showed that 99.9% of all 758 BUSCO markers were present in the assembly, implying a high level of completeness. In addition, the results generated with the “ascomycota_odb10” lineage showed that 99.4% of all 1706 BUSCO markers were included (Table 2).

Code availability

The published software used in this work is cited in the Methods section. Where no detailed parameters are given for a tool, default parameters were applied.

References

1. Valent, B. & Chumley, F. G. Molecular genetic analysis of the rice blast fungus, Magnaporthe grisea. Annu Rev Phytopathol 29, 443–67 (1991).
2. Talbot, N. J. On the trail of a cereal killer: Exploring the biology of Magnaporthe grisea. Annu Rev Microbiol 57, 177–202 (2003).
3. Ebbole, D. J. Magnaporthe as a model for understanding host-pathogen interactions. Annu Rev Phytopathol 45, 437–56 (2007).
4. Dean, R. A. et al. The genome sequence of the rice blast fungus Magnaporthe grisea. Nature 434, 980–6 (2005).
5. Xue, M. et al. Comparative analysis of the genomes of two field isolates of the rice blast fungus Magnaporthe oryzae. PLoS Genet 8, e1002869 (2012).
6. Zhang, H., Zheng, X. & Zhang, Z. The Magnaporthe grisea species complex and plant pathogenesis. Mol Plant Pathol 17, 796–804 (2016).
7. Bao, J. et al. PacBio Sequencing Reveals Transposable Elements as a Key Contributor to Genomic Plasticity and Virulence Variation in Magnaporthe oryzae. Mol Plant 10, 1465–1468 (2017).
8. Wang, Y. et al. Genome Sequence of Magnaporthe oryzae EA18 Virulent to Multiple Widely Used Rice Varieties. Molecular Plant-Microbe Interactions 35, 727–730 (2022).
9. Ji, X. et al. Mitochondrial characteristics of the powdery mildew genus Erysiphe revealed an extraordinary evolution in protein-coding genes. Int J Biol Macromol 230, 123153 (2023).
10. Xue, M. et al. Genome assembly MoP131_2.0. GenBank https://identifiers.org/ncbi/GCA_000292605.1 (2013).
11. Brůna, T. et al. BRAKER2: automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS supported by a protein database. NAR Genom Bioinform 3, lqaa108 (2021).
12. Valach, M. et al. Widespread occurrence of organelle genome-encoded 5S rRNAs including permuted molecules. Nucleic Acids Res 42, 13764–77 (2014).
13. Farman, M. L. Telomeres in the rice blast fungus Magnaporthe oryzae: the world of the end as we know it. FEMS Microbiol Lett 273, 125–32 (2007).
14. Peng, Z. et al. Effector gene reshuffling involves dispensable mini-chromosomes in the wheat blast fungus. PLoS Genet 15, e1008272 (2019).
15. Rehmeyer, C. et al. Organization of chromosome ends in the rice blast fungus, Magnaporthe oryzae. Nucleic Acids Res 34, 4685–701 (2006).
16. Peng, Y.-L. & Shishiyama, J. Temporal sequence of cytological events in rice leaves infected with Pyricularia oryzae. Botany 66, 730–735 (1988).
17. Liu, X. et al. Prp19-associated splicing factor Cwf15 regulates fungal virulence and development in the rice blast fungus. Environ Microbiol 10, 5901–5916 (2021).
18. Durand, N. C. et al. Juicer Provides a One-Click System for Analyzing Loop-Resolution Hi-C Experiments. Cell Syst 3, 95–8 (2016).
19. Durand, N. C. et al. Juicebox Provides a Visualization System for Hi-C Contact Maps with Unlimited Zoom. Cell Syst 3, 99–101 (2016).
20. Li, Z. et al. Transcriptional Landscapes of Long Non-coding RNAs and Alternative Splicing in Pyricularia oryzae Revealed by RNA-Seq. Front Plant Sci 12, 723636 (2021).
21. Kim, D. et al. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat Biotechnol 37, 907–915 (2019).
22. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–9 (2009).
23. Koren, S. et al. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res 27, 722–736 (2017).
24. Walker, B. J. et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One 9, e112963 (2014).
25. Xu, G. C. et al. LR_Gapcloser: a tiling path-based gap closer that uses long reads to complete genome assembly. Gigascience 8, giy157 (2019).
26. Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc Natl Acad Sci USA 117, 9451–9457 (2020).
27. Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr Protoc Bioinformatics 4, Unit 4.10 (2004).
28. Keller, O. et al. A novel hybrid gene prediction method employing protein multiple sequence alignments. Bioinformatics 27, 757–63 (2011).
29. Brůna, T., Lomsadze, A. & Borodovsky, M. GeneMark-EP+: eukaryotic gene prediction with self-training in the space of genes and proteins. NAR Genom Bioinform 2, lqaa026 (2020).
30. Suzek, B. E. et al. UniRef clusters: a comprehensive and scalable alternative for improving sequence similarity searches. Bioinformatics 31, 926–32 (2015).
31. Li, W. & Godzik, A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22, 1658–9 (2006).
32. Gremme, G. et al. Engineering a software tool for gene structure prediction in higher organisms. Information and Software Technology 47, 965–978 (2005).
33. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR15170638 (2021).
34. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR15170637 (2021).
35. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR15170636 (2021).
36. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR24890910 (2023).
37. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR24890911 (2023).
38. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR24890912 (2023).
39. Li, Z. et al. Genome assembly PoP131. GenBank https://identifiers.org/ncbi/insdc.gca:GCA_000292605.2 (2023).
40. Chen, S. et al. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, 884–890 (2018).

Acknowledgements

This study was funded by the China Agricultural Research System (Grant No. CARS-01-43). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Author information

These authors contributed equally: Zhigang Li, Jun Yang.

Authors and Affiliations

MARA Key Laboratory of Pest Monitoring and Green Management, College of Plant Protection, China Agricultural University, Beijing, 100193, China
Zhigang Li, Jun Yang, Jintao Liu, Changfa Yin, Vijai Bhadauria, Wensheng Zhao & You-Liang Peng
Sanya Institute of Breeding and Multiplication/School of Tropical Agriculture and Forestry, Hainan University, Haikou, 570228, China
Zhigang Li & Xiaobei Ji

Authors

Zhigang Li
Jun Yang
Xiaobei Ji
Jintao Liu
Changfa Yin
Vijai Bhadauria
Wensheng Zhao
You-Liang Peng

Contributions

Zhigang Li, Jun Yang, Xiaobei Ji, Jintao Liu, and Changfa Yin performed the experiments. All authors analyzed the data. You-Liang Peng, Zhigang Li, and Jun Yang designed the study. You-Liang Peng, Zhigang Li, Jun Yang, and Vijai Bhadauria wrote the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to You-Liang Peng.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Li, Z., Yang, J., Ji, X. et al. First telomere-to-telomere gapless assembly of the rice blast fungus Pyricularia oryzae. Sci Data 11, 380 (2024). https://doi.org/10.1038/s41597-024-03209-z

Received: 23 August 2023
Accepted: 02 April 2024
Published: 13 April 2024
DOI: https://doi.org/10.1038/s41597-024-03209-z