Abstract
Fusarium oxysporum is an asexual filamentous fungus that causes vascular wilt in hundreds of crop plants and poses a threat to public health through Fusariosis. F. oxysporum f. sp. conglutinans strain Fo5176, originally isolated from Brassica oleracea, is pathogenic to Arabidopsis, making it a model pathosystem for dissecting the molecular mechanisms underlying host-pathogen interactions. Assembling the F. oxysporum genome is notoriously challenging due to the presence of repeat-rich accessory chromosomes. Here, we report a gap-free genome assembly of Fo5176 using PacBio HiFi and Hi-C data. The 69.56 Mb assembly contained 18 complete chromosomes, including all centromeres and most telomeres (20/36), representing the first gap-free genome sequence of a pathogenic F. oxysporum strain. In total, 21,460 protein-coding genes were annotated, a 26.3% increase compared to the most recent assembly. This high-quality reference genome for F. oxysporum f. sp. conglutinans Fo5176 provides a valuable resource for further research into fungal pathobiology and evolution.
Background & Summary
Fusarium oxysporum species complex is a group of soil-borne plant pathogens with a broad host range worldwide, causing severe economic losses in valuable crops such as cotton, tomato, banana and watermelon1,2,3,4,5. This species complex consists of various formae speciales (f. sp.) that infect hundreds of plant species, causing vascular wilt diseases4,6,7,8,9. They are also natural producers of various toxic metabolites, such as fusaric acid, posing threats to plant and human health10,11. Furthermore, they can act as disease agents for immune-compromised humans or other mammals12,13. Thus, elucidating the molecular and evolutionary mechanisms underlying the pathogenesis of F. oxysporum is crucial for both agricultural safety and public health. F. oxysporum strain Fo5176 was first isolated from Brassica oleracea (cabbage)14,15 and is pathogenic to multiple ecotypes of Arabidopsis thaliana16. The Arabidopsis-Fo5176 pathosystem has been established to study host-pathogen interactions16,17. F. oxysporum has a dynamic genome organization, containing conserved “core” chromosomes and lineage-specific (LS) “accessory” chromosomes. The accessory chromosomes are typically repeat-rich (>74%) due to the enrichment of various transposons18, making them extremely challenging to assemble accurately and completely. To date, there has been no reported complete, gap-free genome assembly of pathogenic F. oxysporum. Although a chromosome-level genome sequence of the Fo5176 strain was assembled using PacBio Sequel long reads and Illumina short-reads, it still has many gaps, leading to incomplete and incorrect annotation of genes17,19. Moreover, telomere and centromere sequences were unreported in previous assemblies, leaving these complex genomic regions unknown. A gap-free reference genome resource for Fo5176 would provide new insights into the molecular and evolutionary mechanisms underlying fungal growth, development, pathogenicity and mycotoxin production.
To produce a gapless reference genome of Fo5176, we extracted genomic DNA from fungal mycelia and generated 6.27 Gb of continuous long reads (CLR) (290 × coverage) from the PacBio single-molecule real-time (SMRT) sequencing platform, and 2.40 Gb of PacBio circular consensus sequencing (CCS) high-fidelity (HiFi) reads (34.17 × coverage) (Table 1). We also used a previously reported Hi-C (high-throughput chromatin conformation capture sequencing) dataset17 to anchor contigs onto chromosomes. During the genome assembly process, the raw CCS HiFi reads of Fo5176 were assembled with four long-read assemblers: Hifiasm20, HiCanu21, Flye22, and NextDenovo23. The CLR reads were first corrected by HiCanu and then assembled by Hifiasm. All draft genome assemblies were individually polished with NextPolish24 and then merged one by one using quickmerge. The raw contigs were then scaffolded and corrected using Hi-C reads with the Juicer25/3D-DNA pipeline26. The final assembly contained 18 chromosomes (Fig. 1A) with a contig N50 of 4.37 Mb, a substantial improvement (+29.2%) over the previous assembly by Fokkens et al. (hereafter Fokkens assembly)17 (Table 2). The number of chromosomes was determined by centromeric interaction regions detected in the Hi-C contact map (Fig. 1A)27. Consistent with this, each chromosome was represented by a single contig, indicating a gapless assembly. Comparing the two assemblies using the D-GENIES dotplot28, we found that Chr19 and Chr17 of the previous assembly were merged into a single chromosome in the new assembly (Fig. 2A–C), which was supported by mapped HiFi reads spanning the junction (Fig. 2D).
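For readers unfamiliar with the contig N50 statistic quoted above, the following is a minimal Python sketch of how it is usually computed. The function name `n50` and the toy contig lengths are illustrative assumptions and are not the Fo5176 values or the authors' script.

```python
def n50(lengths):
    """Return the N50: the largest length L such that contigs of
    length >= L together cover at least half of the total assembly."""
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down, accumulating length until
    # at least half of the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig sizes (in Mb), chosen only to illustrate the calculation.
toy_lengths = [6, 5, 4, 3, 2]
print(n50(toy_lengths))
```

With real data, the lengths would be read from the assembly FASTA index (for example, the second column of a `.fai` file).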
It has been reported that F. oxysporum strain Fo47, a fungal endophyte and biocontrol agent, carries 11 core chromosomes and a single accessory (Chr7) chromosome29. To characterize the chromosomes of Fo5176, we performed a whole-genome sequence alignment between Fo5176 and Fo47 using D-GENIES dotplot28. The Fo5176 sequences that mapped to 10 core chromosomes of Fo47 ranged from 1.9 Mb to 6.5 Mb in size and comprised Chr01, Chr03, Chr05, Chr06, Chr07, Chr08, Chr09, Chr12, Chr15, and Chr17 (Fig. S1), suggesting a set of 10 entire core chromosomes. Furthermore, a translocation between the chimeric (core/lineage-specific) chromosomes Chr04 and Chr13 of Fo5176 was observed. Among the LS chromosomes, Chr14 of Fo5176 is the counterpart of the LS Chr7 of Fo47. In addition to Chr14, five chromosomes (Chr02, Chr10, Chr11, Chr16, and Chr18) did not align to any Fo47 chromosome (Fig. S1), indicating five additional LS chromosomes in the Fo5176 genome. In total, Fo5176 contained 10 core chromosomes, two chimeric chromosomes (Chr04 and Chr13) and six LS chromosomes.
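The chromosome classification described above can be written down as a small lookup table and checked for completeness. This is only a bookkeeping sketch in Python; the variable names are illustrative, and the chromosome lists are copied from the text.

```python
# Chromosome classes of Fo5176 as listed in the text.
classes = {
    "core": ["Chr01", "Chr03", "Chr05", "Chr06", "Chr07",
             "Chr08", "Chr09", "Chr12", "Chr15", "Chr17"],
    "chimeric": ["Chr04", "Chr13"],
    "lineage_specific": ["Chr02", "Chr10", "Chr11", "Chr14", "Chr16", "Chr18"],
}

# Count the members of each class.
counts = {name: len(members) for name, members in classes.items()}

# Every chromosome from Chr01 to Chr18 should appear exactly once.
all_chromosomes = sorted(c for members in classes.values() for c in members)
expected = [f"Chr{i:02d}" for i in range(1, 19)]

print(counts)
print(all_chromosomes == expected)
```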
This gap-free genome assembly captured the centromeres of all 18 chromosomes (Figs. 3A,B, 4). A total of 20 telomeres were identified; the 16 missing telomeres were the 5′ ends of chromosomes 1, 6, 8, 10, 13, 16 and 17, and the 3′ ends of chromosomes 2, 3, 4, 5, 7, 8, 11, 13 and 16 (Figs. 3C,D, 4). Compared to the Fokkens assembly, which was 67.98 Mb in size with an N50 of 3.38 Mb, the 69.56 Mb gap-free assembly was larger and had a contig N50 of 4.37 Mb (Table 2). A more comprehensive comparison between the previous assembly and this assembly using GenomeSyn plot30 showed that six gaps on Chr02, Chr03 and Chr11 were closed (Fig. 4). Furthermore, four small inversions on Chr03, Chr07 and Chr14, and one large inversion in the centromeric region of Chr18, were corrected in the gap-free Fo5176 genome (Fig. 4).
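As an illustration of how telomeric ends can be recognised, the sketch below counts tandem copies of the TTAGGG motif (the telomeric repeat named in the Technical Validation section) at both ends of a sequence. It is a simplified regular-expression stand-in, not the trf-based procedure used in this study; the function name, the window size and the toy sequence are assumptions made for the example.

```python
import re


def count_terminal_repeats(seq, motif="TTAGGG", window=60):
    """Return (five_prime, three_prime): the number of motif copies in the
    longest perfect tandem run found within each terminal window."""
    # On the forward strand the 5' end carries the reverse complement (CCCTAA).
    rc = motif[::-1].translate(str.maketrans("ACGT", "TGCA"))

    def longest_run(region, unit):
        runs = re.findall(f"(?:{unit})+", region)
        return max((len(run) // len(unit) for run in runs), default=0)

    seq = seq.upper()
    return longest_run(seq[:window], rc), longest_run(seq[-window:], motif)


# Hypothetical toy chromosome: 4 repeats at the 5' end, 3 at the 3' end.
toy = "CCCTAA" * 4 + "ACGTACGTAC" * 5 + "TTAGGG" * 3
print(count_terminal_repeats(toy))
```

Real telomeres contain imperfect repeat copies, which is one reason a dedicated tandem-repeat finder is preferable to exact matching for production use.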
For annotation of the protein-coding genes, three different sources of evidence were integrated: ab initio prediction, Fusarium homologous proteins and RNA-seq data. The same annotation pipeline was also applied to the Fokkens assembly17, allowing a fair comparison of the two assemblies. A total of 21,460 protein-coding genes were identified in our gap-free assembly (Fig. 1B; Table 2), 26.3% more than were identified in the Fokkens assembly. Repetitive elements (REs) are enriched in F. oxysporum accessory chromosomes and associated with virulence factors responsible for pathogenicity18. In our gap-free assembly, REs accounted for 20.13% of the genome, with a high density in accessory chromosomes such as chromosomes 2, 16, and 18. The most abundant REs were DNA transposons (~7.89% of REs), of which tc1-is630-pogo (2.98%) and hobo-activator (2.26%) were the most abundant families. The RE composition of the gapless assembly differed markedly from that of the Fokkens assembly: small RNA elements (8,155 bp) were detected only in the new assembly, whereas Penelope elements and SINEs were absent from it (Table 2). Moreover, the unclassified elements accounted for 4.72% of the genome (Table 2).
Fusarium oxysporum species are natural producers of bioactive secondary metabolites (SMs), some of which play important roles in pathogenicity, antivirals, defense response and nutrition acquisition10,11. For example, fusaric acid contributes to the virulence of F. oxysporum in plant and mammalian hosts10,11. In the Fo5176 genome, 58 putative SM gene clusters containing 71 key biosynthetic genes were predicted by antiSMASH v6.1.131, including 15 nonribosomal peptide synthetases (NRPS), 14 polyketide synthases (PKS), 13 NRPS-like genes, 11 terpenes, 5 indoles and 2 betalactones (Figs. 1B, 5; Table S1), with no predicted SM gene clusters on chromosomes 10, 14, 16 and 18. Effectors are important virulence factors of F. oxysporum during plant infection. F. oxysporum secretes small cysteine-rich effector proteins, such as SIX (secreted in xylem) proteins, which can suppress host immunity and modify host cellular activities to promote infection9,32,33,34,35,36. In the Fo5176 gap-free genome, a total of 579 putative effectors were predicted across 18 chromosomes (Table S2) based on a pipeline integrating three tools: EffectorP37, SignalP38 and TMHMM39. The predicted effectors and putative SM gene clusters provide a resource for future characterization of new pathogenicity factors of Fo5176 and elucidation of their molecular mechanisms.
In summary, this gap-free genome assembly is the first for the pathogenic F. oxysporum, one of the most common and devastating fungal pathogens of crop plants, as well as human pathogens. This genome represents a major improvement in terms of contiguity and accuracy and will serve as an essential resource for researchers studying this fungus. The genome assembly and annotation are not only important for decoding the mechanisms of plant-pathogen interactions using the model plant Arabidopsis as a host, but also beneficial for elucidating the dynamic evolution of Fusarium species.
Methods
Fungal cultivation, DNA extraction, PacBio library preparation and sequencing
The Fo5176 strain used in this study was routinely grown on potato dextrose agar (PDA) medium at 25 °C. For genomic DNA isolation, the hyphae of Fo5176 were harvested from a 2-day-old potato dextrose broth (PDB) culture set at 150 rpm and immediately frozen in liquid nitrogen. High-molecular-weight genomic DNA was extracted using a cetyltrimethylammonium bromide (CTAB) method40. A PacBio HiFi sequencing library was prepared for sequencing according to PacBio’s standard protocol. The genomic DNA was fragmented using a Covaris g-TUBE device, and the size and concentration were measured with an Agilent 2100 Bioanalyzer. DNA damage repair, end repair, A-tailing and hairpin adapter ligation were performed using the PacBio SMRT Express Template Prep Kit 2.0. The library was then treated with nuclease using SMRTbell Enzyme Cleanup Kits (PacBio). For the PacBio CLR library construction, the high-quality genomic DNA was extracted using the standard CTAB extraction protocol. The DNA concentration was measured using the Qubit dsDNA BR Assay Kit on the Qubit Fluorometer, and the DNA integrity was evaluated using the Agilent 2100 Bioanalyzer system. The final CLR SMRTbell sequencing library was prepared using the SMRTbell Express Prep kit v2.0 Protocol and sequenced on the PacBio Sequel II system in CLR mode with the Sequel II Sequencing Kit 2.0. The resulting library was sequenced at Biomarker Technologies Corporation (Qingdao, China) and the raw CCS data was generated using the CCS algorithm.
Genome assembly
To improve the quality of the genome assembly, raw contigs of Fo5176 were assembled with four long-read de novo assemblers using CCS reads: Hifiasm v.0.16.1 with the ‘--primary’ option20, HiCanu v.1.4 with the ‘-pacbio-hifi oeaErrorRate=0.001’ options21, Flye v.2.9 with the ‘--pacbio-hifi’ option22, and NextDenovo with the ‘minimap2_options_cns = -x ava-hifi’ parameter23. Additionally, 6.27 Gb of PacBio CLR reads (290 × read depth; average length: 14,715 bp) were generated using the PacBio Sequel II system and used in the genome assembly. The CLR reads were corrected with HiCanu v.1.4 and then assembled using Hifiasm. All draft assemblies were assessed and then merged one by one using the single-scaffolding tool quickmerge v.0.3 under default parameters. First, the draft assemblies of NextDenovo and Flye were merged; then the Hifiasm CCS draft assembly was merged on top of that, followed by the HiCanu draft assembly, and finally the Hifiasm CLR draft assembly. The final merged assembly was further polished by NextPolish v.1.4.024 with the HiFi reads.
RNA extraction and sequencing
To annotate the protein-coding genes, total RNA was extracted using the TRIzol approach (Invitrogen, USA)17 from 2-day-old PDB hyphae of Fo5176. After evaluating RNA integrity using the RNA Nano 6000 Assay Kit, the high-quality RNA was used for total mRNA-Seq library construction with the Illumina TruSeq RNA Library Prep kit (Illumina, CA), following the manufacturer’s instructions. RNA sequencing was performed on an Illumina NovaSeq 6000 sequencer at Biomarker Technologies Corporation (Qingdao, China). A total of 5.1 Gb of clean, paired-end (2 × 150 bp) RNA-seq data were obtained, which were then quality-checked using fastp v.0.23.241 and mapped to the assembled genome sequence using Hisat2 v.2.1.042 under default parameters. Mapping ratios were calculated using SAMtools v.1.1543.
Gene annotation and repetitive elements analysis
To annotate the assembly, three approaches were combined: ab initio prediction, homology-based protein predictions, and RNA-seq evidence. During the ab initio prediction process, the repetitive sequences dispersed in the genome were de novo predicted and compiled by RepeatModeler v.1.0.1144 (parameters: -database -engine ncbi -pa). For further analyses, RepeatMasker v.4.1.2.p145 was used to extract and softmask the repetitive elements. The GeneMark-ET model46 was trained to predict gene models, followed by ab initio gene prediction using BRAKER2 for five rounds47 (parameters: --species=Fo --fungus --softmasking --genome --bam --prot_seq --prg=gth --gff3 --rounds=5). Then, the MAKER v.3.01.03 pipeline48, a genome annotation and data management tool, was applied to train the SNAP49 semi-HMM model for two rounds. The gene finder AUGUSTUS50 was used to predict the built-in Fusarium genome feature model. For homology-based protein prediction, a set of protein sequences (anchored chromosome level) of Fo5176, Fol4287 and Fo47 were downloaded and combined from the Fusarium public database. For transcriptomic data, the RNA-seq reads of Fo5176 were aligned to the assembled genome using Hisat2 v.2.1.042. Reference-based assembly and de novo assembly of transcripts were also generated using Scallop (v.0.10.5)51 and Trinity (v.2.8.4)52 (parameters: --min_kmer_cov 3 --normalize_max_read_cov 100), respectively. To predict the final gene model, the above transcript and protein datasets were integrated and aligned to the genome using the MAKER v.3.01.03 pipeline48. Moreover, the non-coding RNAs in the Fo5176 genome were predicted by Rfam/Infernal v.1.1.453 (parameters: ‘cmscan --cut_ga --rfam --nohmmonly --fmt 2 --clanin --tblout’) and tRNAscan-SE v.2.0.954 (parameters: ‘-E -X 20 -f -m -b -j --detail’).
Fungal effector prediction
For effector prediction, EffectorP v3.037 and SignalP v.6.038 were independently used to identify putative secreted proteins from the Fo5176 proteome, and the overlapping effectors from the two predictions were obtained. The candidate effectorome was then scanned using TMHMM v.2.039 to exclude those containing transmembrane helices, yielding the final set of predicted effectors.
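The filtering logic described above (overlap of the EffectorP and SignalP predictions, followed by removal of proteins with transmembrane helices) can be sketched in Python as follows. The function name, the input data structures and the protein identifiers are hypothetical; parsing of the actual tool outputs is not shown, and treating a protein that is missing from the TMHMM results as having zero helices is an assumption of this sketch.

```python
def select_effectors(effectorp_hits, signalp_hits, tm_helix_counts):
    """Keep proteins predicted by both EffectorP and SignalP that have
    no predicted transmembrane helices."""
    # Step 1: overlap of the two independent predictions.
    candidates = set(effectorp_hits) & set(signalp_hits)
    # Step 2: drop candidates with one or more predicted helices.
    # Assumption: proteins absent from the TMHMM table count as 0 helices.
    return sorted(p for p in candidates if tm_helix_counts.get(p, 0) == 0)


# Hypothetical protein identifiers, for illustration only.
effectorp_hits = {"prot_001", "prot_002", "prot_003", "prot_004"}
signalp_hits = {"prot_002", "prot_003", "prot_004", "prot_005"}
tm_helix_counts = {"prot_002": 0, "prot_003": 2, "prot_004": 0}

print(select_effectors(effectorp_hits, signalp_hits, tm_helix_counts))
```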
Data Records
The raw sequencing data of PacBio HiFi, CLR, and RNA-seq have been deposited in the National Center for Biotechnology Information (NCBI) under the BioProject number PRJNA91052955 with the accession number of SRR2274692156, SRR2274692057 and SRR2274691858, respectively. The final assembled genome is deposited under the same BioProject at NCBI (GCA_030345115.1)59 and also in Genome Warehouse of National Genomics Data Center (https://ngdc.cncb.ac.cn/) at China National Center for Bioinformation under the accession number of GWHDOBK0000000060. The genome annotations including CDS and protein-coding regions files have been submitted to the online open access repository Figshare61.
Technical Validation
Manual correction, validation and evaluation of genome assembly
To obtain a nearly complete and error-free reference genome, we manually corrected misassemblies and removed redundant contigs using Hi-C read alignments visualized in Juicebox62. To remove potential contamination sequences such as mitochondrial genomes and sequencing adaptors, we used megaBLAST63 to align our assembly to the species’ mitochondrial genome, the common contaminant database (ftp://ftp.ncbi.nlm.nih.gov/pub/kitts/contam_in_euks.fa.gz), the adaptor database (ftp://ftp.ncbi.nlm.nih.gov/pub/kitts/adaptors_for_screening_euks.fa), and the nucleotide sequence database (remote mode), and found no contamination. Compared to the previous assembly (NCBI: GCA_014154955.1), our gap-free assembly has reduced the gap length from 4,000 to 0 based on high-coverage HiFi, CLR, and Hi-C reads. Furthermore, all centromeres and most telomeres (20/36) (TTAGGG) were captured via StainedGlass64 and trf v.4.09.165 (parameters: 2 7 7 80 10 90 2000 -d -m -l 2), and then visualized using IGV66 (Fig. 3). A one-to-one correspondence between the previous and new assemblies was found for 14,713 protein-coding genes via liftoff v.1.6.367 and BEDtools v.2.30.068 (parameters: intersect -wa -wb -f 0.9), which account for 86.6% of the genes in the previous assembly and 68.6% of those in this study’s genome. Compared to the previous version (NCBI: GCA_014154955.1), our assembly has corrected six gaps on Chr02, Chr03 and Chr11, and five major inversions on Chr03, Chr07, Chr14 and Chr18 (Fig. 4), visualized via GenomeSyn plot30.
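The gene-level percentages reported above can be cross-checked with simple arithmetic. In the Python sketch below, the gene count of the previous annotation is not taken from the text; it is back-calculated from the stated 86.6% and should therefore be read as an estimate. Variable names are illustrative.

```python
shared_genes = 14_713      # one-to-one genes between the two assemblies
new_total = 21_460         # protein-coding genes in the gap-free assembly
frac_of_previous = 0.866   # stated share of the previous annotation

# Share of the new annotation covered by the one-to-one genes.
pct_of_new = 100 * shared_genes / new_total

# Estimated size of the previous annotation (derived, not reported here).
previous_total_est = shared_genes / frac_of_previous

# Relative increase in gene number implied by that estimate.
pct_increase = 100 * (new_total / previous_total_est - 1)

print(round(pct_of_new, 1), round(previous_total_est), round(pct_increase, 1))
```

Under these assumptions the derived values agree with the 68.6% and 26.3% figures given in the text.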
Assessment of assembly quality and completeness
The accuracy and completeness of the genome assemblies were assessed by BUSCO (Benchmarking Universal Single-Copy Ortholog)69 and CEGMA v.2.5 (Core Eukaryotic Genes Mapping Approach)70 analyses. First, HiFi and CLR reads were mapped to the assembly using Winnowmap2 v.2.0371 with parameters ‘-W repetitive_k15.txt -t 104 -ax map-pb’. Completeness evaluation for our gap-free assembly showed BUSCO and CEGMA values of 98.9% and 99.6%, respectively (Table 2), suggesting a highly accurate and complete assembly.
Code availability
All commands and pipelines used in data processing were executed according to the manual and protocols of the corresponding bioinformatic software and described in the Methods section, along with the versions. If no detailed parameters were mentioned for the software in this study, default parameters were used as suggested by the developer.
References
Ploetz, R. C. Fusarium wilt of banana. Phytopathology 105, 1512–1521, https://doi.org/10.1094/phyto-04-15-0101-rvw (2015).
Gordon, T. R. Fusarium oxysporum and the Fusarium wilt syndrome. Annual review of phytopathology 55, 23–39, https://doi.org/10.1146/annurev-phyto-080615-095919 (2017).
Cox, K. L. Jr., Babilonia, K., Wheeler, T., He, P. & Shan, L. Return of old foes - recurrence of bacterial blight and Fusarium wilt of cotton. Current opinion in plant biology 50, 95–103, https://doi.org/10.1016/j.pbi.2019.03.012 (2019).
Hudson, O., Fulton, J. C., Dong, A. K., Dufault, N. S. & Ali, M. E. Fusarium oxysporum f. sp. niveum molecular diagnostics past, present and future. International Journal of Molecular Sciences 22, https://doi.org/10.3390/ijms22189735 (2021).
Srinivas, C. et al. Fusarium oxysporum f. sp. lycopersici causal agent of vascular wilt disease of tomato:biology to diversity- A review. Saudi journal of biological sciences 26, 1315–1324, https://doi.org/10.1016/j.sjbs.2019.06.002 (2019).
Edel-Hermann, V. & Lecomte, C. Current status of Fusarium oxysporum formae speciales and races. Phytopathology 109, 512–530, https://doi.org/10.1094/phyto-08-18-0320-rvw (2019).
Jangir, P. et al. Secreted in xylem genes: drivers of host adaptation in Fusarium oxysporum. Frontiers in plant science 12, 628611, https://doi.org/10.3389/fpls.2021.628611 (2021).
Yu, F. et al. Genome sequence of Fusarium oxysporum f. sp. conglutinans, the etiological agent of Cabbage Fusarium Wilt. Molecular plant-microbe interactions: MPMI 34, 210–213, https://doi.org/10.1094/mpmi-08-20-0245-a (2021).
Ayukawa, Y. et al. A pair of effectors encoded on a conditionally dispensable chromosome of Fusarium oxysporum suppress host-specific immunity. Communications biology 4, 707, https://doi.org/10.1038/s42003-021-02245-4 (2021).
Ibrahim, S. R. M., Sirwi, A., Eid, B. G., Mohamed, S. G. A. & Mohamed, G. A. Bright Side of Fusarium oxysporum: secondary metabolites bioactivities and industrial relevance in biotechnology and nanotechnology. Journal of fungi (Basel, Switzerland) 7, https://doi.org/10.3390/jof7110943 (2021).
López-Díaz, C. et al. Fusaric acid contributes to virulence of Fusarium oxysporum on plant and mammalian hosts. Molecular Plant Pathology 19, 440–453, https://doi.org/10.1111/mpp.12536 (2018).
O’Donnell, K. et al. Genetic diversity of human pathogenic members of the Fusarium oxysporum complex inferred from multilocus DNA sequence data and amplified fragment length polymorphism analyses: evidence for the recent dispersion of a geographically widespread clonal lineage and nosocomial origin. Journal of clinical microbiology 42, 5109–5120, https://doi.org/10.1128/jcm.42.11.5109-5120.2004 (2004).
Ortoneda, M. et al. Fusarium oxysporum as a multihost model for the genetic dissection of fungal virulence in plants and mammals. Infection and immunity 72, 1760–1766, https://doi.org/10.1128/iai.72.3.1760-1766.2004 (2004).
Thatcher, L. F., Gardiner, D. M., Kazan, K. & Manners, J. M. A highly conserved effector in Fusarium oxysporum is required for full virulence on Arabidopsis. Molecular Plant-Microbe Interactions: MPMI 25, 180–190, https://doi.org/10.1094/mpmi-08-11-0212 (2012).
Chen, Y. C. et al. Root defense analysis against Fusarium oxysporum reveals new regulators to confer resistance. Scientific reports 4, 5584, https://doi.org/10.1038/srep05584 (2014).
Wang, L., Calabria, J., Chen, H. W. & Somssich, M. The Arabidopsis thaliana-Fusarium oxysporum strain 5176 pathosystem: an overview. Journal of Experimental Botany 73, 6052–6067, https://doi.org/10.1093/jxb/erac263 (2022).
Fokkens, L. et al. A chromosome-scale genome assembly for the Fusarium oxysporum strain Fo5176 to establish a model Arabidopsis-fungal pathosystem. G3 (Bethesda, Md.) 10, 3549–3555, https://doi.org/10.1534/g3.120.401375 (2020).
Ma, L. J. et al. Comparative genomics reveals mobile pathogenicity chromosomes in Fusarium. Nature 464, 367–373, https://doi.org/10.1038/nature08850 (2010).
Guo, L. et al. Metatranscriptomic comparison of endophytic and pathogenic Fusarium-Arabidopsis interactions reveals plant transcriptional plasticity. Molecular Plant-Microbe Interactions: MPMI 34, 1071–1083, https://doi.org/10.1094/mpmi-03-21-0063-r (2021).
Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nature Methods 18, 170–175, https://doi.org/10.1038/s41592-020-01056-5 (2021).
Nurk, S. et al. HiCanu: accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads. Genome research 30, 1291–1305, https://doi.org/10.1101/gr.263566.120 (2020).
Kolmogorov, M., Yuan, J., Lin, Y. & Pevzner, P. A. Assembly of long, error-prone reads using repeat graphs. Nature biotechnology 37, 540–546, https://doi.org/10.1038/s41587-019-0072-8 (2019).
Hu, J. et al. NextDenovo: an efficient error correction and accurate assembly tool for noisy long reads. Genome Biology 25, 107, https://doi.org/10.1101/2023.03.09.531669 (2024).
Hu, J., Fan, J., Sun, Z. & Liu, S. NextPolish: a fast and efficient genome polishing tool for long-read assembly. Bioinformatics (Oxford, England) 36, 2253–2255, https://doi.org/10.1093/bioinformatics/btz891 (2020).
Durand, N. C. et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell systems 3, 95–98, https://doi.org/10.1016/j.cels.2016.07.002 (2016).
Dudchenko, O. et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science (New York, N.Y.) 356, 92–95, https://doi.org/10.1126/science.aal3327 (2017).
Marbouty, M. et al. Metagenomic chromosome conformation capture (meta3C) unveils the diversity of chromosome organization in microorganisms. eLife 3, e03318, https://doi.org/10.7554/eLife.03318 (2014).
Cabanettes, F. & Klopp, C. D-GENIES: dot plot large genomes in an interactive, efficient and simple way. PeerJ 6, e4958, https://doi.org/10.7717/peerj.4958 (2018).
Wang, B. et al. Fo47-Chromosome-scale genome assembly of Fusarium oxysporum strain Fo47, a fungal endophyte and biocontrol agent. Molecular Plant-Microbe Interactions: MPMI 33, 1108–1111, https://doi.org/10.1094/mpmi-05-20-0116-a (2020).
Zhou, Z. W. et al. GenomeSyn: a bioinformatics tool for visualizing genome synteny and structural variations. Journal of genetics and genomics = Yi chuan xue bao 49, 1174–1176, https://doi.org/10.1016/j.jgg.2022.03.013 (2022).
Blin, K. et al. antiSMASH 5.0: updates to the secondary metabolite genome mining pipeline. Nucleic acids research 47, W81–W87, https://doi.org/10.1093/nar/gkz310 (2019).
Houterman, P. M., Cornelissen, B. J. & Rep, M. Suppression of plant resistance gene-based immunity by a fungal effector. PLoS pathogens 4, e1000061, https://doi.org/10.1371/journal.ppat.1000061 (2008).
Rep, M. et al. A small, cysteine-rich protein secreted by Fusarium oxysporum during colonization of xylem vessels is required for I-3-mediated resistance in tomato. Molecular microbiology 53, 1373–1383, https://doi.org/10.1111/j.1365-2958.2004.04177.x (2004).
Houterman, P. M. et al. The effector protein Avr2 of the xylem-colonizing fungus Fusarium oxysporum activates the tomato resistance protein I-2 intracellularly. The Plant journal: for cell and molecular biology 58, 970–978, https://doi.org/10.1111/j.1365-313X.2009.03838.x (2009).
An, B. et al. The effector SIX8 is required for virulence of Fusarium oxysporum f.sp. cubense tropical race 4 to Cavendish banana. Fungal biology 123, 423–430, https://doi.org/10.1016/j.funbio.2019.03.001 (2019).
Redkar, A. et al. Conserved secreted effectors contribute to endophytic growth and multihost plant compatibility in a vascular wilt fungus. The Plant Cell 34, 3214–3232, https://doi.org/10.1093/plcell/koac174 (2022).
Sperschneider, J. & Dodds, P. N. EffectorP 3.0: prediction of apoplastic and cytoplasmic effectors in fungi and oomycetes. Molecular Plant-Microbe Interactions: MPMI 35, 146–156, https://doi.org/10.1094/mpmi-08-21-0201-r (2022).
Bendtsen, J. D., Nielsen, H., von Heijne, G. & Brunak, S. Improved prediction of signal peptides: SignalP 3.0. Journal of molecular biology 340, 783–795, https://doi.org/10.1016/j.jmb.2004.05.028 (2004).
Krogh, A., Larsson, B., von Heijne, G. & Sonnhammer, E. L. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. Journal of molecular biology 305, 567–580, https://doi.org/10.1006/jmbi.2000.4315 (2001).
Allen, G. C., Flores-Vergara, M. A., Krasynanski, S., Kumar, S. & Thompson, W. F. A modified protocol for rapid DNA isolation from plant tissues using cetyltrimethylammonium bromide. Nature protocols 1, 2320–2325, https://doi.org/10.1038/nprot.2006.384 (2006).
Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics (Oxford, England) 34, i884–i890, https://doi.org/10.1093/bioinformatics/bty560 (2018).
Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nature biotechnology 37, 907–915, https://doi.org/10.1038/s41587-019-0201-4 (2019).
Li, H. et al. The sequence alignment/map format and SAMtools. Bioinformatics (Oxford, England) 25, 2078–2079, https://doi.org/10.1093/bioinformatics/btp352 (2009).
Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proceedings of the National Academy of Sciences 117, 9451–9457, https://doi.org/10.1073/pnas.1921046117 (2020).
Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Current protocols in bioinformatics Chapter 4, Unit 4.10, https://doi.org/10.1002/0471250953.bi0410s05 (2004).
Lomsadze, A., Burns, P. D. & Borodovsky, M. Integration of mapped RNA-Seq reads into automatic training of eukaryotic gene finding algorithm. Nucleic acids research 42, e119, https://doi.org/10.1093/nar/gku557 (2014).
Brůna, T., Hoff, K. J., Lomsadze, A., Stanke, M. & Borodovsky, M. BRAKER2: automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS supported by a protein database. NAR genomics and bioinformatics 3, lqaa108, https://doi.org/10.1093/nargab/lqaa108 (2021).
Holt, C. & Yandell, M. MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects. BMC bioinformatics 12, 491, https://doi.org/10.1186/1471-2105-12-491 (2011).
Korf, I. Gene finding in novel genomes. BMC bioinformatics 5, 59, https://doi.org/10.1186/1471-2105-5-59 (2004).
Stanke, M., Diekhans, M., Baertsch, R. & Haussler, D. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics (Oxford, England) 24, 637–644, https://doi.org/10.1093/bioinformatics/btn013 (2008).
Shao, M. & Kingsford, C. Accurate assembly of transcripts through phase-preserving graph decomposition. Nature biotechnology 35, 1167–1169, https://doi.org/10.1038/nbt.4020 (2017).
Grabherr, M. G. et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nature biotechnology 29, 644–652, https://doi.org/10.1038/nbt.1883 (2011).
Nawrocki, E. P. & Eddy, S. R. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics (Oxford, England) 29, 2933–2935, https://doi.org/10.1093/bioinformatics/btt509 (2013).
Chan, P. P., Lin, B. Y., Mak, A. J. & Lowe, T. M. tRNAscan-SE 2.0: improved detection and functional classification of transfer RNA genes. Nucleic acids research 49, 9077–9096, https://doi.org/10.1093/nar/gkab688 (2021).
Wang, H. This study aimed to obtain high quality genomic sequence of Fusarium oxysporum Fo5176. BioProject https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA910529 (2023).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR22746921 (2023).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR22746920 (2023).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR22746918 (2023).
NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_030345115.1 (2023).
Wang, H. Fusarium oxysporum Fo5176, whole genome sequencing project. NGDC Genome Warehouse https://ngdc.cncb.ac.cn/gwh/Assembly/64001/show (2023).
Wang, H. The annotated file for Fusarium oxysporum strain Fo5176. Figshare https://doi.org/10.6084/m9.figshare.21696389 (2023).
Durand, N. C. et al. Juicebox provides a visualization system for Hi-C contact maps with unlimited zoom. Cell systems 3, 99–101, https://doi.org/10.1016/j.cels.2015.07.012 (2016).
Camacho, C. et al. BLAST+: architecture and applications. BMC bioinformatics 10, 421, https://doi.org/10.1186/1471-2105-10-421 (2009).
Vollger, M. R., Kerpedjiev, P., Phillippy, A. M. & Eichler, E. E. StainedGlass: interactive visualization of massive tandem repeat structures with identity heatmaps. Bioinformatics (Oxford, England) 38, 2049–2051, https://doi.org/10.1093/bioinformatics/btac018 (2022).
Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic acids research 27, 573–580, https://doi.org/10.1093/nar/27.2.573 (1999).
Robinson, J. T. et al. Integrative genomics viewer. Nature biotechnology 29, 24–26, https://doi.org/10.1038/nbt.1754 (2011).
Shumate, A. & Salzberg, S. L. Liftoff: accurate mapping of gene annotations. Bioinformatics (Oxford, England) 37, 1639–1643, https://doi.org/10.1093/bioinformatics/btaa1016 (2021).
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics (Oxford, England) 26, 841–842, https://doi.org/10.1093/bioinformatics/btq033 (2010).
Manni, M., Berkeley, M. R., Seppey, M., Simão, F. A. & Zdobnov, E. M. BUSCO update: novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. Molecular biology and evolution 38, 4647–4654, https://doi.org/10.1093/molbev/msab199 (2021).
Parra, G., Bradnam, K. & Korf, I. CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics (Oxford, England) 23, 1061–1067, https://doi.org/10.1093/bioinformatics/btm071 (2007).
Jain, C., Rhie, A., Hansen, N. F., Koren, S. & Phillippy, A. M. Long-read mapping to repetitive reference sequences using Winnowmap2. Nature Methods 19, 705–710, https://doi.org/10.1038/s41592-022-01457-8 (2022).
Acknowledgements
This project was supported by the Key R&D Program of Shandong Province (ZR202211070163) and the Natural Science Foundation for Distinguished Young Scholars of Shandong Province (ZR2023JQ010). L.G. is also supported by the Young Taishan Scholars Program. We would like to thank the Bioinformatics Platform at Peking University Institute of Advanced Agricultural Sciences and the Plant Science Data Center of the Chinese Academy of Sciences for providing the high-performance computing resources.
Author information
Contributions
L.G. and L.Z. conceived and supervised the project. H.W., L.Z., X.G. and D.M. conducted the experiments; G.Y., W.C., H.W., X.W., J.S., S.Y., T.M. and S.C. performed the genomic data analysis; H.W., L.Z., D.H.A. and L.G. wrote and revised the manuscript. All authors read and approved the final version of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Wang, H., Yao, G., Chen, W. et al. A gap-free genome assembly of Fusarium oxysporum f. sp. conglutinans, a vascular wilt pathogen. Sci Data 11, 925 (2024). https://doi.org/10.1038/s41597-024-03763-6
DOI: https://doi.org/10.1038/s41597-024-03763-6