Study of the whole genome, methylome and transcriptome of Cordyceps militaris

Chen, Yujiao; Wu, Yuqian; Liu, Li; Feng, Jianhua; Zhang, Tiancheng; Qin, Sheng; Zhao, Xingyu; Wang, Chaoxia; Li, Dongmei; Han, Wei; Shao, Minghui; Zhao, Ping; Xue, Jianfeng; Liu, Xiaomin; Li, Hongjie; Zhao, Enwei; Zhao, Wen; Guo, Xijie; Jin, Yongfeng; Cao, Yaming; Cui, Liwang; Zhou, Zeqi; Xia, Qingyou; Rao, Zihe; Zhang, Yaozhou

doi:10.1038/s41598-018-38021-4

Article
Open access
Published: 29 January 2019

Yujiao Chen^1,2,3,4,5^na1,
Yuqian Wu^1,3,4,5,6^na1,
Li Liu^3,5^na1,
Jianhua Feng³,
Tiancheng Zhang³,
Sheng Qin⁷,
Xingyu Zhao⁷,
Chaoxia Wang³,
Dongmei Li³,
Wei Han³,
Minghui Shao³,
Ping Zhao¹,
Jianfeng Xue³,
Xiaomin Liu^2,4,
Hongjie Li²,
Enwei Zhao^2,4,
Wen Zhao³,
Xijie Guo⁷,
Yongfeng Jin⁸,
Yaming Cao⁹,
Liwang Cui ORCID: orcid.org/0000-0002-8338-1974^3,10,
Zeqi Zhou¹¹,
Qingyou Xia⁶,
Zihe Rao¹² &
…
Yaozhou Zhang^1,2,3,4,5,12

Scientific Reports volume 9, Article number: 898 (2019)

Abstract

The complete genome of Cordyceps militaris was sequenced using single-molecule real-time (SMRT) sequencing technology at a coverage of over 300×. The genome size was 32.57 Mb, and 14 contigs ranging from 0.35 to 4.58 Mb with an N50 of 2.86 Mb were assembled, including 4 contigs with telomeric sequences on both ends and an additional 8 contigs with telomeric sequences on either the 5′ or 3′ end. A methylome database of the genome was constructed using SMRT; m4C and m6A methylated nucleotides and many unknown modification types were identified. The major m6A methylation motifs are GA and GGAG, and the major m4C methylation motifs are GC or CG/GC. In the C. militaris genomic DNA, four types of methylated nucleotides were confirmed using high-resolution LCMS-IT-TOF. Using PacBio Iso-Seq, a total of 31,133 complete cDNA sequences were obtained from the fruiting body. The conserved domains of the nontranscribed regions of the genome include TATA boxes, which are the initial regions of genome replication. There were 406 structural variants between the HN and CM01 strains, and there were 1,114 structural variants between the HN and ATCC strains.

Introduction

C. militaris is as highly valued in Chinese traditional medicine as the ascomycete Cordyceps sinensis (syn. Ophiocordyceps sinensis), which possesses antitumor properties¹. There are currently more than 680 documented species of the ascomycete genus Cordyceps. C. militaris, which is a pathogen of the lepidopteran insect pupae^2,3, has been successfully cultivated and grown on grain or Bombyx mori pupae. C. militaris HN is an edible fungus that was approved as the first novel food of the Cordyceps species by the Ministry of Public Health of China in 2009¹. In recent years, advanced techniques have demonstrated that the nutrients and bioactive compounds in the fruiting body of C. militaris are similar to those of the traditional Chinese invigorant, O. sinensis^4,5. Therefore, analyses of the C. militaris genome, transcriptome and methylome are important for understanding the biology of this fungus.

Of the available sequencing platforms, SMRT technology has the unique advantage of significantly longer read lengths that produce high-quality genomes⁶. Further, SMRT Iso-Seq has the great advantage of not requiring sequence assembly, thus increasing the integrity of the assembled transcriptome and the reliability of transcriptome sequencing^7,8.

Recently, using a Roche 454 GS FLX system, the C. militaris genome was assembled at a 147× coverage into 597 contigs and 33 scaffolds with a scaffold N50 of 4.6 Mb and a total genome size of 32.2 Mb; however, due to the limitations of the sequencing technology used, several gaps remain in the assembled genome⁹. Using SMRT sequencing and Optical Mapping, the fungal genome of V. dahliae was assembled at the chromosome level¹⁰. To date, the genomes of 12 fungal species have been assembled at the chromosome level using SMRT sequencing^11,12,13. In addition, the genome of the ATCC strain of C. militaris with 7 contigs has been reported¹⁴.

DNA methylation is among the most common forms of DNA modification in prokaryotic and eukaryotic genomes. DNA methylation has various effects on fundamental biological processes, including the silencing of transposable elements (TEs) and the regulation of chromatin structure, gene expression, genetic recombination and sexual development^15,16,17. Bisulfite sequencing (BS-Seq) and SMRT technology have been widely used in the sequencing of the genomes and methylomes of fungi^18,19,20. Based on the CM01 genome database⁹, the methylome of C. militaris has been profiled at single-base resolution by genomic BS-Seq to assess the DNA methylation patterns during sexual development¹⁷. The results showed that approximately 0.40% of cytosines are methylated, which is similar to the DNA methylation level during asexual development (0.39%). More recently, a study using SMRT technology found that up to 2.80% of all adenines were methylated in 16 early-diverging fungi and identified N6-methyldeoxyadenine (6mA) as a widespread epigenetic marker in early-diverging fungi that is associated with transcriptionally active genes²¹.

In this study, the genome, transcriptome and methylome of the C. militaris HN strain were assembled and analyzed. The genomic nontranscribed region structures were identified. The methylation types of genomic DNA on all four nucleotides were detected using high-resolution LCMS-IT-TOF. These results provide a new approach to performing relevant genomic studies.

Results

Sequencing and assembly of the C. militaris genome

We assembled the genome using the Hierarchical Genome Assembly Process 3 (HGAP3) of SMRT⁶. More than 300× coverage of the C. militaris genome was achieved, with an average polymerase read length of 14 kb. The C. militaris genome was assembled into 14 contigs, and the total genome size was 32.57 Mb. The contig sizes ranged from 0.35 to 4.57 Mb, and the contig N50 was 2.86 Mb (Fig. 1, Table 1). Of the 14 contigs, contigs 1, 9, 10 and 12 contained GGGTAA or TTACCC telomeric repeat sequences of approximately 120 bp in length on both ends, indicating that the four contigs were complete chromosomes. Eight additional contigs contained telomeric repeat sequences on the 5′ or 3′ end (Fig. 1a). The distribution of DNA methylation is shown in Fig. 1b,c. The GC content in the C. militaris HN strain genome was 51.5% and was not evenly distributed among the individual contigs (Fig. 1d). Contig 14 was unique in terms of its GC content, and 2/3 of the contig had less than 40% GC content. Additionally, the frequency of repeat sequences was higher in regions with a lower GC content combined with a lower frequency of coding sequences (Fig. 1d–f). Such regions may function as gene regulatory regions or chromosomal regions with an ultra-complex structure. The C. militaris genome has many genome duplications greater than 5 kb (Fig. 1g).
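The contig N50 quoted above is a length-weighted median of the contig sizes. As a minimal sketch of how it is conventionally computed, the Python function below sorts the contigs from longest to shortest and returns the length at which the running total first reaches half of the assembly size. The contig lengths in the example are hypothetical placeholders, not the HN assembly values (those are in Table 1).

```python
def n50(lengths):
    """Return the N50: the largest length L such that contigs of length >= L
    together cover at least half of the total assembly size."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths in bp (NOT the HN assembly values).
example_lengths = [500, 400, 300, 200, 100]
example_n50 = n50(example_lengths)
print(example_n50)
```

Because the statistic depends only on the sorted lengths, the same function can be applied to contigs or scaffolds, which is why the two kinds of N50 are reported separately when assemblies are compared.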

Table 1 Assembly summary statistics of C. militaris HN compared with the ATCC 34164 and CM01 C. militaris genomes.

We compared our genome assembly with that of the CM01 strain of C. militaris generated on the Roche 454 GS FLX platform⁹; the number of contigs in the genome was reduced from 594 to 14, the N50 increased 26-fold and the genome size increased by 0.3 Mb. As shown in Table 1, the average gene length increased by 128 bp, the protein coding genes increased by 411 bp and the average intergenic length decreased by 226 bp. We also compared our genome with the recently released genome of an ATCC strain sequenced by PacBio sequencing technology; as shown in Table 1, the genome size decreased by 1.05 Mb, the number of genes increased by 808, the average intergenic length decreased by 434 bp, and the number of exons increased by 3,637.

SMRT sequencing of the C. militaris genome revealed many interchromosome translocations from the shotgun CM01 sequencing database (Fig. 2a,c). As shown in Fig. 2a,c, contig 3 of the HN strain genome is composed of scaffolds 1, 5 and 7 from the CM01 genome; contig 4 is composed of scaffolds 1, 5 and 6; contig 7 is composed of scaffolds 1 and 7; and contig 8 is composed of scaffolds 1, 6, 7 and 10. Contig 4 is a part of scaffold 4 with an inverse direction. The coverage distribution of the genome and transcriptome sequencing were also investigated. Compared with the other contigs, contigs 5 and 11 had lower coverage, suggesting that these two contigs have distinct spatial structural features. Furthermore, this finding suggests that the gaps in the genome are not due exclusively to random repeat sequences or a high GC content, and many unknown factors must be considered (Fig. 2b). The translocations between the genome of the HN and ATCC strains were also investigated, and we found that contig 3 of the HN genome existed as an inverted duplication (Supplement 1).

DNA methylation analysis in the genome of the C. militaris HN strain

The methylome and its distribution on the 14 contigs of the C. militaris genome were also determined by SMRT sequencing (Fig. 3). Two major types of methylation, m4C and m6A, were identified, and their distribution patterns are shown in Fig. 4. The distributions of the methylated nucleotides among the different contigs are shown in Table 2. In total, 0.016% and 0.085% of m6A and m4C were observed in contigs 1 and 13, respectively, while contig 14 contained 0.032% of m6A and 0.042% of m4C. An in-depth analysis of the m6A methylation motifs in the contigs showed that GA is the most common motif, accounting for 80% of all methylation sites, including GAG, GGA and GGAG at 6%, 23% and 17%, respectively (Fig. 4c). The GO and KEGG annotation information for the methylated genes and the top 14 GO enrichment terms are shown in Fig. 5.

Table 2 Distributions and methylation motifs in 14 contigs in HN.

Genomic DNA methylation detected by LCMS-IT-TOF

To determine whether all 4 nucleotides were methylated in the C. militaris genome, the molecular weight of each nucleotide in the C. militaris genomic DNA was determined by performing high-resolution mass spectrometry. Each nucleotide in the genomic DNA was isolated by performing large-scale HPLC. The eight fractions are shown in Fig. 6a. The molecular weight of the separated nucleotides was determined by performing LCMS-IT-TOF. The results are shown in Fig. 6b; four types of molecular weights were confirmed among the methylated nucleotides, demonstrating that the types of methylated nucleotides in the genomic DNA included not only m4C or m6A but also mG or mT.
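Identifying methylation from molecular weight rests on a fixed mass shift: replacing one hydrogen with a methyl group adds CH2, about 14.016 Da in monoisotopic mass. The Python sketch below computes that shift from standard monoisotopic atomic masses. It assumes the analytes are the neutral 2′-deoxynucleoside 5′-monophosphates; the source does not state the exact chemical form or ion species that was measured, so the absolute values are illustrative only.

```python
# Standard monoisotopic atomic masses in Da.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.0030740,
    "O": 15.9949146,
    "P": 30.9737620,
}

# Elemental formulas of the neutral 2'-deoxynucleoside 5'-monophosphates
# (assumed analyte form; the measured species is not given in the text).
DNMP_FORMULAS = {
    "dAMP": {"C": 10, "H": 14, "N": 5, "O": 6, "P": 1},
    "dCMP": {"C": 9, "H": 14, "N": 3, "O": 7, "P": 1},
    "dGMP": {"C": 10, "H": 14, "N": 5, "O": 7, "P": 1},
    "dTMP": {"C": 10, "H": 15, "N": 2, "O": 8, "P": 1},
}


def monoisotopic_mass(formula):
    """Sum the monoisotopic masses of all atoms in an elemental formula."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


def methylated(formula):
    """Return the formula after one methylation (net gain of CH2)."""
    result = dict(formula)
    result["C"] += 1
    result["H"] += 2
    return result


for name, formula in DNMP_FORMULAS.items():
    plain = monoisotopic_mass(formula)
    shifted = monoisotopic_mass(methylated(formula))
    print(f"{name}: {plain:.4f} Da, methylated: {shifted:.4f} Da")
```

The shift alone does not reveal which atom carries the methyl group, so positional isomers such as m4C and m5C have the same predicted mass and need chromatographic separation or fragmentation data to be told apart.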

Analysis of the C. militaris transcriptome

We performed the initial data processing using the SMRT Analysis 2.3.0 Iso-Seq pipeline. From 5 SMRT cells, we produced 5.39 Gb of raw data, with mean read-of-insert lengths of 1,037 bp to 1,814 bp (Supplement 2). The Iso-Seq pipeline produced 42.0 Mb of polished high-quality consensus isoforms and 26.2 Mb of polished low-quality consensus isoforms. The high-quality consensus isoforms covered 8,132 gene loci, of which 3,756 loci had more than two isoforms; they comprised 31,133 transcripts with a maximum length of 5,889 bp, a median length of 1,176 bp, a mean length of 1,275 bp and an N50 length of 1,520 bp. BUSCO analysis showed that the transcriptome covered 1,030 (78.3%) of the universal orthologs in Ascomycota, indicating that many genes were silenced in the fruiting stage.

In contrast to Illumina RNA-Seq, PacBio Iso-Seq does not require assembly to obtain the full-length transcripts; thus, the errors caused by the short-read assembly are reduced and the integrity and reliability of the transcriptome are improved. A violin plot was generated to show the size distribution of the fruiting body transcripts. The PacBio set of full-length transcripts ranged mainly between 350 bp and 2,500 bp (Fig. 7a). Compared with the Illumina RNA-Seq set, the PacBio Iso-Seq set produced more isoforms with additional splicing gene loci. This advantage of PacBio Iso-Seq allows for the direct generation of full-length transcripts and avoids the misassembly of multiple similar isoforms into one transcript. For example, the Cm02g002286.1 gene has an antisense transcript (Cm02g002610.1) that was annotated to produce a single transcript but was found to generate 35 splice variants, as shown in Fig. 7b,d. In addition, 355 lncRNAs with two or more exons and larger than 300 bp were identified and compared with coding transcripts that exhibited shorter sequences (Fig. 7c).

Alternative splicing (AS) plays a crucial role in fungal development as well as stress responses; however, alternative splicing events in C. militaris are poorly understood. Both intron-retention (IR) and exon-skipping (ES) events were identified in the Cm01g001055.1 gene (Fig. 7e). Additionally, untranslated regions (UTRs) were extended by PacBio Iso-Seq (Fig. 7f), resulting in 4,418 (43.8%) genes with either an extended 5′-UTR or 3′-UTR and 2,309 (22.9%) genes with both UTRs extended. We detected 4,000 AS events from the Iso-Seq reads (Fig. 8), and 1,337 gene loci were involved in the AS events. IR events accounted for 37.1% (1,485/4,000) of the AS events and were the most frequent AS events in C. militaris, whereas only 40 ES events were detected. We also identified 67 potential polycistronic transcripts, including 61 gene loci involved in read-through transcripts.

Protein-coding mRNAs with general functions (class R) are the most abundant protein-coding mRNAs, and their number approached 3,000, accounting for 28.7% of all predicted proteins identified using KOG annotation (Supplement 3). The pyrimidine metabolic pathway in the C. militaris fruiting body is shown in Supplement 4. These proteins are all involved in house-keeping functions in the fungus. In addition, 632 proteins were related to the biosynthesis, transport and catabolism of secondary metabolites. Approximately 25% (2,490/10,095) of the genes were annotated in the KEGG database²² and were distributed in 66 pathways. Of these genes, 769 genes were involved in metabolic pathways, 106 genes were involved in carbon metabolism, 98 genes were involved with ribosomal proteins and 98 genes were involved in RNA transport (Supplement 5).
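The share of each alternative splicing class follows directly from the event counts quoted above (1,485 intron-retention and 40 exon-skipping events out of 4,000 detected events). The short Python sketch below performs that conversion; the dictionary keys are labels chosen here for illustration, and the remaining event classes are not broken down in the text, so they are left out.

```python
def event_percentages(event_counts, total_events):
    """Map each event type to its percentage of all detected AS events."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    return {
        name: 100.0 * count / total_events
        for name, count in event_counts.items()
    }


# Counts taken from the text; key names are illustrative labels.
as_counts = {"intron_retention": 1485, "exon_skipping": 40}
as_percentages = event_percentages(as_counts, total_events=4000)

for name, percent in as_percentages.items():
    print(f"{name}: {percent:.2f}%")
```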

Structure of the nontranscribed regions

The distribution of the transcribed genes in the fruiting body is shown in Fig. 9a. In total, 6,881 nontranscribed regions were identified with an average length of 2.7 kb; the longest region was 80.7 kb. Of the nontranscribed regions, 182 regions had 5–10 kb repetitive sequences and 18 regions had >10 kb repetitive sequences with >90% homology. Most of the >10 kb homologous fragments were adjacent to the two ends of the contigs, whereas the 5–10 kb repeats were distributed throughout each contig. Further analysis of the >50 kb nontranscribed regions among the 6 contigs identified seven regions larger than 7 kb that were homologous repeats. Two repeats were located in contig 1, and the remaining five repeats were distributed in contigs 4, 6, 8, 9 and 11. In addition, 9 homologous sequences (>10 kb) existed within the <50 kb nontranscribed regions. An alignment of these 16 repeats indicated that 71.1% of the sequences were conserved and were AT-rich (>87%). A more detailed analysis showed that the structure of the repeats was palindromic (Fig. 9b). We also found 5–8 bp TATA motifs within those regions; the sequences and frequencies of the top 5 motifs are shown in Fig. 9b.
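Nontranscribed regions can be thought of as the parts of each contig left uncovered once all mapped transcripts are laid down. The Python sketch below illustrates that idea under stated assumptions: transcripts are given as 0-based half-open (start, end) intervals on a single contig, and the AT fraction is a plain base count. It is an illustration of the concept, not the pipeline used in the study, and the coordinates are made up.

```python
def nontranscribed_regions(contig_length, transcribed):
    """Return the (start, end) gaps not covered by any transcribed interval.

    Coordinates are 0-based and half-open; intervals may overlap.
    """
    gaps = []
    cursor = 0
    for start, end in sorted(transcribed):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < contig_length:
        gaps.append((cursor, contig_length))
    return gaps


def at_fraction(sequence):
    """Return the fraction of A and T bases in a sequence (0.0 if empty)."""
    sequence = sequence.upper()
    if not sequence:
        return 0.0
    return (sequence.count("A") + sequence.count("T")) / len(sequence)


# Made-up transcript intervals on a hypothetical 100 bp contig.
toy_gaps = nontranscribed_regions(100, [(10, 20), (15, 40), (60, 70)])
print(toy_gaps)
print(at_fraction("TATAGC"))
```

Tracking a single cursor over the sorted intervals handles overlapping transcripts without merging them first, which matters when many isoforms map to the same locus.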

Structural variants in the HN strain compared with the CM01 and ATCC 34164 strains

To examine the genetic variations between the HN and CM01 strains, whole-genome alignment was performed using MUMmer²³, and many structural variants (SVs) were identified using the assembly-based SV detection tool Assemblytics²⁴. As summarized in Table 3, we identified 1,761 insertions, 561 deletions, 8 tandem expansions, 19 tandem contractions, 77 repeat expansions and 215 repeat contractions ranging from 2 bp to 10 kb between the HN and CM01 strains; the size distribution of these structural variations is depicted in Fig. 10. The SVs between the HN and ATCC 34164 strains were also examined, and 22,158 insertions, 21,130 deletions, 3 tandem expansions, 8 tandem contractions, 322 repeat expansions and 301 repeat contractions ranging from 2 bp to 10 kb were identified. Additionally, 21,885 insertions, 22,408 deletions, 5 tandem expansions, 4 tandem contractions, 336 repeat expansions and 454 repeat contractions were identified between the CM01 and ATCC 34164 strains.
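A size distribution like the one summarized for these variants can be built by tallying each call by type and size class. The Python sketch below shows one way to do this from a list of (type, size) pairs. The bin boundaries and the toy records are assumptions made for illustration; the text only states that variants ranged from 2 bp to 10 kb, and the sketch does not parse any real Assemblytics output file.

```python
from collections import Counter

# Hypothetical half-open size bins in bp; the source only gives a 2 bp to 10 kb range.
SIZE_BINS = [(2, 10), (10, 50), (50, 500), (500, 10_001)]


def size_bin(size):
    """Return a label such as '50-499 bp', or None if the size is out of range."""
    for low, high in SIZE_BINS:
        if low <= size < high:
            return f"{low}-{high - 1} bp"
    return None


def tally_variants(variants):
    """Count (variant_type, size_bin) pairs from an iterable of (type, size)."""
    counts = Counter()
    for variant_type, size in variants:
        label = size_bin(size)
        if label is not None:
            counts[(variant_type, label)] += 1
    return counts


# Made-up records for illustration only; they are not taken from Table 3.
toy_variants = [
    ("Insertion", 3),
    ("Insertion", 120),
    ("Deletion", 45),
    ("Repeat_contraction", 2500),
    ("Deletion", 1),  # below the 2 bp floor, so it is skipped
]
toy_counts = tally_variants(toy_variants)
for key, count in sorted(toy_counts.items()):
    print(key, count)
```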

Table 3 Size distribution of the structural variants in the SMRT assembly relative to the CM01 genome.

Discussion

We used SMRT sequencing technology to assemble the complete genome of the C. militaris HN strain, which is 32.57 Mb in size with 14 chromosomes, at the chromosome level, significantly improving our knowledge of the genome.

The genome of the ATCC 34164 strain of C. militaris, a strain isolated from butterfly pupae, has 7 contigs, four of which have telomeric repeats (GGTAA or TTAGGG) on either the 5′ or 3′ end of the contig¹⁴. The genome of Cordyceps guangdongensis has 9 scaffolds and a genome size of 29.05 Mb²⁵. The haploid genomes of C. militaris and Cordyceps subsessilis both contain seven chromosomes. However, in our study, 4 contigs had telomeric sequences on both ends and the other 8 contigs had telomeric sequences on the 5′ or 3′ end, suggesting that the actual number of chromosomes in C. militaris needs to be further verified by karyotype analysis. These three public strains were isolated from different insect hosts, and they vary in the number of repeats, the GC content and the number of genes, providing valuable resources for studying fungus-insect host interactions and relationships.

The genome of the C. militaris HN strain was determined to have both MAT 1-1-1 and MAT 1-1-2 mating-type genes on contig 3, while there were no MAT 1-2-1 mating-type genes in our present assembled genome and raw subreads, supporting the notion that C. militaris is heterothallic (Supplement 6). A previous study showed that both the MAT 1-1- and MAT 1-2-containing isolates are able to fruit. The materials used for genome sequencing may have come from asexual fruiting bodies, which is consistent with the relatively low heterozygosity rate found by GenomeScope analysis²⁶ (Supplement 7).

We obtained 31,133 high-quality transcripts, which covered 8,132 gene loci, with 3,756 loci having more than two isoforms. In contrast, a previous study showed that 9,010 genes can be mapped in the fruiting body by Illumina RNA-Seq²⁷. The 878 genes that could not be mapped will be studied in the future, and the two technologies will be compared. AS is an important mechanism for regulating gene expression and generating proteome diversity^27,28,29. In this study, 1,337 (13.2%) genes associated with AS were detected in the fruiting body, while 368 (3.6%) genes in the same tissue were detected by Illumina RNA-Seq, suggesting that Iso-Seq may increase the number of AS events that are detected. The AS rate of C. militaris was much lower than those of animals and plants; these results are similar to those of a previous study in Fusarium graminearum³⁰. Furthermore, 352 AS genes were annotated with KEGG pathway information. These results suggest that stage-specific AS genes might have important functions in fungal development. Widespread polycistronic transcripts in several Agaricomycetes were identified by SMRT Iso-Seq³¹, involving up to 8% of the transcribed genes. In our study, 67 potential polycistronic transcripts, including 61 gene loci that were involved in read-through transcripts, were discovered. However, the function of these polycistronic transcripts requires further experimental characterization. This finding suggests that polycistronic transcripts may be a conserved feature throughout the fungal transcriptomes.

Using the genome and transcriptome data, we obtained the complete, high-quality nontranscribed region. The longest region in the nontranscribed region can reach over 80 kb. By analyzing the structural features of the DNA in the nontranscribed regions, 5–8 bp TATA motifs within these regions were found. TATA-box and Initiator (Inr) elements are two main key cis-regulatory elements within a core promoter³², suggesting that nontranscribed regions are the starting regions of genomic DNA replication and may function as regulatory elements to control gene expression. These regions exhibit the structural characteristic of having high AT content; thus, the double helix structure of the DNA can be easily opened³³.

A genome-wide methylation map was constructed using SMRT. The methylation characteristics of C. militaris were mainly in the form of m6A and m4C, with methylation rates of 0.0164% and 0.0846%, respectively. In addition, many other DNA modification patterns were observed in the genome at a modification rate of 2.0017%. However, previous reports indicated that in fungi that have genomic 5-methylcytosine (m5C), only repetitive DNA sequences are methylated³⁴. Therefore, many unknown forms of DNA modification remain to be explored. This difference may be due to variations in sequencing technologies, and it is worthwhile for us to discover new forms of methylated nucleotides.

In 1980, HPLC was used to detect and analyze methylation levels in DNA samples³⁵. To detect and analyze DNA methylation in depth, we obtained a sufficient quantity of genomic DNA from C. militaris by performing a large-scale extraction, and then, many single nucleotides were prepared using large-scale separations. Using high-resolution LC-MS to analyze the molecular weights of the four nucleotides in the C. militaris genome, we discovered that four types of nucleotide methylation existed in the genomic DNA, notably the methylation of thymine, whose existence was demonstrated here for the first time. Thus, all four nucleotides were likely methylated in the genomic DNA from C. militaris. This result may provide favorable evidence and new ideas for studying genomic DNA modifications. It also provides indirect evidence that supports the existence of a large number of unknown DNA modifications based on the PacBio methylation assay.

Large-scale interchromosomal translocation events were detected in the whole-genome alignments among the paired genomes of the HN, CM01 and ATCC strains. An in-depth investigation of the translocation breakpoint revealed transposable elements (TEs) and the composition of the flanking sequence of the translocation breakpoint, suggesting that TEs play a crucial role in driving genomic plasticity. In total, 2,816 structural variants were identified using an assembly-based SV detection tool. The translocation and structural variants identified herein contributed significantly to our understanding of the complexity of insect-pathogenic fungus biology and the biosynthesis pathway of pharmacologically active compounds.

In conclusion, our study provides genome, transcriptome and methylome data for a new strain of C. militaris, paving the way for research that comprehensively assesses genetic variation at all size scales and methylation at a single-base resolution. The methylation motifs of m6A and m4C in the genome of the HN strain of C. militaris were analyzed, and the four methylated nucleotides were identified. Through the transcriptome obtained from Iso-Seq, many unknown RNA splicing patterns were discovered. At the same time, there are many conserved TATA-box structures in the nontranscribed regions of the genome. The results will provide a basis for further research on the molecular biology of fungi.

Methods

Fungus strain and maintenance

The C. militaris strain Haining (HN) was isolated from a single spore by Zhejiang Chinagene Biomedical Co. Ltd and was identified by the Institute of Microbiology Chinese Academy of Sciences³⁶. The culture was maintained on either artificial medium or silkworm pupae at 23 °C. C. militaris was cultured for 90 days in our laboratory, and the fruiting bodies were used for the extraction of the genomic DNA and total RNA.

Genomic DNA extraction

The C. militaris genomic DNA was extracted using the sodium dodecyl sulfate (SDS)-phenol method. First, the C. militaris fruiting body was lysed with 3% SDS (0.1 M Tris-HCl (pH 8.0), 0.5 M NaCl, 0.05 M EDTA, 3% SDS) and proteinase K at a final concentration of 50 μg/ml was added to the mixture, which was incubated at 65 °C for 12 hours. After centrifugation at 10,000 rpm for 10 min, the supernatant was extracted three times with an equal volume of 0.1 M Tris-phenol (pH > 7.5). The flocculated DNA was obtained by adding 2.5 volumes of ethanol to the supernatant at 4 °C for 30 min after centrifugation at 10,000 rpm for 10 min, and then, the DNA was dissolved in H₂O and digested with RNase A for 30 min; the solution was re-precipitated with 70% ethanol. Finally, the DNA was purified using a PowerClean DNA cleanup kit (MoBio, Carlsbad, CA). The quality of the extracted DNA was checked using 0.7% agarose gel electrophoresis and was determined using a NanoDrop spectrophotometer and quantified using Qubit (Thermo Fisher Scientific). The extracted DNA was stored at −80 °C until further analysis.

DNA library preparation and sequencing

A large-insert PacBio library was prepared using a SMRTbell™ Template Prep Kit version 1.0 (Pacific Biosciences) according to the manufacturer’s instructions. In brief, the fungal DNA was sheared to a targeted size of approximately 20 kb using g-TUBEs (Covaris, Inc., USA). The sheared genomic DNA was subjected to DNA damage repair/end repair and blunt-end adaptor ligation, followed by exonuclease digestion. The purified digestion products were loaded onto pre-cast 0.6% agarose gels for a 7–50 kb size selection using a BluePippin Size Selection System (Sage Science), and the recovered size-selected library products were purified using 0.5× pre-washed PB AMPure beads (Beckman Coulter). The library concentration was determined using a Qubit 2.0 Fluorometer (Life Technologies). The libraries were sequenced using P6C4 polymerase and chemistry on a PacBio RS II instrument with 240 min movie times at Tianjin Lakeside Powergene Science Development Co. Ltd. (Tianjin, China). In total, 13 SMRT Cells were used to yield 10.8 Gbp.
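The sequencing yield above can be related to the coverage figure reported for the assembly with one division: total sequenced bases over genome size. The Python sketch below does this with the 10.8 Gbp yield and the 32.57 Mb assembly size given in the text. It is a rough estimate that assumes all raw bases count toward coverage; read filtering before assembly would lower the effective value.

```python
def fold_coverage(total_bases, genome_size):
    """Return the average sequencing depth as total bases over genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


# Values from the text: 10.8 Gbp of reads, 32.57 Mb assembled genome.
raw_coverage = fold_coverage(total_bases=10.8e9, genome_size=32.57e6)
print(f"approximate raw coverage: {raw_coverage:.0f}x")
```

The result, a little over 330×, is in line with the "more than 300×" coverage stated for the assembly.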

Total RNA extraction, Iso-Seq library preparation and PacBio sequencing

Total RNA was isolated using a UNIQ-10 column TRIzol total RNA extraction kit (Sangon Biotech) according to the manufacturer’s instructions, followed by treatment with DNase I. The mRNA was purified by a poly T column separation and stored at −80 °C until further analysis. The Iso-Seq library was prepared according to the PacBio Isoform Sequencing protocol (Iso-Seq™). The RNA was reverse transcribed using a SMARTer® PCR cDNA Synthesis Kit and was PCR amplified using KAPA HiFi PCR Kits. These cDNA products were purified using a SMRTbell DNA Template Prep Kit 3.0 for library construction. The libraries were sequenced using P6C4 polymerase and chemistry on a PacBio RS II platform with 240 min movie times at Tianjin Lakeside Powergene Science Development Co. Ltd. In total, 7 SMRT Cells were used to generate 4.4 Gbp of transcriptome cDNA sequencing data.

De novo genome assembly

The de novo assembly of the whole C. militaris genome was performed using the RS_HGAP_Assembly.3 protocol implemented in SMRT Analysis Portal 2.3.0.p5⁶ (Supplement 8). All parameters were set to the default settings with the following exceptions: subread length = 9,000; minimum seed read length = 11,000; genome size = 35,000,000; and target coverage = 30. The filtered reads were mapped to the contigs using Blasr³⁷ and the contigs were polished using Quiver⁶ to generate a high-quality genome and then visualized using the Integrative Genomics Viewer (IGV)³⁸.

Repeat and noncoding RNA annotation

The telomeric repeats and tandem repeats were identified using Tandem Repeats Finder (v. 4.07b)³⁹. Known transposable element repeats were annotated using RepeatMasker (v. 4.0.7) and RepeatProteinMasker⁴⁰ to search against the Repbase library (Repbase Library 20150807)⁴¹. The de novo transposable element prediction was performed using RepeatScout (version 1.0.5)⁴⁰. The combined results generated the comprehensive C. militaris TE database. The noncoding RNAs, including rRNA and tRNA, were predicted using RNAmmer 1.2⁴² and tRNAscan-SE 1.23⁴³.
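Telomeric ends of the kind reported in the Results (tandem GGGTAA or TTACCC repeats at contig ends) can be spotted with a much simpler check than a full tandem-repeat search. The Python sketch below is such a stand-in: it uses a regular expression to look for a run of consecutive motif copies in a window at each end of a sequence. It is not the Tandem Repeats Finder method used in the study, and the window size and minimum copy number are arbitrary assumptions.

```python
import re


def telomeric_ends(sequence, motifs=("GGGTAA", "TTACCC"), window=200, min_copies=5):
    """Return (five_prime, three_prime) booleans.

    Each flag is True when the corresponding end window contains at least
    min_copies consecutive copies of any motif. Thresholds are illustrative.
    """
    sequence = sequence.upper()
    pattern = re.compile(
        "|".join("(?:%s){%d,}" % (motif, min_copies) for motif in motifs)
    )
    five_prime = bool(pattern.search(sequence[:window]))
    three_prime = bool(pattern.search(sequence[-window:]))
    return five_prime, three_prime


# Toy sequences for illustration only.
toy_contig = "TTACCC" * 20 + "ACGT" * 100 + "GGGTAA" * 20
print(telomeric_ends(toy_contig))
print(telomeric_ends("ACGT" * 200))
```

A contig flagged at both ends is a candidate complete chromosome, while a contig flagged at one end only is likely to be truncated on the other side.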

Gene prediction and functional annotation

The gene prediction was performed using the MAKER (version 2.31.8) pipeline. All RefSeq protein sequences in Hypocreomycetidae were downloaded from GenBank and used as protein evidence in MAKER. The EST sequence from C. militaris and the high-quality Iso-Seq full-length CDS set were combined and used as EST evidence. First, we used Augustus, trained for Fusarium graminearum, and GeneMark-ES and SNAP, trained for Caenorhabditis elegans, for the ab initio gene prediction. Based on these MAKER results, we trained the Augustus and SNAP gene prediction model. Next, MAKER was run using the in-house training Augustus and SNAP parameters, and a gene set was generated as the gene models of the C. militaris genome. The gene models were functionally annotated using the NCBI nonredundant (NR), UniProt⁴⁴, GO, COG, and KEGG²² databases. Matches with an e-value <1e-5 and >40% sequence identity were selected. The gene families were established using the Interpro database using BlastProDOM, HMMPIR, HMMPfam, SuperFamily, SignalPHMM, and HMMPanther⁴⁵. The secondary metabolite genes and gene clusters were predicted using both AntiSMASH, fungal version 4.0.0 and SMURF (accessed June 2017)^46,47.
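The annotation cutoffs stated above (e-value below 1e-5 and sequence identity above 40%) amount to a simple filter over database hits. The Python sketch below applies those two thresholds to a list of hit records. The `Hit` record and its field names are hypothetical, introduced here only for illustration; they do not correspond to the output format of any particular search tool.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical container for one database match."""
    gene_id: str
    subject_id: str
    evalue: float
    identity: float  # percent identity, 0-100


def keep_hit(hit, max_evalue=1e-5, min_identity=40.0):
    """Apply the thresholds from the text: e-value < 1e-5 and identity > 40%."""
    return hit.evalue < max_evalue and hit.identity > min_identity


# Made-up hits for illustration only.
toy_hits = [
    Hit("geneA", "subj1", 1e-30, 85.0),
    Hit("geneB", "subj2", 1e-3, 90.0),   # fails the e-value cutoff
    Hit("geneC", "subj3", 1e-10, 35.0),  # fails the identity cutoff
]
kept = [hit for hit in toy_hits if keep_hit(hit)]
print([hit.gene_id for hit in kept])
```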

Iso-Seq data analysis

The standard RS_IsoSeq.1 protocol (SMRT Analysis 2.3.0p5) was used to process the raw sequencing data. In summary, the reads of insert (ROIs) were generated and separated into full-length and non-full-length ROIs using ‘pbtranscript.py classify’. The full-length ROIs were clustered and assembled into consensus sequences by performing isoform-level clustering using the ICE algorithm with estimated cDNA sizes between 1–2 kb. Subsequently, the consensus sequences were polished based on the non-full-length ROIs and categorized as high-quality (HQ, above 99% accuracy) or low-quality (LQ) full-length polished consensus transcripts using Quiver. All HQ transcripts were mapped to the C. militaris genome using GMAP with the parameters ‘--cross-species -B 5 -K 8000 -t 40 -f 2 -n 1’ and filtered for a >99% alignment coverage and >85% alignment identity⁴⁸. The resulting GFF3 file was converted into the GTF format using an in-house Python script. Then, the alternative splicing (AS) events were identified based on this GTF file using the ASTALAVISTA algorithm⁴⁹. HQ transcripts that could not be aligned were considered novel transcripts. The long noncoding RNAs (lncRNAs) were identified as described in our previous study⁵⁰.

The genome-wide detection of base modifications was performed using the “RS_Modification_and_Motif_Analysis.1” protocol (SMRT Analysis 2.3.0p5) with the default parameter settings; the C. militaris genome was used as a reference, and only unambiguously mapped reads were used for the base modification detection. Then, we further filtered out the modified sites with less than 50× coverage and a quality value (QV) score less than 20. For each m6A and m4C, we extracted 2 bp from the upstream and downstream sequences. MEME-ChIP⁵¹ was used to identify the motifs in each group.
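The site filtering and context extraction described for the base modifications can be sketched in a few lines of Python. The sketch assumes that a site is retained only when it has at least 50× coverage and a QV of at least 20, which is one reading of the filtering sentence, and that positions are 0-based on the forward strand. Reverse-strand sites, which would need the reverse complement of the window, are left out for brevity, and the function names are illustrative.

```python
def passes_filter(coverage, qv, min_coverage=50, min_qv=20):
    """Keep a site only if both coverage and QV reach their thresholds
    (assumed interpretation of the filtering step)."""
    return coverage >= min_coverage and qv >= min_qv


def sequence_context(sequence, position, flank=2):
    """Return the modified base plus `flank` bases on each side.

    `position` is 0-based on the forward strand. Returns None when the
    window would run past either end of the sequence.
    """
    start = position - flank
    end = position + flank + 1
    if start < 0 or end > len(sequence):
        return None
    return sequence[start:end]


# Toy reference and site for illustration only.
toy_reference = "ACGGAGTT"
print(passes_filter(coverage=120, qv=35))
print(sequence_context(toy_reference, 4))
```

The resulting 5-mers, grouped by modification type, are the kind of short windows a motif finder can take as input.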

LC-MS analysis of base methylation types (m6A and m4C)

Because the methylation rate in the genome is approximately 0.1%, we used 30 kg of the single-clone HN strain to extract the genomic DNA. In total, 30 g of genomic DNA was obtained. The DNA was digested with DNase P1. The digestion products were then separated on Agela’s FLEXA HPLC purification system using the following chromatographic columns: X-AMIDE, 10.0 × 250 mm; and Venusil XBP-C18. The separated products were dried at an ultra-low temperature. The sample was concentrated using a rotary evaporator and dissolved in water. The sample was then separated, and the molecular weights were determined using a Shimadzu mass spectrometer (LCMS-IT-TOF). Methylation was identified by comparing the measured molecular weights with the predicted molecular weights of the four types of methylated nucleotides. The detailed protocols follow.
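The comparison against predicted molecular weights rests on the fact that one methyl substitution adds CH₂ (about 14.0157 Da) to a nucleotide. The sketch below computes neutral monoisotopic masses for two illustrative cases. It assumes the digestion products are 2′-deoxynucleoside 5′-monophosphates and uses dAMP and dCMP as examples; the four nucleotide types the authors compared are not listed in this section, and the function name is hypothetical.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.0030740,
    "O": 15.9949146,
    "P": 30.9737620,
}


def monoisotopic_mass(formula):
    """Neutral monoisotopic mass for a simple formula such as 'C10H14N5O6P'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


# Illustrative formulas (assumed examples, not taken from the paper).
nucleotides = {
    "dAMP": "C10H14N5O6P",
    "methyl-dAMP (e.g. m6A)": "C11H16N5O6P",
    "dCMP": "C9H14N3O7P",
    "methyl-dCMP (e.g. m4C)": "C10H16N3O7P",
}

for name, formula in nucleotides.items():
    print(f"{name}: {monoisotopic_mass(formula):.4f} Da")

methyl_shift = monoisotopic_mass("C11H16N5O6P") - monoisotopic_mass("C10H14N5O6P")
print(f"mass shift per methyl group: {methyl_shift:.4f} Da")
```

These are neutral masses; the observed m/z additionally depends on the ionization mode and adduct, which are not specified here.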

Genomic DNA extraction of C. militaris

Genomic DNA was extracted with 3% SDS. Fruiting bodies (4,000 g) were subjected to superfine grinding using an ultralow temperature crusher at −80 °C. We added 20 L of DNA extraction buffer (0.1 M Tris-HCl (pH 8.0), 0.5 M NaCl, 0.05 M EDTA and 3% SDS) and Proteinase K (20 mg/ml) to 50 µg/ml and digested the mixture overnight at 65 °C. The mixture was extracted three times with an equal volume of phenol (0.1 M Tris-saturated phenol, pH > 7.5), with centrifugation at 10,000 rpm for 10 min, and then twice with an equal volume of chloroform:isoamyl alcohol (24:1), with centrifugation at 10,000 rpm for 10 min. We added 2.5 volumes of anhydrous ethanol for precipitation, mixed well and left the mixture to stand at low temperature for >30 min. After centrifugation at 10,000 rpm for 8 min, the precipitate was collected, washed 3 times with 75% ethanol, dried at 20 °C and resuspended in water. The sample was checked by 0.7% agarose gel electrophoresis.
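The enzyme addition above is a standard stock dilution (C₁V₁ = C₂V₂). As a worked example, the sketch below estimates the stock volume needed, reading the text as a 20 mg/ml Proteinase K stock brought to 50 µg/ml in 20 L of buffer. It assumes the final volume is approximately the buffer volume, ignoring the tissue and the added stock; the function name is hypothetical.

```python
def stock_volume_ml(final_conc_ug_per_ml, total_volume_ml, stock_conc_mg_per_ml):
    """Volume of stock (ml) needed to reach a final concentration, from C1*V1 = C2*V2."""
    stock_conc_ug_per_ml = stock_conc_mg_per_ml * 1000.0  # mg/ml -> ug/ml
    return final_conc_ug_per_ml * total_volume_ml / stock_conc_ug_per_ml


# 50 ug/ml final in 20 L (20,000 ml) from a 20 mg/ml stock.
proteinase_k_ml = stock_volume_ml(50, 20_000, 20)
print(f"Proteinase K stock needed: {proteinase_k_ml} ml")
```

Under these assumptions the calculation gives 50 ml of stock, a 1:400 dilution.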

Preparation of genomic DNA

We added RNase A (10 mg/ml) to a final concentration of 100 µg/ml and incubated the sample at 37 °C for 1 hour. The sample was extracted with an equal volume of phenol (0.1 M Tris-saturated phenol, pH > 7.5), with centrifugation at 10,000 rpm for 10 min, and then with an equal volume of chloroform:isoamyl alcohol (24:1), with centrifugation at 10,000 rpm for 10 min. The supernatant was collected, and we added 2.5 volumes of ethanol for the precipitation, which was carried out at −20 °C for 30 min. The sample was then centrifuged at 10,000 rpm for 8 min; the pellet was retained as the genomic DNA, while the supernatant contained the RNA degradation products. The pellet was washed 3 times with 75% ethanol, with centrifugation, blown dry, suspended in water and stored at −20 °C. The sample quality was checked using 0.7% agarose gel electrophoresis.

Ultrasonication and digestion of heat-denatured DNA with DNase P1

We placed the DNA in an ultrasonic cell disrupter and applied ultrasonication three times for 3 seconds. The DNA solution was adjusted to pH 6.5 with hydrochloric acid; we then added ZnSO₄ to a final concentration of 2 mM, heated the sample in a water bath at 100 °C for 2 min and transferred it to a 70 °C water bath. We incubated the sample with 20–30% (w/w) DNase P1 for 5 hours. We performed HPLC to determine whether the reaction had reached completion. After the reaction was complete, we added EDTA-2Na to a final concentration of 10 mM to inactivate the enzyme.

Separation of DNA degradation products using the Agela FLEXA purification system and detection using a Shimadzu mass spectrometer LCMS-IT-TOF

Purification by chromatography was performed under the following conditions: Column: X-AMIDE, 10 × 250 mm; Phase A: 0.2% acetic acid; Phase B: acetonitrile; Flow rate: 4 mL/min; UV detection wavelength: 260 nm; Sample loading: 1 mL (1 mg/mL); Elution conditions: 5% A to 23% A for 15 min; 23% A to 26% A for 5 min; 26% A to 29% A for 5 min; 29% A to 32% A for 5 min; A, 5 min; and 35% A to 40% A for 5 min. The separated products were concentrated in an ultra-low temperature dryer and dissolved in water. The nucleotide molecular weights were identified using a Shimadzu mass spectrometer (LCMS-IT-TOF). The liquid-phase conditions for MS were as follows: Column: ACQUITY UPLC BEH (2.1 × 100 mm, 1.7 µm); UV detection wavelength: 260 nm; Flow rate: 0.3 mL/min; Phase A: 0.1% formic acid; Phase B: acetonitrile; Column temperature: 40 °C; Elution conditions: isocratic elution with 2% acetonitrile for 10 min; Sample loading: 1 µL.
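The preparative gradient can be written as a breakpoint table and interpolated to give the mobile-phase composition at any time. The sketch below covers only the first four fully specified segments (0 to 30 min) and assumes the segments run back to back as linear ramps, which the text does not state explicitly; the names are hypothetical.

```python
# (time in min, % phase A) breakpoints for the first four segments listed above.
GRADIENT = [(0.0, 5.0), (15.0, 23.0), (20.0, 26.0), (25.0, 29.0), (30.0, 32.0)]


def percent_a(t_min, gradient=GRADIENT):
    """Linearly interpolate % phase A at time t_min (assumes linear ramps)."""
    if t_min < gradient[0][0] or t_min > gradient[-1][0]:
        raise ValueError("time is outside the tabulated gradient")
    for (t0, a0), (t1, a1) in zip(gradient, gradient[1:]):
        if t0 <= t_min <= t1:
            return a0 + (a1 - a0) * (t_min - t0) / (t1 - t0)
    raise ValueError("gradient table has a gap")  # not reached for a contiguous table


for t in (0, 7.5, 15, 22.5, 30):
    a = percent_a(t)
    print(f"t = {t:>4} min: {a:.1f}% A, {100 - a:.1f}% B")
```

Phase B is taken as the remainder to 100%, since only two phases are listed.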

Whole-genome alignment and structural variation analysis

We downloaded the previously released genomes of the C. militaris CM01 strain (GCF_000225605.1)⁹ and the C. militaris ATCC 34164 strain (PRJNA323705)¹⁴ from GenBank. To identify the structural variations between the genomes, we used MUMmer to perform a whole-genome alignment with HN as the reference genome and the downloaded genomes as the query genomes. The Assemblytics algorithm was then used to identify structural variations in six classes: insertions, deletions, tandem expansions, tandem contractions, repeat expansions and repeat contractions²⁴. Dot plots of the alignments were generated using Gepard v. 1.4⁵². The raw SMRT genome reads were aligned to the assembled genomes using BLASR, the Iso-Seq reads were aligned using GMAP⁴⁸, and we visualized the structural variations using the Integrative Genomics Viewer (IGV)³⁸.
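Tallying variants by the six classes named above can be sketched as a simple counter over a tab-separated variant table. This is an illustration only: it assumes the variant class sits in the seventh column and uses the class labels shown in the code, both of which should be checked against the actual Assemblytics output; the example rows are invented.

```python
import io
from collections import Counter

# Class labels assumed for this sketch; verify against the real output file.
VARIANT_CLASSES = (
    "Insertion",
    "Deletion",
    "Tandem_expansion",
    "Tandem_contraction",
    "Repeat_expansion",
    "Repeat_contraction",
)


def count_variant_classes(handle, type_column=6):
    """Count rows per variant class; `type_column` is 0-based (assumed layout)."""
    counts = Counter({name: 0 for name in VARIANT_CLASSES})
    for line in handle:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.rstrip("\n").split("\t")
        counts[fields[type_column]] += 1
    return counts


# Hypothetical rows, for illustration only.
example = io.StringIO(
    "#reference\tref_start\tref_stop\tID\tsize\tstrand\ttype\n"
    "contig1\t1000\t1000\tv1\t120\t+\tInsertion\n"
    "contig1\t5000\t5300\tv2\t300\t+\tDeletion\n"
    "contig2\t200\t200\tv3\t75\t+\tInsertion\n"
    "contig3\t900\t950\tv4\t210\t+\tRepeat_expansion\n"
)
counts = count_variant_classes(example)
for name in VARIANT_CLASSES:
    print(name, counts[name])
```

Running the counter once per query genome (CM01 and ATCC 34164 against HN) would give per-strain totals by class.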

Statistics and analysis

The Gene Ontology term analysis of the genes with methylation motifs was conducted using the GOseq Bioconductor package⁵³. We considered over-represented GO terms with a Benjamini–Hochberg FDR-adjusted p-value < 0.05 to be significantly enriched. We performed a KEGG pathway enrichment analysis of the genes with methylation sites using KOBAS 2.0⁵⁴.
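The Benjamini–Hochberg adjustment used for the significance cutoff can be sketched in a few lines. This is a plain re-implementation of the standard step-up procedure for illustration, not the code path inside GOseq or KOBAS; the term names and p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical over-representation p-values, for illustration only.
terms = ["term_A", "term_B", "term_C", "term_D"]
raw_p = [0.01, 0.04, 0.03, 0.005]
adj_p = benjamini_hochberg(raw_p)

for term, p, q in zip(terms, raw_p, adj_p):
    flag = "enriched" if q < 0.05 else "not significant"
    print(f"{term}: p = {p}, adjusted = {q:.3f} ({flag})")
```

Each p-value is scaled by n/rank, and the running minimum keeps the adjusted values non-decreasing with the raw p-values.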

Data Availability

The genome and transcriptome data for Cordyceps militaris generated by single-molecule real-time sequencing were deposited in GenBank. The GenBank accession number of the genome is MQTM00000000.1. The GenBank accession number of the transcriptome is GEZI00000000.1.

References

1. Ministry of Health of the People’s Republic of China. The Ministry of Health on approval of C. militaris as new resources food announcement No. 3. (Ministry of Health of the People’s Republic of China, 2009).
2. Sung, G. H. et al. Phylogenetic classification of Cordyceps and the clavicipitaceous fungi. Studies in mycology 57, 5–59 (2007).
3. Stensrud, O., Hywel-Jones, N. L. & Schumacher, T. Towards a phylogenetic classification of Cordyceps: ITS nrDNA sequence data confirm divergent lineages and paraphyly. Mycological research 109, 41–56 (2005).
4. Yang, N. N. et al. In Journal of Asian natural products research 1–7 (2018).
5. Chan, J. S., Barseghyan, G. S., Asatiani, M. D. & Wasser, S. P. Chemical composition and medicinal value of fruiting bodies and submerged cultured mycelia of caterpillar medicinal fungus Cordyceps militaris CBS-132098 (Ascomycetes). International journal of medicinal mushrooms 17, 649–659 (2015).
6. Chin, C. S. et al. Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data. Nature methods 10, 563–569 (2013).
7. Au, K. F. et al. Characterization of the human ESC transcriptome by hybrid sequencing. Proceedings of the National Academy of Sciences of the United States of America 110, E4821–4830 (2013).
8. Sharon, D., Tilgner, H., Grubert, F. & Snyder, M. A single-molecule long-read survey of the human transcriptome. Nature biotechnology 31, 1009–1014 (2013).
9. Zheng, P. et al. Genome sequence of the insect pathogenic fungus Cordyceps militaris, a valued traditional Chinese medicine. Genome biology 12, R116 (2011).
10. Faino, L. et al. Single-molecule real-time sequencing combined with optical mapping yields completely finished fungal genome. mBio 6, e00936–00915 (2015).
11. Liu, H. et al. Genomes and virulence difference between two physiological races of Phytophthora nicotianae. GigaScience 5, 3 (2016).
12. Olsen, R. A. et al. De novo assembly of Dekkera bruxellensis: a multi technology approach using short and long-read sequencing and optical mapping. GigaScience 4, 56 (2015).
13. Tufariello, J. M. et al. The complete genome sequence of the emerging pathogen Mycobacterium haemophilum explains its unique culture requirements. mBio 6, e01313–01315 (2015).
14. Kramer, G. J. & Nodwell, J. R. Chromosome level assembly and secondary metabolite potential of the parasitic fungus Cordyceps militaris. BMC genomics 18, 912 (2017).
15. Colome-Tatche, M. et al. Features of the Arabidopsis recombination landscape resulting from the combined loss of sequence variation and DNA methylation. Proceedings of the National Academy of Sciences of the United States of America 109, 16240–16245 (2012).
16. Greer, E. L. et al. DNA Methylation on N6-Adenine in C. elegans. Cell 161, 868–878 (2015).
17. Wang, Y. L., Wang, Z. X., Liu, C., Wang, S. B. & Huang, B. Genome-wide analysis of DNA methylation in the sexual stage of the insect pathogenic fungus Cordyceps militaris. Fungal biology 119, 1246–1254 (2015).
18. Flusberg, B. A. et al. Direct detection of DNA methylation during single-molecule, real-time sequencing. Nature methods 7, 461–465 (2010).
19. Clark, T. A. et al. Characterization of DNA methyltransferase specificities using single-molecule, real-time DNA sequencing. Nucleic acids research 40, e29 (2012).
20. Clark, T. A. et al. Enhanced 5-methylcytosine detection in single-molecule, real-time sequencing via Tet1 oxidation. BMC biology 11, 4 (2013).
21. Mondo, S. J. et al. Widespread adenine N6-methylation of active genes in fungi. Nature genetics 49, 964–968 (2017).
22. Kanehisa, M., Furumichi, M., Tanabe, M., Sato, Y. & Morishima, K. KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic acids research 45, D353–D361 (2017).
23. Delcher, A. L., Salzberg, S. L. & Phillippy, A. M. Using MUMmer to identify similar regions in large sequence sets. Current protocols in bioinformatics 10, 10 13 (2003).
24. Nattestad, M. & Schatz, M. C. Assemblytics: a web analytics tool for the detection of variants from an assembly. Bioinformatics 32, 3021–3023 (2016).
25. Zhang, C., Deng, W., Yan, W. & Li, T. Whole genome sequence of an edible and potential medicinal fungus, Cordyceps guangdongensis. G3 8, 1863–1870 (2018).
26. Vurture, G. W. et al. GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics 33, 2202–2204 (2017).
27. Yin, Y. et al. Genome-wide transcriptome and proteome analysis on different developmental stages of Cordyceps militaris. PloS one 7, e51853 (2012).
28. Pan, Q., Shai, O., Lee, L. J., Frey, B. J. & Blencowe, B. J. Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nat. Genet. 40, 1413–1415 (2008).
29. Suparmin, A., Kato, T., Dohra, H. & Park, E. Y. Insight into cordycepin biosynthesis of Cordyceps militaris: comparison between a liquid surface culture and a submerged culture through transcriptomic analysis. PloS one 12, e0187052 (2017).
30. Zhao, C., Waalwijk, C., de Wit, P. J., Tang, D. & van der Lee, T. RNA-Seq analysis reveals new gene models and alternative splicing in the fungal pathogen Fusarium graminearum. BMC genomics 14, 21 (2013).
31. Gordon, S. P. et al. Widespread polycistronic transcripts in fungi revealed by single-molecule mRNA sequencing. PloS one 10, e0132628 (2015).
32. Basehoar, A. D., Zanton, S. J. & Pugh, B. F. Identification and distinct regulation of yeast TATA box-containing genes. Cell 116, 699–709 (2004).
33. Yuan, Z. Y. et al. TATA boxes in gene transcription and poly (A) tails in mRNA stability: new perspective on the effects of berberine. Scientific reports 5, 18326 (2015).
34. Selker, E. U. et al. The methylated component of the Neurospora crassa genome. Nature 422, 893–897 (2003).
35. Kuo, K. C., McCune, R. A., Gehrke, C. W., Midgett, R. & Ehrlich, M. Quantitative reversed-phase high performance liquid chromatographic determination of major and modified deoxyribonucleosides in DNA. Nucleic acids research 8, 4763–4776 (1980).
36. Chen, G., Xu, C., Gong, C. & Zhang, Y. Pharmacology of cultivated haining strain of silkworm Cordeceps militaris. Chin. J. Appl. Env. Biol. 11, 453–458 (2005).
37. Chaisson, M. J. & Tesler, G. Mapping single molecule sequencing reads using basic local alignment with successive refinement (BLASR): application and theory. BMC bioinformatics 13, 238 (2012).
38. Robinson, J. T. et al. Integrative genomics viewer. Nature biotechnology 29, 24–26 (2011).
39. Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic acids research 27, 573–580 (1999).
40. Tarailo-Graovac, M. & Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Current protocols in bioinformatics Chapter 4, Unit 4 10 (2009).
41. Bao, W., Kojima, K. K. & Kohany, O. Repbase Update, a database of repetitive elements in eukaryotic genomes. Mobile DNA 6, 11 (2015).
42. Lagesen, K. et al. RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic acids research 35, 3100–3108 (2007).
43. Lowe, T. M. & Chan, P. P. tRNAscan-SE On-line: integrating search and context for analysis of transfer RNA genes. Nucleic acids research 44, W54–57 (2016).
44. The UniProt Consortium. UniProt: the universal protein knowledgebase. Nucleic acids research 45, D158–D169 (2017).
45. Finn, R. D. et al. InterPro in 2017-beyond protein family and domain annotations. Nucleic acids research 45, D190–D199 (2017).
46. Khaldi, N. et al. SMURF: genomic mapping of fungal secondary metabolite clusters. Fungal genetics and biology: FG & B 47, 736–741 (2010).
47. Blin, K. et al. antiSMASH 4.0-improvements in chemistry prediction and gene cluster boundary identification. Nucleic acids research 45, W36–W41 (2017).
48. Wu, T. D. & Watanabe, C. K. GMAP: a genomic mapping and alignment program for mRNA and EST sequences. Bioinformatics 21, 1859–1875 (2005).
49. Foissac, S. & Sammeth, M. ASTALAVISTA: dynamic and flexible analysis of alternative splicing events in custom gene datasets. Nucleic acids research 35, W297–299 (2007).
50. Wu, Y. et al. Systematic identification and characterization of long non-coding RNAs in the silkworm, Bombyx mori. PloS one 11, e0147147 (2016).
51. Machanick, P. & Bailey, T. L. MEME-ChIP: motif analysis of large DNA datasets. Bioinformatics 27, 1696–1697 (2011).
52. Krumsiek, J., Arnold, R. & Rattei, T. Gepard: a rapid and sensitive tool for creating dotplots on genome scale. Bioinformatics 23, 1026–1028 (2007).
53. Young, M. D., Wakefield, M. J., Smyth, G. K. & Oshlack, A. Gene ontology analysis for RNA-seq: accounting for selection bias. Genome biology 11, R14 (2010).
54. Xie, C. et al. KOBAS 2.0: a web server for annotation and identification of enriched pathways and diseases. Nucleic acids research 39, W316–322 (2011).

Acknowledgements

We would like to thank Professor Leland H. Hartwell for his guidance, Dr. Bo Wang for his technical assistance with DNA sequencing and Drs. Chengliang Gong and Xiaojian Zheng for the use of their fungi strains.

Author information

Yujiao Chen, Yuqian Wu and Li Liu contributed equally.

Authors and Affiliations

Human Genome Research Center, Tianjin University, Tianjin, 300309, China
Yujiao Chen, Yuqian Wu, Ping Zhao & Yaozhou Zhang
Zheng-Yuan-Tang (Tianjin) Biotechnology Co. Ltd, Tianjin, 300457, China
Yujiao Chen, Xiaomin Liu, Hongjie Li, Enwei Zhao & Yaozhou Zhang
Tianjin Lakeside Powergene Science Development Co. Ltd, Tianjin, 300309, China
Yujiao Chen, Yuqian Wu, Li Liu, Jianhua Feng, Tiancheng Zhang, Chaoxia Wang, Dongmei Li, Wei Han, Minghui Shao, Jianfeng Xue, Wen Zhao, Liwang Cui & Yaozhou Zhang
Zhejiang Chinagene Biomedicine Co. Ltd, Jiaxing, 314400, China
Yujiao Chen, Yuqian Wu, Xiaomin Liu, Enwei Zhao & Yaozhou Zhang
Guizhou Gui’an Academy of Precision Medicine Co. Ltd, Gui’an, 561113, China
Yujiao Chen, Yuqian Wu, Li Liu & Yaozhou Zhang
State Key Laboratory of Silkworm Genome Biology, Southwest University, Chongqing, 400715, China
Yuqian Wu & Qingyou Xia
College of Life Sciences, Jiangsu University of Science and Technology, Zhenjiang, 212000, China
Sheng Qin, Xingyu Zhao & Xijie Guo
College of Life Science, Zhejiang University, Hangzhou, 310058, China
Yongfeng Jin
Department of Biochemistry and Molecular Biology, China Medical University, Shenyang, 110001, China
Yaming Cao
Department of Entomology, Penn State University, PA, 16802, USA
Liwang Cui
Dynamiker Biotechnology (Tianjin) Co., Ltd, Tianjin, 300467, China
Zeqi Zhou
Tianjin International Joint Academy of Biomedicine, Tianjin, 300457, China
Zihe Rao & Yaozhou Zhang

Contributions

Yujiao Chen extracted the compounds from the C. militaris fruiting bodies, analyzed the structure of the compounds, extracted the genomic DNA and RNA from the C. militaris fruiting bodies, conducted the reverse transcription to obtain the cDNA and wrote the manuscript. Yuqian Wu executed the genomic KEGG, metabolomic and bioinformatic analyses and wrote the manuscript. Li Liu conducted the statistical analysis using R software. Jianhua Feng participated in the comparison of the reference genome and transcriptome sequence, saved and managed the data and analyzed and summarized the bioinformatics results. Tiancheng Zhang conducted the PacBio RS II sequencing procedure, built the genomic DNA library and performed the quality analysis of the genomic DNA and RNA. Xingyu Zhao performed the genome and transcriptome assembly. Chaoxia Wang performed the methylation analysis and participated in the genome mapping. Sheng Qin analyzed the structural features of the genomic nontranscribed region. Wei Han performed the bioinformatics analysis and prepared the statistical plots. Minghui Shao performed the gene annotation and comparisons between the reference genome and transcriptome sequence. Ping Zhao analyzed the metabolic pathways of ergosterol and N6-(2-hydroxyethyl) adenosine in the C. militaris fruiting bodies, prepared the genomic single nucleotides and measured the methylated nucleotides using mass spectrometry. Jianfeng Xue conducted the gene sequencing and constructed the genomic DNA and transcriptome cDNA libraries. Hongjie Li performed the large-scale cultivation of the C. militaris fruiting bodies. Enwei Zhao and Xiaomin Liu participated in the large-scale preparation and degradation of the genomic DNA from the C. militaris fruiting bodies. Wen Zhao prepared the genomic DNA from the C. militaris fruiting bodies. Dongmei Li assisted in the bioinformatics analysis. Xijie Guo performed the structure analysis of the nontranscribed region. Yongfeng Jin provided molecular biology advice to the Master’s degree students and summarized the results. Yaming Cao conducted the analysis of the immunological function of the monomer compound. Liwang Cui performed the methylation analysis and modified the English manuscript. Zeqi Zhou transcribed and modified the manuscript. Qingyou Xia designed and guided the bioinformatics analysis. Yaozhou Zhang designed the overall experiment, performed the genome assembly, qualified and determined the process for the extraction of the compounds from the C. militaris fruiting bodies and conducted the analysis of the methylation of the metabolic pathway, the analyses of the transcription level and its regulation and the analysis of the structured nontranscribed region. Zihe Rao determined the overall framework of the manuscript. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Zihe Rao or Yaozhou Zhang.

Ethics declarations

Competing Interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplement information

C. militaris Genomic data

C. militaris DNA methylation data

C. militaris Transcriptomic data

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Chen, Y., Wu, Y., Liu, L. et al. Study of the whole genome, methylome and transcriptome of Cordyceps militaris. Sci Rep 9, 898 (2019). https://doi.org/10.1038/s41598-018-38021-4

Received: 14 November 2017
Accepted: 19 December 2018
Published: 29 January 2019
DOI: https://doi.org/10.1038/s41598-018-38021-4
