Comparative mitochondrial genomics toward exploring molecular markers in the medicinal fungus Cordyceps militaris

Cordyceps militaris is a fungus used for developing health food, but knowledge about its intraspecific differentiation is limited due to lack of efficient markers. Herein, we assembled the mitochondrial genomes of eight C. militaris strains and performed a comparative mitochondrial genomic analysis together with three previously reported mitochondrial genomes of the fungus. Sizes of the 11 mitochondrial genomes varied from 26.5 to 33.9 kb mainly due to variable intron contents (from two to eight introns per strain). Nucleotide variability varied according to different regions with non-coding regions showing higher variation frequency than coding regions. Recombination events were identified between some locus pairs but seemed not to contribute greatly to genetic variations of the fungus. Based on nucleotide diversity fluctuations across the alignment of all mitochondrial genomes, molecular markers with the potential to be used for future typing studies were determined.

Recently, the mitochondrial genome of C. militaris was reported 22 , and mitochondrial intron presence/ absence dynamics (two to eight introns per strain) was documented for the fungus 23 , indicating a high variability of mitochondrial DNA in the fungus. The objectives for this work were 1) to investigate if there are unknown insertion sites for mitochondrial introns in C. militaris, 2) to compare nucleotide variations among different mitochondrial regions, 3) to identify mitochondrial regions with high variability and determine novel molecular markers suitable for typing studies. Therefore, we assembled the complete mitochondrial genomes of eight additional C. militaris strains and performed a comparative mitochondrial genomic analysis together with three previously published mitochondrial genomes of the fungus.

Results
Overview of different C. militaris mitochondrial genomes. The mitochondrial genomes of eight additional C. militaris strains were successfully assembled in this study. Together with three strains, CM01, V26-17, and EFCC-C2, whose mitochondrial genomes were reported previously 22,23 , we performed a comparative mitochondrial genomic analysis. Each mitochondrial genome consists of two ribosomal RNA genes (rnl and rns), 27 tRNA genes, and 14 standard protein-coding genes of the oxidative phosphorylation system. Sizes of these mitochondrial genomes, however, varied from 26,535 to 33,967 bp mainly due to different numbers of introns, two to eight depending on strains (Table 1). Compared to our previous publication 23 , no novel insertion sites of introns were found. That is, mitochondrial introns of C. militaris seemed to occur only at eight possible sites, four in the rnl gene and one each in cob, cox1, cox2, and cox3 (Fig. 1). The second rnl intron (i.e., rnl-i2) was present in all strains, while each of the remaining seven introns could be absent in one or many strains (Table 1). Every intron contains an ORF encoding for either GIY-YIG homing endonucleases (for cob-i1, cox2-i1, rnl-i1, and rnl-i2), the LAGLIDADG homing endonucleases (for cox1-i1, cox3-i1, and rnl-i3), or ribosomal protein S3 (for rnl-i4). Besides introns, the rns-cox3 intergenic region (IR) also contributed to mitochondrial genome size variations. This region was 650-661 bp in two strains (EFCC-C2 and CM06) but 1,343 bp in other nine strains. A 327-bp ORF coding for a hypothetical protein was present in the longer rns-cox3 IR, but absent in the shorter rns-cox3 IR. Finally, although trivial, various small insertion/deletion (indel) regions, as depicted in the following sections, also contributed to variations of mitochondrial genome sizes.
Nucleotide variations at genic regions. Alignment of the 11 C. militaris mitochondrial genomes generated 34,067 positions, where approximately 87% comprises genic regions. Nucleotide variations at genic regions were inspected according to exonic (58% of the alignment) and intronic (29%) regions. At exonic regions of the 14 standard protein-coding genes, no indels were detected, but SNPs, from one in nad3 to 24 in nad5, occurred at all genes with exception of atp8, atp9, and nad6 (Table 2). Most SNPs are synonymous, but some from four genes (cox3, nad2, nad4, and nad5) resulted in one to five amino acid changes. For tRNA genes, only one SNP (either unique to one strain or shared by more than one strain) was detected in each of seven tRNA genes (Table 2); no any variation was found in other tRNA genes. For the two rRNA genes, both SNPs and indels were detected (Table 2). Actually, at exonic regions of all mitochondrial genes, indels were detected just from rns and rnl, and each indel was unique to one strain.
At intronic regions, no variation was found in cox3-i1, but SNPs and often indels were found in the remaining seven introns (Table 3). SNPs from introns of protein-coding genes occurred mostly at intronic ORF regions, while SNPs from introns of rnl occurred mostly at regions beyond intronic ORFs. Changes of amino acids were detected for all intronic proteins with exception of those present in cox3-i1 and rnl-i3, each of which was only  Table 1. C. militaris isolates used in this study and information on their mitochondrial genomes. a "+ " Indicates the presence of an intron or the large rns-cox3 intergenic region (IR). b The clade here referred to those classified according to intron presence/absence patterns as depicted in our previous paper .
possessed by two strains. Indels were found mainly at regions beyond intronic ORFs, but there were two exceptions. One is a 6-bp deletion found in the intronic ORF of rnl-i2 from CM09-9-24, which led to the deletion of two amino acids. The other is a truncated version of the intronic ORF of cox2-i1 from EFCC-C2, where a 62-bp sequence repeated twice in tandem and led to early translation stop. Unfortunately, mitochondrial sequences of EFCC-C2 reported by Sung (2015) could not be verified in this study. When we compared the divergence of exons and introns, exons were more divergent than introns in protein-encoding genes, while introns were more divergent than exons in rnl. When looking at all intron-containing genes, variability at intronic regions was similar to that at exonic regions (Tables 2 and 3).

Nucleotide variations at intergenic regions.
Intergenic regions only accounted for approximately 13% (5,713-6,415 bp) of C. militaris mitochondrial genome, but both SNPs and indels were frequently detected (Table 4). Some indels (e.g., those found at nad3-atp9, cox1-nad1, and cox3-nad6 IRs) were shared by more than one strain. For the rns-cox3 IR, 9 of the 11 strains possessed a large region with an additional ORF, and two strains possessed a short region lacking the ORF. Nucleotide variability at the short rns-cox3 IR, however, was higher than that at the large rns-cox3 IR. When comparing the nucleotide variation frequencies, we found that intergenic regions were more variable than genic regions (Tables 2-4).
To determine whether the above six VGs were not under positive selection, we performed the Tajima's D, and Fu and Li's D* and F* tests of neutrality for each of them. The values obtained were not significantly different from zero with exception of VG5 (Table 5). Therefore, VGs1-4 and VG6 showed no deviation from the neutral model of evolution and are probably not under selective pressure. Mutations in VG5 are probably affected by natural selection, weakening its potential use in typing studies due to its constrained variability.
The combined sequence of VGs1-4 and 6 revealed eight haplotypes among the 11 strains (Table 5). When considering the complete mitochondrial genomes, the 11 strains also belonged to eight genotypes. Phylogeny constructed using the concatenated five-locus dataset was congruent to that constructed using the whole mitochondrial genome (Fig. 4). Using designed primers (Table S3), expected bands of the five fragments can be amplified from C. militaris strains (Fig. S1).

Discussion
In this study, we assembled successfully the complete mitochondrial genomes of eight C. militaris strains and performed a comparative mitochondrial genomic analysis together with three previously published mitochondrial genomes. All mitochondrial genomes were assembled based on Sanger sequencing results, and sequence errors were reduced to a minimum level as a result of the use of high-fidelity DNA polymerase and the manual check of sequencing chromatograms. The genome alignment of 34,067 positions revealed 286 polymorphic sites in genic regions (222 at exonic regions and > 64 at intronic regions) and 160 polymorphic sites in intergenic regions among the 11 C. militaris strains (Tables 2-4). However, nucleotide variability at intergenic regions was higher than that at genic regions because intergenic regions (~13% of the whole mitochondrial genome) were much shorter than genic regions (~87%). With exception of one to five non-synonymous changes in cox3, nad2, nad4, and nad5, the majority of mutations in exonic regions of the 14 standard protein-coding genes were synonymous and located in the third codon positions ( Table 2). In contrast, intronic ORFs were prone to amino acid changes (Table 3). Overall, variation levels are similar between intronic and exonic regions.
Although we detected recombination events between rnl exons and some other coding or non-coding regions (Table S2), the overall evidence for panmictic recombination was insignificant (Table S1). Therefore, recombination seems not to contribute a lot to the above observed nucleotide variability. Because the mechanism for mitochondrial inheritance of C. militaris has not been investigated, how mitochondrial inheritance affects  mitochondrial variations cannot be determined. However, as a heterothallic fungus 6 , there are chances for the generation of heteroplasmons when two C. militaris strains with opposite mating types fuse. As a fungus with a worldwide distribution and a broad host range, C. militaris is expected to display a high intraspecific genetic diversity. Previous studies, however, failed to reveal a high intraspecific genetic differentiation based on nuclear DNA fragments 10 . The population genetic structure of C. militaris is far from clear due to lack of efficient molecular markers. By comparing mitochondrial genomes of different C. militaris strains, this study identified six mitochondrial DNA fragments (VG1-6) with potential as molecular markers. Among them, VG5 was later excluded due to its deviation from neutrality. The remaining five fragments (VG1-4 and VG6) showed the same number of haplotypes as the whole mitochondrial genome and were phylogenetically congruent to the whole mitochondrial genome. We, however, have to admit that C. militaris isolates employed in this study were all from China or Chinese companies, more SNPs might be detected when isolates from other countries were tested. Overall, these mitochondrial DNA fragments determined in this study are suitable to be used as molecular markers in future typing studies in C. militaris.   Table 4. Variations at intergenic regions of mitochondrial genes among 11 C. militaris strains. a Intergenic regions are named by adjacent protein-encoding genes and/or ribosomal genes. tRNAs present in some of these fragments were excluded during calculation as noted in this table. b For the large rns-cox3 IR and the small rns-cox3 IR, identical five tRNAs were excluded. Therefore, the total number of tRNAs was still 27. c "Remaining length" represents the length after excluding tRNAs. % was calculated based on remaining length.

Materials and Methods
Fungal cultures. Based on our previous study 23 , eight additional C. militaris strains that showed multiple intron distribution patterns (Table 1) were chosen to assemble their complete mitochondrial genomes in this study. These isolates were cultivated at 25 °C for 10 days in potato dextrose agar media with a piece of cellophane paper covering the medium surface. Mycelia were collected and used for extracting genomic DNA using the cetyltrimethylammonium bromide method 24 .

Assembly of mitochondrial genomes and sequence annotations. Based on the published mito-
chondrial genome of C. militaris CM01 23 , multiple PCR primer pairs were designed to amplify mitochondrial fragments from strains used in this study. PCRs were performed using the DNA polymerase KOD FX (TOYOBO Bio-Technology Co. LTD, Japan), and sequences of amplicons were determined by Sanger sequencing at GENEWIZ Inc. (Suzhou, China). Different mitochondrial fragments of the same strain were assembled together under the aid of overlapping sequences. Annotation of mitochondrial genomes referred to those depicted previously 23 .

Sequence alignment and nucleotide variation analysis. Mitochondrial genome sequences from 11
C. militaris strains, eight from this study plus three from previous publications, were aligned by the online program MAFFT version 7 (http://mafft.cbrc.jp/alignment/server/). Individual exonic, intronic, and intergenic regions were extracted from the alignment using a perl script selectSites.pl written by Naoki Takebayashi (http:// raven.iab.alaska.edu/~ntakebay/teaching/programming/perl-scripts/perl-scripts.html). Nucleotide variations, including parsimony informative sites, singleton sites, and indels, were summarized for each extracted region in MEGA 6.06 25 . Phylogenetic tree construction and comparison. To know if the novel molecular markers are phylogenetically congruent to the complete mitochondrial genomes, we constructed phylogenetic trees using two Gaps were considered during window sliding, but nucleotide variations at gap positions were not considered. The high nucleotide diversity at rns-cox3 intergenic region was due to ambiguous alignment as a result of presence of two different-length sequences among these genomes. The rns-cox3 intergenic region was not considered when selecting novel molecular markers. The relative positions of 14 standard protein-encoding genes, two ribosomal genes, and the six molecular markers were illustrated below the graph. datasets. One is the combined dataset of new markers chosen in the above section, and the other is the complete mitochondrial genomes but with partial rns-cox3 intergenic region excluded because of ambiguous alignment.   Table 5. Nucleotide variations and neutrality tests on the six fragments chosen as potential molecular markers. a Numbers after "± " were standard deviations. b P values are those for denying neutrality. Maximum parsimony trees were constructed using PAUP 4.0b10 (Sinauer Associates, Sunderland, MA, USA) with gaps being treated as missing data. A tanglegram was constructed from both trees using TREEMAP 3b1243 28 .

Recombination analysis.
Nucleotide accession numbers. Sequences of the eight newly assembled mitochondrial genomes of C. militaris were all submitted to GenBank under accession numbers as depicted in Table 1.  Table 1 for the information of the 11 C. militaris strains used in this figure.