Introduction

Fungi are the commonest cause of insect disease in nature and approximately 1,000 fungal species are reported to kill insects, spiders or mites1. Fungi are particularly well suited for development as biopesticides because unlike bacteria and viruses they infect insects by direct penetration of the cuticle and so function as contact insecticides. Beauveria is one of the best-known genera of entomopathogenic fungi and worldwide numerous registered mycoinsecticide formulations based on Beauveria bassiana (Bb) and B. brongniartii are used for control of insect pests2. Bb has a particularly wide host range (over 700 species) allowing it to be used against vectors of human disease and a wide range of insect pests3. For example, in China, approximately one million hectares a year are treated with Bb to control forest insects such as the pine caterpillar Dendrolimus punctatus4,5.

B. bassiana was discovered by Agostino Bassi in 1835 as the cause of the devastating muscardine disease of silkworms6. The ability of insects to defend against Beauveria has illuminated many aspects of innate immunity with direct relevance to human immunology7. Bb is also a well-known biocatalyst in chemical and industrial applications8. The important role of Bb as a plant endophyte and antagonist of plant pathogenic fungi has only become apparent in the last 20 years9. Furthermore, as shown by their pathogenicity to soil amoeba10, at least some Bb isolates have additional unpredicted flexibility in their trophic capabilities. However, the mechanisms underlying the physiological plasticity of Bb are still poorly understood. In addition, although the sexual stage of Bb has been identified as Cordyceps bassiana11, it is very rarely observed and the role of sexuality in Bb is unknown.

Beauveria is well known for producing a large array of biologically active secondary metabolites including non-peptide pigments and polyketides (e.g., oosporein, bassianin and tenellin), nonribosomally synthesized peptides (e.g., beauvericin, bassianolides and beauveriolides) and secreted metabolites involved in pathogenesis and virulence (e.g., oxalic acid) that have potential or realized industrial, pharmaceutical and agricultural uses12. Silkworm larvae infected by Bb (batryticated silkworms) have for centuries been used as a traditional Chinese medicine. The medicinal potential of batryticated silkworms has been validated by modern technologies, e.g., water extracts of batryticated silkworms protect against β-amyloid-induced neurotoxicity13.

Expressed sequence tag analyses, insertion mutagenesis and gene functional studies of Bb have already identified some of the genes involved in fungal development, virulence, detoxification, insect immune avoidance and stress responses14,15,16. To facilitate a more comprehensive understanding of Bb pathogenesis and of its interactions with insects and plants, we sequenced the genome of Bb strain ARSEF 2860 and performed a comparative study with the sequenced genomes of the ascomycete insect pathogens Metarhizium robertsii (Mr), M. acridum (Ma)17 and Cordyceps militaris (Cm)18. The comparison revealed a common set of gene family expansions that distinguish them from plant pathogens and saprophytes, as well as species-specific gain or loss of genes that correlate with different pathogenic strategies. Transcriptional responses of Bb to insect cuticles, insect hemocoel and plant root exudates were studied using an RNA-seq technique and demonstrated modulation of genes involved in signal transduction, secreted proteins and metabolism.

Results

General features

The genome of Bb strain ARSEF 2860 was shotgun sequenced to 76.6× coverage using a Roche 454 system and Illumina paired-end sequencing. The assembly resulted in a total genome size of 33.7 megabases (Mb), which is similar to that of Cm (32.2 Mb), but smaller than Mr (39.0 Mb) and Ma (38.1 Mb) (Table 1). By mapping 13,412 EST sequences to the scaffolds, the Bb genome was estimated to be 96.1% complete. The genome was predicted to encode 10,366 protein-coding genes, which is more than Cm (9,684) and Ma (9,849) but fewer than Mr (10,582). In terms of gene density (genes per Mb), Bb has a more compact genome structure relative to the other insect pathogens (Table 1), but it is less compact than plant pathogenic Fusarium spp.19.
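The gene density comparison can be reproduced from the genome sizes and gene counts quoted above. The following is a minimal sketch that assumes density is simply the number of predicted genes divided by assembly size in Mb; the function and variable names are illustrative, and because the inputs are the rounded figures from the text, the results may differ slightly from the values reported in Table 1.

```python
# Genome size (Mb) and predicted protein-coding gene count, as quoted in the text.
GENOMES = {
    "Bb": (33.7, 10366),
    "Cm": (32.2, 9684),
    "Mr": (39.0, 10582),
    "Ma": (38.1, 9849),
}


def gene_density(size_mb, n_genes):
    """Return the number of predicted genes per megabase (assumed definition)."""
    return n_genes / size_mb


# Density per species, then species ordered from most to least compact.
densities = {name: gene_density(size, genes) for name, (size, genes) in GENOMES.items()}
ranked = sorted(densities, key=densities.get, reverse=True)

for name in ranked:
    print(f"{name}: {densities[name]:.1f} genes/Mb")
```

Under this assumption Bb ranks first among the four insect pathogens, in line with the statement that it has the most compact genome of the group.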

Table 1 Comparison of genome features between B. bassiana and other insect pathogens.

An Interproscan analysis identified 3,002 protein families in Bb (containing 7,283 total predicted proteins), which is approximately 200 more than Metarhizium spp. and Cm17,18. Bb also has a higher proportion (18.2%) of genes encoding putatively secreted proteins than Cm (16.2%), Mr (17.6%) and Ma (15.1%) (Table 1). All the insect pathogens have a 2 to 3-fold higher proportion of their genome devoted to secreted products than other Ascomycetes including plant pathogens and the mycoparasitic Trichoderma spp.20. Similar to other insect pathogens (average 17%), 17.6% of Bb proteins are in the pathogen-host interaction (PHI) database (Supplementary Table S1). Functional analysis indicated that relative to Cm and Mr, Bb has more proteins involved in cell metabolism, energy, cell cycle, transcription, transportation, signal transduction and cell differentiation, but it has fewer proteins than Metarhizium spp. involved in localization and formation of organelles, virulence and detoxification (Fig. 1A). Whole genome reciprocal analysis indicated that more than 80% of Bb genes show orthologous relationships with those of Cm and Mr and its genome harbors the fewest number of species-specific genes (Fig. 1B).

Figure 1

Comparative genomics analysis of three insect pathogens.

(A) Functional classification and comparison of B. bassiana (Bb), C. militaris (Cm) and M. robertsii (Mr) proteins. Each circle represents the relative fraction of genes represented in each of the categories for each genome. (B) Reciprocal blast analysis of the predicted proteins among the three insect pathogens. The cut-off E value is at ≤ 1e-5.

A Blast score ratio analysis showed that Bb is more closely related to Cm than Mr (Fig. 2A). Sequence identity between Bb and Cm orthologs was 76%, as compared to 58% with Metarhizium spp. Although Bb and Cm are different from each other in conidiogenesis, infection structure formation, life cycle and host range (Fig. 2C), their relatedness is similar to that between the mycoparasitic fungi Trichoderma virens and T. atroviride (74%)20, Aspergillus fumigatus and A. niger (69%)21 and fish and humans (75%)22. A phylogenomic analysis based on 1,915 orthologous protein sequences showed that Bb and Cm diverged after a split with mycoparasitic Trichoderma spp. (Fig. 2B) and reinforced our previous analysis18, suggesting that the split between the Cordyceps spp. (including Bb) and Metarhizium lineages occurred before Metarhizium diverged from the plant endophytic Epichloë lineage.
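For readers unfamiliar with the Blast score ratio (BSR), the sketch below illustrates how such a ratio is commonly computed: the BLAST score of a query protein against its best hit in another genome is divided by the score of the query against itself, giving a value between 0 and 1. The protein names, scores and the 0.4 cut-off for calling a sequence species-specific are hypothetical values chosen for illustration; they are not taken from this study.

```python
def blast_score_ratio(hit_score, self_score):
    """Return hit_score / self_score; 1.0 means the hit is as good as a self match."""
    if self_score <= 0:
        raise ValueError("self_score must be positive")
    return hit_score / self_score


# Hypothetical BLAST scores for three Bb proteins: self hit, best Cm hit, best Mr hit.
SCORES = {
    "protein_a": {"self": 1200.0, "Cm": 1020.0, "Mr": 700.0},
    "protein_b": {"self": 800.0, "Cm": 640.0, "Mr": 600.0},
    "protein_c": {"self": 500.0, "Cm": 50.0, "Mr": 40.0},
}

# Assumed cut-off: a protein scoring below this against both genomes is treated as Bb-specific.
SPECIFIC_CUTOFF = 0.4

# Each protein becomes one point (BSR vs. Cm, BSR vs. Mr) on a scatter plot.
points = {
    name: (blast_score_ratio(s["Cm"], s["self"]), blast_score_ratio(s["Mr"], s["self"]))
    for name, s in SCORES.items()
}
species_specific = [
    name for name, (cm, mr) in points.items()
    if cm < SPECIFIC_CUTOFF and mr < SPECIFIC_CUTOFF
]

for name, (cm, mr) in points.items():
    print(f"{name}: BSR vs Cm = {cm:.2f}, BSR vs Mr = {mr:.2f}")
print("species-specific:", species_specific)
```

In a scatter plot of such points, proteins lying closer to one axis are more similar to the corresponding genome, and points near the origin correspond to species-specific sequences.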

Figure 2

Comparative genomics and phylogenomic analyses of B. bassiana with other fungi.

(A) Scatter plots of Blast score ratio analysis of B. bassiana (Bb), C. militaris (Cm) and M. robertsii (Mr) showing that Bb is more closely related to Cm. The numbers in red in the lower left corners indicate the percentages of Bb species-specific sequences and the numbers in the upper left or lower right show the percentages of lineage-specific genes between pairs of genomes as indicated. (B) A maximum likelihood phylogenomic tree constructed using the Dayhoff amino acid substitution model showing the evolutionary relationship of Bb with other fungal species. (C) Phenotypic and morphological comparison of Bb and Cm. Panel 1, a wax moth (Galleria mellonella) larva killed and mycosed by Bb; Panel 2, Cm fruits on a silkworm (Bombyx mori) pupa; Panel 3, Bb conidial formation on a zig-zag shaped conidiophore; Panel 4, Cm conidial formation on a phialide-type conidiophore; Panel 5, a Bb germ tube on a locust hind wing showing tip swelling; Panel 6, a Cm germ tube on a locust hind wing without tip swelling; Panel 7, Bb hyphal bodies harvested from cotton bollworm 48 hrs post infection; Panel 8, a perithecium produced on the Cm fruiting body.

In contrast to two Metarhizium species17, there are no obvious syntenic relationships between the genome structures of Bb and Cm (Fig. 3A). Analysis of Bb and Cm paired paralogs showing >70% nucleotide identity found a stronger C:G to T:A mutation bias in Cm than in Bb (Supplementary Fig. S1). Thus, similar to Mr17, but unlike Cm18 and Neurospora crassa23, Bb may not use repeat-induced point mutations (RIP) for genome defense against repetitive sequences. This is consistent with expanded gene families and more transposons in the Bb and Mr genomes relative to Cm and Ma (Supplementary Table S2). RIP only occurs during meiosis23, so its apparent absence in Bb suggests the sexual cycle is rare in this fungus. Unlike Cm which is specific to lepidopteran pupae18, Bb has a wide host range. The wide host range Mr also has more gene families and larger gene families than the locust specialist Ma and likewise 61 families were expanded in Bb relative to Cm (Supplementary Table S3). These included subtilisins and trypsins involved in degrading insect cuticles. Relative to other fungi, there are expansion/contraction of different protein families in Bb (Supplementary Table S4).

Figure 3

Comparative genomic analysis.

(A) Dot plot analysis of B. bassiana and C. militaris genome structures using ordered scaffold data. (B) Distribution of paralogous genes with different levels of nucleotide similarity in B. bassiana (highlighted in red) and other fungi. (C) Syntenic relationships of the MAT loci and their flanking regions between B. bassiana (Bb), C. militaris (Cm), M. robertsii (Mr) and M. acridum (Ma). Loci in the same color show orthologous relationships.

Bacterial-like toxins

In contrast to entomopathogenic bacteria and viruses, entomopathogenic fungi infect insects via cuticular penetration and are usually assumed to lack per os infectivity24. However, the Bb genome contains many more bacterial-like toxins than other fungi (Supplementary Table S5). For example, Bb has 13 heat-labile enterotoxins compared to six in Mr and one or none in other fungi. Bb strains expressing a vegetative insecticidal toxin gene (Vip3A) from Bacillus thuringiensis (Bt) exhibited enhanced oral toxicity24. Unexpectedly, we also found that the Bb genome encodes eight genes showing similarities (E value <1e-20) to Bt Cry-like delta endotoxins while other fungi have at most one of these genes. This suggests that Bb may possess greater oral toxicity than other fungi. Most fungi lack genes for bacterial zeta toxin-like proteins but Bb has three and the other insect pathogens have one each, suggesting that insect pathogens may use the bacterial toxin-antitoxin system to control cell stasis or death25. Interestingly, the TC-like insecticidal toxins present in Trichoderma species are absent in other fungi, including the entomopathogens (Supplementary Table S5).

Proteases

The genomes of Cm and Metarhizium spp. code for many more proteolytic enzymes than do non-insect pathogens. This may reflect an increased range of functions required when infecting insects17,18. Likewise, the Bb genome also codes for significantly (P<0.005) more trypsins (23 vs. an average of 2 in plant pathogens), subtilisins (43 vs. 17), aspartic proteases (21 vs. 18) and carboxypeptidases (52 vs. 32). However, Bb and the plant pathogens have similar numbers of cysteine peptidases (47 vs. 43), threonine peptidases (20 vs. 19) and metallopeptidases (98 vs. 89) (Supplementary Table S6). The broad host range Mr has more proteases than the narrow host range Ma and Cm17,18. The dramatic expansion of proteases in Bb resembles Mr, suggesting that this is an adaptation to a broad host range. Thus, Bb has more subtilisins (43 vs. 35) and trypsins (23 vs. 12) than Cm and has four subfamilies that Cm lacks (Supplementary Table S7). Bb has two trypsins (BBA_07369 and BBA_03951) that only show high identities (>60%) to bacterial trypsins and a trypsin (BBA_02900) with a Cm homolog that clusters with insect trypsins (Supplementary Fig. S2). This is consistent with horizontal gene transfer in fungi26. However, a patchy distribution of genes can also be due to lineage specific gene loss.

Carbohydrate active enzymes

The number of glycoside hydrolases (GH) possessed by Bb (145) resembles other insect pathogens (average 141), rather than the endophyte E. festucae (98), but is significantly (P = 0.0069) less than plant pathogens (average 199) (Supplementary Table S8). All the insect pathogens lack several families of cellulases (GH6, GH7, GH12, GH45 and GH61) and other enzymes involved in degrading plant cell walls (GH11, GH30, GH51, GH53, GH62, GH67 and GH115). Compared to Metarhizium spp., Bb and Cm have fewer xyloglucosyl transferases (GH16) responsible for degrading xylan oligomers and polymeric xylan27. Unlike Cm18, Bb has a phosphoketolase required for xylose metabolism and full virulence in Metarhizium (BBA_09253 vs. MAA_04563, 41% identity)28. Thus, in contrast to Cm, Bb could germinate and grow on xylose medium, albeit very weakly when compared to Mr (Fig. 4A and 4B). Because they have fewer cellulases and hemicellulases, the number of carbohydrate-binding module 1 (CBM1) domains is significantly less in insect pathogens (average 10) than plant pathogens (25) (Supplementary Table S9). Insect pathogens also have significantly fewer putative oxidative lignin enzymes (average 29 in insect pathogens vs. 40 in plant pathogens, P = 0.0016) (Supplementary Table S10), carbohydrate esterases (9 vs. 33, P = 0.0025) (Supplementary Table S11), cutinases (4 vs. 12, P = 0.0034) and pectin lyases (8 vs. 20, P = 0.0122) (Supplementary Table S4). Cutinases and pectin lyases in particular are known to be virulence factors for plant pathogens29.

Figure 4

Growth tests.

Conidial suspensions (2.5 μl) of B. bassiana (Bb), C. militaris (Cm) and M. robertsii (Mr) were inoculated onto minimal medium supplemented with 1% xylose (A) or 1% glucose (B) and incubated for 10 days at 25°C. In contrast to Mr, Cm failed to grow on xylose medium while Bb germinated and grew poorly.

Chitin is the second most abundant polymer in insect cuticle. The necessity to degrade it is reflected in an abundance of GH18 family chitinases in Bb (20) and the three other insect pathogens (average 19) compared to plant pathogens (average 11) (Supplementary Table S8). Fungal chitinases are subdivided into three subgroups30. Our analysis indicated that eight of the 20 Bb chitinases belong to subgroup A (without a chitin-binding domain, CBM), four belong to subgroup B (one CBM at the C-terminal) and eight are subgroup C chitinases (possessing CBM18 and CBM50 LysM chitin-binding modules) (Supplementary Table S12). Insect and plant pathogens have similar (P = 0.2258) numbers of subgroup A chitinases but the entomopathogens have many more chitinases with CBMs (average 11 vs. 2, P = 0). Phylogenetic analyses of subgroup B and C chitinases revealed that most of the gene duplication events have occurred since Bb/Cm, Metarhizium spp. and Trichoderma spp. diverged from a common ancestor, suggesting their abundance in each clade is due to convergent evolution (Supplementary Fig. S3).

Cytochrome P450s (CYPs)

CYPs are involved in many essential cellular processes and play diverse roles in detoxification, degradation of xenobiotics and the biosynthesis of pathogenesis related secondary metabolites31. Plant pathogens (average 118) and Metarhizium spp. (average 111) have more CYPs than Bb (83) and Cm (57) and 24 CYP families present in Metarhizium spp. are absent in Bb and Cm (Supplementary Table S13). These include nitric oxide reductases (CYP55, NOR) used for anaerobic denitrification32. Thus, unlike Metarhizium spp., Bb and Cm may not be able to respond to hypoxic conditions by nitrate or ammonia fermentation. Bb and Cm also lack CYP619 for biosynthesis of the tetraketide mycotoxin patulin33. Relative to Cm (57), the Bb genome encodes 1.5-fold more CYP genes (83), including 16 additional subfamilies. Conversely, there are six families present in Cm that are absent in Bb. In particular, Bb lacks CYP567 whereas Cm has three CYP567s, among which CCM_03048 and CCM_03052 are located in a putative sesquiterpene cyclase gene (CCM_03050) cluster. The fungal CYP51 system is involved in sterol biosynthesis and detoxification of azole antifungals34. The four entomopathogens each have two CYP51 genes. There are 6 ω-hydroxylase CYP52 enzymes in Bb as compared to 4 in Cm, 1 in Mr and 4 in Ma (Supplementary Table S13). CYP52s catalyze the first step in the ω-oxidation pathway of alkanes, so their presence is consistent with efficiently metabolizing insect epicuticle alkanes35,36.

Of the 83 CYPs in the Bb genome, 18 are within gene clusters involved in the biosynthesis of secondary metabolites. For example, a CYP617 protein (BBA_02632) is proximal to a non-ribosomal peptide synthetase (NRPS, BBA_02630) for bassianolide biosynthesis37. CYP5293 (BBA_09720) is within the beauvericin (NRPS, BBA_09727) biosynthetic cluster38. CYP655 (BBA_07335) and CYP623 (BBA_07336) are close to a polyketide synthase (PKS)-NRPS hybrid (BBA_07338) for tenellin biosynthesis39. Metarhizium CYP65 DtxS2 is a multifunctional enzyme involved in hydroxylation, desaturation, oxidation and epoxidation of intermediates of the insecticidal toxin destruxins40. Bb has a cluster, absent in Cm, containing two CYP65s (BBA_08705 and BBA_08706) and one CYP58 (BBA_08698) for biosynthesis of as yet unknown metabolite(s). The CYP6001 subfamily produces oxylipins for signaling and secondary metabolism in fungi41. Cm lacks CYP6001 but it is present as a single copy gene in Bb and Metarhizium spp.

Small secreted cysteine-rich proteins (SSCPs)

Most of the secreted effector-type proteins of plant pathogens are small (<300 amino acids) and contain four or more cysteine residues42. We surveyed and compared fungal SSCPs. In total, 396 clusters were obtained by a Blastclust analysis of Bb and 11 other fungal species. Of these, 12 clusters contain SSCPs shared by insect pathogens and the plant endophyte E. festucae, and 26 are found in insect and plant pathogens. Of the 91 clusters specific to insect pathogens, 52 contain genes from Bb. Relative to other insect pathogens (average 307), the Bb genome encodes more SSCPs (373) and many of them are species specific (154 vs. an average of 95) (Supplementary Table S14).
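The size and cysteine criteria stated above can be expressed as a simple filter. The sketch below is illustrative only: it assumes that secretion has already been predicted by a separate signal peptide analysis and is passed in as a boolean flag, and the example sequences are invented rather than taken from the Bb genome.

```python
def is_sscp(sequence, secreted, max_length=300, min_cysteines=4):
    """Return True for a secreted protein shorter than max_length with enough cysteines.

    `secreted` is assumed to come from an upstream signal peptide prediction.
    """
    seq = sequence.upper()
    return secreted and len(seq) < max_length and seq.count("C") >= min_cysteines


# Hypothetical proteins: (amino acid sequence, predicted to be secreted?).
CANDIDATES = {
    "small_cys_rich": ("MKFSAVLACLAGCSACPTCQ", True),   # short, five cysteines, secreted
    "small_cys_poor": ("MKFSAVLALLAGASACPTAQ", True),   # short, only one cysteine
    "not_secreted": ("MKFSAVLACLAGCSACPTCQ", False),    # cysteine rich but not secreted
    "too_long": ("MC" * 200, True),                     # 400 residues, exceeds the size limit
}

sscps = [name for name, (seq, secreted) in CANDIDATES.items() if is_sscp(seq, secreted)]
print(sscps)
```

Proteins passing such a filter would then be grouped into clusters by sequence similarity, as was done here with Blastclust.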

As with other fungi20,42, most of the entomopathogen SSCPs are of unknown function. Of 373 Bb SSCPs, only 130 contained conserved domains recognized by an Interproscan analysis. Some of these had homologs in the PHI database of verified virulence determinants, e.g. five putative cutinases and five trypsins that may be used by Bb to target plant and insect cuticle components, respectively. Lectins are used by mammalian pathogenic fungi to evade detection by host receptors43. Six Bb SSCPs were identified as concanavalin A-like lectins and potentially could function in interactions with both insects and plants. Bb has four genes encoding proteins with eight cysteine-containing extracellular membrane (CFEM) domains resembling pathogenicity determinants in plant pathogens44.

Signal transduction

The PHI dataset contains large numbers of G-protein coupled receptors (GPCRs), protein kinases and transcription factors that have similar sequences in the entomopathogen genomes (Supplementary Table S1). Fungal GPCRs sense extracellular cues and transmit the signals to distinct trimeric G-protein subunits45. Besides the conserved pheromone receptors and cAMP receptors, most of the insect pathogen GPCRs resemble the rice-blast fungus M. oryzae Pth11-like proteins (Supplementary Table S15). Relative to Metarhizium spp. (average 47), Bb and Cm have fewer Pth11-like receptors (average 21) and lack a GPR1-like GPCR which in yeast is activated during nitrogen starvation46. Thus, Bb and Metarhizium have evolved different mechanisms for nutrient sensing.

Functional kinome analysis of the plant pathogen F. graminearum indicated that many protein kinases (PKs) are involved in fungal growth, conidiation, pathogenesis, stress responses, toxin production and/or sexual reproduction47. The insect pathogens have more PKs (average 170, P<0.05) than the plant pathogens (average 145) (Supplementary Table S4). The specialist Ma has more PKs (193) than the generalist Mr (161), which may allow a more stringent discrimination between potential insect hosts and subsequent control of cell differentiation17. Consistent with this, Bb has fewer PKs (159) than Cm (167). Like other fungi, insect pathogens have large numbers of transcription factors (TFs) (Supplementary Table S16). However, Bb has more (10) GATA-type TFs than Cm (5) and Metarhizium species (4–5). Fungal GATA-type TFs are involved in multiple functions, including nitrogen metabolism, light induction, siderophore biosynthesis, mating-type switching and chromatin rearrangement48. Overall, these results imply that signal controls vary as much between the insect pathogens as they do between insect and plant pathogens.

Cryptic sexuality

The teleomorph of Bb was identified as C. bassiana11, but its sexual reproduction is seldom observed in nature and to date has not been induced in the laboratory. A previous PCR-based analysis showed that individual isolates of Bb carried either MAT1-1 or MAT1-2 mating-type genes49. Analysis of the mating-type locus indicated that the sequenced Bb strain is the MAT1-1 type (Fig. 3C). Thus, like Cm, Bb is a heterothallic and outcrossing fungus. Syntenic analysis of Bb, Cm and the two Metarhizium species showed that except for the idiomorphic regions, the genes flanking the mating-type locus are highly conserved, especially between Bb and Cm. However, unlike Bb and Metarhizium species, Cm commonly performs sexual reproduction18. To probe the cryptic sexuality of Bb, we surveyed Bb homologs of sex-related genes that have been functionally verified in A. nidulans and N. crassa50 (Supplementary Table S17). Many genes functioning in mating processes, karyogamy, meiosis and fruiting-body development in A. nidulans and N. crassa are also present in Bb and Cm. However, a meiosis-specific topoisomerase Spo11 present in Cm (CCM_09527), A. nidulans and N. crassa is absent in Bb and Metarhizium spp. Spo11 is crucial for initiating meiotic recombination by generating DNA double-strand breaks51. Thus, the lack of a Spo11-like protein in Bb and Metarhizium spp. may contribute, at least in part, to an infrequent sexual cycle.

Secondary metabolism

A plethora of insecticidal and other bioactive secondary metabolites has been identified from Bb, e.g. the cyclopeptides beauvericin, bassianolide and beauverolide, the yellow pigment pyridines tenellin and bassiatin and the dibenzoquinone oosporein52. Only genes involved in the biosynthesis of beauvericin37, bassianolide38 and tenellin39 have been functionally verified. Our genome survey found that there are 45 non-ribosomal peptide synthetase (NRPS), polyketide synthase (PKS) and terpenoid synthase/cyclase core genes in the Bb genome, which is more than Cm but fewer than Metarhizium spp. (Table 2). Three of the putative biosynthesis clusters are highly conserved in the four insect pathogens but are absent in other fungi, i.e. NRPS (BBA_05020), PKS (BBA_09745) (Supplementary Fig. S4A) and terpene synthase (BBA_06542). This finding implies that the evolution of fungal entomopathogenicity may be associated with the production of some similar secondary metabolites. Otherwise, there are four clusters conserved between insect pathogens and T. reesei, including two NRPS-like proteins (BBA_06997 and BBA_04028), one NRPS (BBA_05020) and one terpene cyclase (BBA_06542) (Supplementary Fig. S4B). Two clusters are limited to Bb, Cm and T. reesei, i.e. a type I PKS cluster and a PKS-NRPS hybrid for tenellin biosynthesis39 (Supplementary Fig. S4C). Ten clusters are specifically conserved between Bb and Cm. However, the NRPS gene clusters involved in the biosynthesis of the insecticidal toxins beauvericin by an iterative NRPS BbBEAS37 and bassianolide by an iterative NRPS BbBSLS38 are unique to the Bb genome.

Table 2 The core genes involved in the biosynthesis of secondary metabolites in insect pathogens

Biotransformation and detoxification

Bb is a well-known whole-cell catalyst in industrial applications8. We compared Bb with other fungi for enzymes putatively involved in bioconversions to determine if their presence or absence was associated with particular fungal lifestyles (Supplementary Table S18). Relative to plant pathogens, insect pathogens in general have more acyl-CoA dehydrogenases (average 14 vs. 9, P = 0.0139), fatty acid hydroxylases (average 14 vs. 5, P = 0.0002), amidohydrolases (average 14 vs. 9, P = 0.0210), glyoxalases (average 4 vs. 1, P = 0.0023) and monooxygenases (average 24 vs. 13, P = 0.0445). The acyl-CoA dehydrogenases are responsible for β-oxidation of fatty acids while amidohydrolases are involved in biotransformation of N-heterocyclic compounds53. The glyoxalase system detoxifies methylglyoxal54. Consistent with its unique sulfoxidation ability55, Bb has three aryl sulfotransferases that are absent in other fungi. Among insect pathogens, Bb has the highest numbers of epoxide hydrolases (7 vs. 4–5), nitrilases (11 vs. 7–10 in others) and monooxygenases (36 vs. 16–30 in others). Epoxide hydrolases are highly versatile biocatalysts for the hydrolysis of epoxides55. Nitrilases are valuable alternatives to chemical catalysts for biotransformation of various organic nitriles56.

Transcriptional responses of Bb to different environmental niches

Bb must overcome a number of environmental challenges in plants and insects. We used the high throughput Illumina RNA-seq method to investigate the range of Bb transcriptional responses. After sequencing, we obtained a total of 2.3 to 3.7 million tags per treatment and these were mapped to predicted genes. We determined that, of 10,366 annotated Bb genes, 68%, 52% and 78% were transcriptionally active during growth on locust hind wings (LW), in cotton bollworm blood (CB) and in corn root exudates (RE), respectively (Supplementary Table S19). As reported in previous studies17,18, individual gene expression values were normalized to the index of transcripts per million tags (TPM). Pairwise analysis of gene expression data was used to determine that large numbers of genes were significantly (P<0.05) up- or down-regulated by Bb during different treatments (Fig. 5A). For example, Bb up-regulated 1,150 genes and down-regulated 2,868 genes in CB as compared to LW. When referenced to gene expression in RE, growth in CB and on LW induced up-regulation of 3,302 and 4,530 genes, respectively. Of the top 100 genes expressed in CB, RE and LW, 74, 79 and 81, respectively, were not highly expressed in the other media (Fig. 5B) and more secreted proteins were in the top 100 expressed by Bb on LW (25) than in CB (15) and RE (14) (Supplementary Table S20). Of the 373 identified SSCPs, 92 were expressed on LW, 128 in CB and 156 in RE. The bacterial-like toxins were expressed at very low levels in all three environments.
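The TPM normalization used here can be illustrated with a short calculation. This sketch assumes that TPM is the number of tags mapped to a gene divided by the total number of mapped tags in the library and scaled to one million, without any correction for gene length; the gene names and tag counts are hypothetical and serve only to show how libraries of different depth become comparable.

```python
def tags_per_million(gene_tags, total_tags):
    """Scale a raw tag count to transcripts per million tags (assumed definition)."""
    if total_tags <= 0:
        raise ValueError("total_tags must be positive")
    return gene_tags * 1_000_000 / total_tags


# Hypothetical raw tag counts for two genes in two libraries of different depth.
LIBRARIES = {
    "LW": {"total": 2_300_000, "counts": {"gene_x": 529, "gene_y": 23}},
    "RE": {"total": 3_700_000, "counts": {"gene_x": 37, "gene_y": 740}},
}

# Normalize every gene in every library.
tpm = {
    lib: {gene: tags_per_million(n, data["total"]) for gene, n in data["counts"].items()}
    for lib, data in LIBRARIES.items()
}

for lib, values in tpm.items():
    for gene, value in values.items():
        print(f"{lib} {gene}: TPM = {value:.1f}")
```

After normalization, the expression of a gene can be compared between treatments even though the libraries contain between 2.3 and 3.7 million tags each.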

Figure 5

RNA-seq analysis of differentially expressed genes by B. bassiana adapting to different environmental niches.

(A) Summary of significantly up- and down-regulated genes between different libraries. (B) Venn diagram analysis of co-expressed genes in different libraries. The numbers in parenthesis show total expressed genes with more than two transcripts. The numbers in square brackets show those of the 100 most highly expressed genes. (C) Heat map of the transcription factors differentially expressed by Bb in different conditions. The fungus was harvested from cotton bollworm hemocoel (CB), locust hind wings (LW) and corn root exudates (RE) for RNA extraction and RNA-seq analysis (TPM, transcripts per million tags).

The analysis showed that Bb differentially expressed transcription factors (TFs) for gene regulation (Fig. 5C), GPCR genes for niche recognition (Supplementary Fig. S5A) and kinases for signal transduction (Supplementary Fig. S5B). For example, a Pth11-like GPCR (BBA_03214) and a C2H2-type zinc finger TF (BBA_00971) were highly expressed by Bb in RE (TPM = 3,163 for GPCR and TPM = 1,648 for TF), whereas only low or trace expression occurred in CB (TPM = 137 for GPCR and TPM = 18 for TF) and on LW (TPM = 4 for GPCR and TPM = 5 for TF). A functional category analysis of the highly expressed genes indicated that around one third are involved in catabolism or anabolism as defined by FunCat (Supplementary Fig. S6A). Genes involved in amino acid metabolism were more highly transcribed by Bb on LW (30%) than in CB (19%) and RE (11%). Most of these were involved in catabolism, consistent with utilization of amino acids on the cuticle surface, and/or mobilization of internal nitrogenous nutrients. Fourteen percent of the most highly expressed genes in CB were involved in protein synthesis, as compared to 6% on LW and 2% in RE. This could be associated with the quick propagation of Bb cells in insect hemocoel by yeast-type budding57. Root exudates are carbohydrate rich and 49% of the highly expressed genes in RE were for carbohydrate metabolism versus 14% on LW and 30% in CB (Supplementary Fig. S6B). Six heat-shock proteins were among the top 100 genes expressed by Bb in RE, compared to four in CB and one on LW, suggesting that plant root exudates provide the most stressful growth conditions.

Discussion

We report here a genomic analysis of B. bassiana, one of the best-studied and most widely used insect biocontrol agents. A comparative analysis with the genome sequences of three other insect pathogens demonstrated that Bb and Cm are closely related and evolved into insect pathogens independently of the Metarhizium lineage. We assume therefore that similar expansion of certain gene families, such as proteases and chitinases, is associated with functions necessary for insect pathogenesis and reflects convergent evolution. Likewise, plant pathogens have expanded families of glycoside hydrolases, carbohydrate esterases, cutinases and pectin lyases in order to degrade plant materials58. Mammalian pathogens are enriched for aspartyl proteases and phospholipases59. Mycoparasitic fungi have expanded numbers of chitinases20.

Besides proteases and chitinases1, virulence-related genes already characterized in Bb include a MAP kinase BBSLT2 (BBA_03334) mediating cell growth60, a neuronal calcium sensor BBCSA1 (BBA_05195) regulating extracellular acidification61, a cytochrome P450 enzyme CYP52X1 degrading cuticular fatty acids (BBA_02428)36 and a GH73 family of β-1,3-glucanosyltransferase BBGAS1 (BBA_04640) maintaining cell wall integrity62. All of these genes were mapped in the sequenced Bb genome. Several other experimentally verified virulence genes in Metarhizium spp. also have orthologs in Bb, e.g., a perilipin-like protein (BBA_08759, 63% identity) that controls cellular lipid storage and appressorium penetration63, an osmosensor (BBA_08887, 59% identity) to mediate adaptation to the insect hemocoel64, and the adhesins MAD1 (BBA_02419, 34% identity) and MAD2 (BBA_02379, 47% identity) to mediate spore adhesion to insect and plant surfaces65. The mechanism of adhesion used by Bb also involves hydrophobins that mediate cell surface hydrophobicity and virulence66. Like other insect pathogens, Bb has two class I hydrophobins but it has additional class II hydrophobins (Supplementary Table S4). The presence of these genes in both Bb and Metarhizium spp. suggests some shared strategies for interacting with plants and insects. Ninety-one of the 397 SSCP clusters are shared exclusively by the insect pathogens, suggesting that many shared strategies are currently unknown. However, Beauveria lacks a homolog to the collagen-like protein used by Metarhizium to evade the insect immune system67. Bb also lacks the Metarhizium dtxS1 (MAA_10043) gene cluster involved in biosynthesis of the insecticidal destruxins40. It is likely that some of the highly expressed genes unique to Bb will play important and novel roles as Bb overcomes challenges in the dynamic microenvironments it will encounter in insects.

As an endophyte, Bb presumably possesses mechanisms to avoid stimulating plant defenses. Fungal endoxylanases (GH11) are known to trigger plant immune responses and induce necrosis of infected plant tissue68. Therefore, lack of GH11 in Bb and E. festucae could be an adaptation limiting induction of necrosis by these endophytes and facilitating immune evasion. Like the basidiomycete plant symbiont L. bicolor42, Bb has a large battery of SSCPs and about half of them (154/373) are species-specific, implying that many specific functions are required for specialization to insect pathogenesis and endophytism.

If the finding that asexual Aspergillus species usually arise from sexual lineages69 is broadly applicable to fungi, then Beauveria spp. are probably asexual derivatives of a Cordyceps lineage. Host switching is particularly common in Cordyceps spp., accounting for their wide variety of associations with animals, plants and fungi70. Some Trichoderma spp., such as T. strigosum, have a Cordyceps teleomorph. Our phylogenomic data suggest that the insect pathogenic Cm and Bb diverged from mycoparasitic Trichoderma 74–97 million years ago (Fig. 2B), so potentially mycoparasitism could have evolved from insect pathogenicity or vice versa. The high degree of genome structure divergence between Bb and Cm is unexpected given their close phylogenetic relationship. Transposable elements (TEs) are a major force driving genetic variation and genome evolution71. Bb has many more TEs than Cm, apparently because Bb lacks the genome defense mechanism of repeat-induced point mutations. However, the genomes of Metarhizium species are highly syntenic in spite of a similar difference in the number of TEs17, so TE content alone is unlikely to explain the divergence. Most field populations of Beauveria and Metarhizium species reproduce clonally72. In contrast, Cm readily reproduces sexually18, thereby facilitating genome structure reorganization due to frequent genetic and/or chromosomal recombination. Thus, differences in life cycle might have led to the genome structure disparities between Bb and Cm.

High-throughput transcriptomics demonstrated that Bb fine-tunes gene transcription to adapt to different environmental niches. Bb up-regulated proteases on LW and carbohydrate hydrolases in RE. For example, a subtilisin-like protease (BBA_00443, TPM = 230) was highly expressed by Bb on LW but not in CB or in RE (Supplementary Table S20). The insect epicuticle or waxy layer comprises a heterogeneous mixture of long-chain alkanes, wax esters and fatty acids and represents the first barrier against fungal attack. Bb highly transcribed CYP52s (BBA_02428 and BBA_09022) and lipase genes (BBA_01783 and BBA_08812), which should target epicuticular hydrocarbons and lipids36. Chitin constitutes up to 40% of the procuticle but is absent from the epicuticular layer73. As with Metarhizium17, Bb does not significantly up-regulate chitinase genes on the epicuticle. Two SSCPs (GH75 chitosanase BBA_06270 and CFEM protein BBA_09339) were highly and specifically expressed in CB (Supplementary Table S20). Potentially, chitosanase BBA_06270 could be involved in remodeling cell wall structure to evade host immune recognition74. Homologs of CFEM BBA_09339 are involved in plant pathogenesis44. Functional verification of these genes will improve our understanding of fungal pathogenesis.

In conclusion, we have sequenced the genome of the well-known insect pathogenic fungus B. bassiana and used RNA-seq to generate expression profiles of Bb growing in different host or environmental niches. The resulting information will benefit future molecular studies of insect-fungus interactions and facilitate the development of Beauveria as cost-effective mycoinsecticides and microbial biocatalysts.

Methods

Fungal strain and maintenance

B. bassiana strain ARSEF 2860 was selected for genome sequencing as it shows commercial potential for biological control of aphids, planthoppers and spider mites75. To compare fungal abilities to utilize pentose and hexose, M. robertsii strain ARSEF 23 and C. militaris Cm01 were included with Bb for growth tests on minimal medium (MM: NaNO3 6 g/l, KCl 0.52 g/l, MgSO4·7H2O 0.52 g/l, KH2PO4 0.25 g/l) amended with either 1% xylose or 1% glucose. Otherwise, fungal cultures were maintained on potato dextrose agar (Difco).

Genome sequencing and assembly

The genome of Bb strain ARSEF 2860 was shotgun sequenced using a Roche 454 GS FLX system for massively parallel pyrosequencing at the Chinese National Human Genome Center (Shanghai, China). This resulted in 930 Mb of sequence data (with an average read length of 385 bp). Assembly was performed using the Newbler software (Ver. 2.3) within the Roche 454 suite package76, which produced 1,764 contigs with a total size of 33.7 Mb. For sequence scaffolding, a DNA library of 2–5 kb inserts was generated and sequenced with an Illumina system. This resulted in 1.3 Gb of paired-end reads; mapping these reads to the contigs allowed the 1,764 contigs to be assembled into 242 scaffolds. The whole project has been deposited at DDBJ/EMBL/GenBank under the accession no. ADAH00000000.
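
As a rough check on sequencing depth, the figures quoted above can be combined in a short calculation. The sketch below is illustrative only: it assumes that the total contig size (33.7 Mb) is a reasonable proxy for genome size, and the function names are hypothetical rather than part of any pipeline used in this study.

```python
# Back-of-the-envelope sequencing depth from the numbers reported in the text.
# Assumption: the assembled contig size approximates the true genome size.

RAW_DATA_MB = 930.0        # 454 pyrosequencing output (Mb)
ASSEMBLY_SIZE_MB = 33.7    # total size of the 1,764 contigs (Mb)
MEAN_READ_LENGTH_BP = 385  # average 454 read length (bp)


def fold_coverage(raw_mb: float, assembly_mb: float) -> float:
    """Average number of times each assembled base is covered by raw data."""
    return raw_mb / assembly_mb


def approx_read_count(raw_mb: float, read_length_bp: float) -> float:
    """Approximate number of reads implied by the total yield and mean length."""
    return raw_mb * 1e6 / read_length_bp


coverage = fold_coverage(RAW_DATA_MB, ASSEMBLY_SIZE_MB)
reads = approx_read_count(RAW_DATA_MB, MEAN_READ_LENGTH_BP)

print(f"approximate 454 coverage: {coverage:.1f}x")
print(f"approximate number of 454 reads: {reads / 1e6:.2f} million")
```

Under these assumptions the 454 data correspond to roughly 27.6-fold coverage of the assembled contigs.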

Genome annotation

To maximize accuracy, the gene structures of B. bassiana were predicted with a combination of different algorithms17,18. Inconsistent ORFs were individually subjected to Blast searches against the NCBI curated refseq_protein database and manually inspected. Previously acquired ESTs14 were used to verify and complete the predicted gene models. All predicted gene models were annotated by InterproScan analysis (http://www.ebi.ac.uk/Tools/pfa/iprscan/). Potential secreted proteins were predicted by SignalP 3.0 (http://www.cbs.dtu.dk/services/SignalP/) and TargetP (http://www.cbs.dtu.dk/services/TargetP/) analysis. Genome repetitive elements were analyzed by Blast against the RepeatMasker library (Open 3.2.9) (http://www.repeatmasker.org/) and with the Tandem Repeat Finder (http://tandem.bu.edu/trf/trf.html). The transposases/retrotransposases were classified by Blastp analysis against Repbase (http://www.girinst.org/repbase/) plus manual inspection. Putative Beauveria virulence factors were identified by searching against the pathogen-host interaction database (http://www.phi-base.org/about.php) with a cut-off E value of 1e-5, plus additional searches of known virulence genes reported in entomopathogenic fungi. One-tailed t-tests were conducted to compare the sizes of protein families between insect pathogens and other fungi.

Comparative genomic analysis

For genome structure comparison of Bb and Cm, the scaffolds of both genomes were oriented by MEGABLAST for dot plotting and pair-wise comparison with the Argo Genome Browser77. A Blast Score Ratio (BSR) test19 was conducted to compare the differences between the Bb, Cm and Mr genomes. The BSR index for each reference protein was calculated by dividing the query score by the reference score and was normalized to range from 0 to 1. A score of 1 indicates a perfect match while a score of 0 indicates no Blast match of a query protein in the reference proteome. The normalized pairs of BSR indices were then plotted using Matlab (ver. 7.0).
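
The BSR calculation described above can be sketched in a few lines. This is a minimal illustration rather than the code used in the study. It assumes that the "reference score" is the Blast bit score of a reference protein aligned against itself, that the "query score" is the bit score of its best hit in another proteome, and that a missing hit is recorded as 0. The protein identifiers and scores are invented.

```python
def blast_score_ratio(query_score: float, reference_score: float) -> float:
    """Return the query score divided by the reference score, clipped to [0, 1]."""
    if reference_score <= 0:
        raise ValueError("reference (self-hit) score must be positive")
    ratio = query_score / reference_score
    return max(0.0, min(1.0, ratio))


# Hypothetical bit scores: id -> (self-hit, best hit in Cm, best hit in Mr).
scores = {
    "protein_A": (1200.0, 1200.0, 600.0),
    "protein_B": (800.0, 200.0, 0.0),  # 0.0 stands for "no Blast hit"
}

# One (x, y) pair per reference protein, ready for a two-dimensional scatter plot.
bsr_pairs = {
    pid: (blast_score_ratio(cm, ref), blast_score_ratio(mr, ref))
    for pid, (ref, cm, mr) in scores.items()
}

for pid, (bsr_cm, bsr_mr) in bsr_pairs.items():
    print(pid, bsr_cm, bsr_mr)
```

Each reference protein then contributes one point to the plot: points near (1, 1) are conserved in both comparison genomes, whereas points near the origin have no close match in either.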

Paralogy, orthology and phylogenomic analysis

The best candidate paralogs in examined fungal genomes were identified by reciprocal Blastn analysis of the coding DNA sequences with a cut-off E value of 1e-20 and more than 60% coverage of Blast alignment length. For Bb and Cm paralogs with more than 70% identity, the paired sequences were aligned with ClustalW and the nucleotide mutation ratios were estimated and compared. Ortholog conservation in fungi was characterized with Inparanoid 7.0 (http://inparanoid.sbc.su.se/cgi-bin/index.cgi). Corresponding orthologous protein sequences were aligned with Clustal X 2.0 and the concatenated amino acid sequences were used to generate a maximum likelihood phylogenomic tree with the program TREE-PUZZLE78 using a Dayhoff model. Based on the constructed phylogenomic tree, protein family size variation (expansion or contraction) between Bb and Cm was analyzed using the program CAFE79 by referencing against the most closely related species, T. virens.
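
The paralog filter described above can be illustrated as follows. This sketch makes several assumptions that the text does not specify: hits are already parsed into simple records, coverage is taken as alignment length divided by query length, and a pair is kept when both directions pass the thresholds (it does not additionally rank hits to pick the single best partner). The `Hit` record and all gene names are hypothetical.

```python
from collections import namedtuple

# Hypothetical parsed Blastn hit; the field names are illustrative only.
Hit = namedtuple("Hit", "query subject evalue align_len query_len")

E_CUTOFF = 1e-20      # cut-off E value stated in the text
MIN_COVERAGE = 0.60   # "more than 60% coverage"


def passes(hit):
    """True if a non-self hit meets both the E-value and coverage thresholds."""
    coverage = hit.align_len / hit.query_len  # assumed definition of coverage
    return (
        hit.query != hit.subject
        and hit.evalue <= E_CUTOFF
        and coverage > MIN_COVERAGE
    )


def reciprocal_pairs(hits):
    """Return sorted gene pairs whose hits pass the filter in both directions."""
    passing = {(h.query, h.subject) for h in hits if passes(h)}
    return sorted(
        {tuple(sorted(pair)) for pair in passing if (pair[1], pair[0]) in passing}
    )


hits = [
    Hit("g1", "g2", 1e-50, 900, 1000),
    Hit("g2", "g1", 1e-48, 900, 1200),
    Hit("g1", "g3", 1e-30, 400, 1000),  # fails coverage in this direction
    Hit("g3", "g1", 1e-30, 400, 450),
    Hit("g4", "g5", 1e-10, 800, 900),   # fails the E-value cut-off
    Hit("g5", "g4", 1e-10, 800, 900),
]

print(reciprocal_pairs(hits))
```

Requiring both directions to pass is a conservative choice: a short gene fully contained in a much longer one is not reported as a paralog pair.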

Protein family classifications

To identify the gene clusters and their proteins responsible for the biosynthesis of secondary metabolites, the whole genome data set was subjected to analysis with the programs SMURF (http://jcvi.org/smurf/index.php) and antiSMASH (http://antismash.secondarymetabolites.org/) with default settings. The families of proteases were identified by Blastp searching against the MEROPS peptidase database Release 9.4 (http://merops.sanger.ac.uk/) with a cutoff E value of 1e-100 plus manual inspection of the InterproScan analysis results. Fungal trypsins were selected for phylogenetic analysis with the program MEGA 5.080 using a Dayhoff model, 1,000 replicates for bootstrap analysis and pairwise deletion for gaps or missing data. The cytochrome P450s were named according to the classifications collected at the P450 database (http://blast.uthsc.edu/). Kinases were classified by Blastp against KinBase (http://kinase.com/) with a cutoff E value of 1e-30. Carbohydrate-active enzymes were classified by local Blastp searching against a library of catalytic and carbohydrate-binding module enzymes (http://www.cazy.org/). G-proteins were identified from PFAM domain scanning. G-protein coupled receptors were selected from the best hits to GPCRDB sequences (http://www.gpcr.org/7tm/) and by confirmation that they contained seven transmembrane helices with the N-terminus outside and the C-terminus inside the plasma membrane. To identify the small secreted cysteine-rich proteins (SSCPs), proteins of 300 amino acids or fewer that carried the secretion signals predicted above and contained four or more cysteine residues were included for Blastclust analysis (http://toolkit.tuebingen.mpg.de/blastclust) at cutoffs of 80% coverage and 20% identity20.
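
The selection of candidate small secreted cysteine-rich proteins before clustering can be sketched as a simple filter. This is an illustration under stated assumptions: the secretion-signal call is supplied as a precomputed boolean (for example from the signal peptide predictions described under genome annotation), and the toy sequences and identifiers are invented.

```python
MAX_LENGTH = 300    # "less than or equal to 300 amino acids"
MIN_CYSTEINES = 4   # "four or more cysteine residues"


def is_sscp_candidate(sequence: str, has_secretion_signal: bool) -> bool:
    """Apply the three criteria: secreted, short, and cysteine-rich."""
    seq = sequence.upper()
    return (
        has_secretion_signal
        and len(seq) <= MAX_LENGTH
        and seq.count("C") >= MIN_CYSTEINES
    )


# Hypothetical proteins: id -> (amino acid sequence, secretion signal predicted?)
proteins = {
    "toy_1": ("MKFCAACLLCGGCA", True),   # passes all three criteria
    "toy_2": ("MKFCAACLLCGGCA", False),  # no secretion signal
    "toy_3": ("MKFAAALLGGA", True),      # too few cysteines
    "toy_4": ("M" + "AC" * 200, True),   # longer than 300 residues
}

candidates = [
    pid for pid, (seq, secreted) in proteins.items()
    if is_sscp_candidate(seq, secreted)
]
print(candidates)
```

Only the proteins that pass this filter would go forward to the clustering step.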

Transcriptome analysis

Conidia of B. bassiana strain ARSEF 2860 were harvested from 14-day-old potato dextrose agar cultures and used for the different assays. To examine gene induction on insect cuticle, locust (Locusta migratoria) hind wings were collected, air-dried and surface sterilized in 10% H2O2 (10 min). The wings were washed twice in sterile water and immersed in a Bb conidial suspension (2 × 10⁷ spores per ml) for 20 seconds17. The inoculated wings were placed on 1% water agar and incubated at 25°C for 24 hours before total RNA extraction. For analysis of transcriptional adaptation to insect hemocoel, 5th instar cotton bollworm (Helicoverpa armigera) larvae were each injected with 10 μl of a spore suspension (10⁸ spores/ml). Hemolymph from infected insects was collected on ice 48 hours post inoculation and immediately applied on top of a step gradient of 25 and 50% Centricoll (Sigma). The fungal cells were purified for RNA extraction by centrifugation at 10,000 g for 10 min at 4°C. For analysis of transcriptional adaptation to plant root exudates, mycelia harvested from 36-hour Sabouraud dextrose broth cultures were incubated in corn root exudates for another 24 hours before being used for RNA extraction. Root exudates were prepared as described before81. RNA was extracted with a Qiagen RNeasy kit plus on-column treatment with RNase-free DNase I. Messenger RNA was purified and reverse transcribed into cDNA, and libraries were constructed for tag preparation according to the massively parallel signature sequencing protocol82. The tags were sequenced with an Illumina technique. Tags were omitted from further analysis if only one copy was detected or if they could be mapped to more than one transcript. The remaining tags were mapped to the genome or annotated genes if they possessed no more than one nucleotide mismatch17,18. The level of gene transcription was converted to transcripts per million tags (TPM) for each mapped gene to compare expression between samples. The RNA-seq expression dataset is available at the Gene Expression Omnibus under the accession GSE32699.
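
The tag filtering and TPM conversion described above can be sketched as follows. This is a simplified illustration, not the pipeline used for GSE32699: it assumes that each distinct tag has already been mapped (allowing at most one mismatch) and that the TPM denominator is the total number of retained, uniquely mapped tags. The data structure and gene names are hypothetical.

```python
from collections import defaultdict


def tpm_per_gene(tags):
    """Convert tag counts to transcripts per million (TPM).

    tags: iterable of (copy_number, genes), where genes lists every gene
    the tag sequence maps to.
    """
    counts = defaultdict(int)
    for copies, genes in tags:
        if copies <= 1:
            continue  # tags detected only once are dropped
        if len(set(genes)) != 1:
            continue  # unmapped tags, or tags matching more than one transcript
        counts[genes[0]] += copies
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gene: count * 1_000_000 / total for gene, count in counts.items()}


# Hypothetical tag table for one sample.
tags = [
    (30, ["gene_a"]),
    (10, ["gene_a"]),
    (60, ["gene_b"]),
    (1, ["gene_b"]),             # singleton, discarded
    (25, ["gene_a", "gene_c"]),  # ambiguous, discarded
    (5, []),                     # unmapped, discarded
]

tpm = tpm_per_gene(tags)
for gene, value in sorted(tpm.items()):
    print(gene, value)
```

Because each sample is scaled by its own total, TPM values can be compared between libraries of different sequencing depth, such as the LW, CB and RE samples.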