The genus Colletotrichum (Sordariomycetes, Ascomycota; Fig. 1a) comprises 600 species1 attacking over 3,200 species of monocot and dicot plants (ARS Fungal Databases, see URLs). These pathogens use a multistage hemibiotrophic infection strategy2: dome-shaped appressoria first puncture host surfaces using a combination of mechanical force and enzymatic degradation, bulbous biotrophic hyphae enveloped by an intact host plasma membrane then develop inside living epidermal cells, and finally, the fungus switches to necrotrophy and differentiates thin, fast-growing hyphae that kill and destroy host tissues (Fig. 1b,c). We sequenced two Colletotrichum species with different host specificities and infection strategies: C. higginsianum attacks several members of Brassicaceae, including Arabidopsis, and has emerged as a tractable model for studying fungal pathogenicity and plant immune responses3,4,5. Biotrophy in this fungus is confined to the first invaded host cell and is followed by a complete switch to necrotrophy5 (Supplementary Fig. 1). In contrast, C. graminicola primarily infects maize (Zea mays), causing annual losses of approximately 1 billion dollars in the United States alone6. In this species, biotrophy extends into many host cells and persists at the advancing colony margin while the center of the colony becomes necrotrophic7 (Supplementary Fig. 2).

Figure 1: Phylogeny and infection of the two Colletotrichum species analyzed in this study.

(a) Cladogram showing the phylogenetic relationship of Colletotrichum to other sequenced fungi, including 13 species used for comparative analyses (see Fig. 3). The unscaled tree was constructed using CVTree34 with Rhizopus oryzae as the outgroup. (b) Infection process of C. higginsianum (Ch) and leaf anthracnose symptoms on Brassica and Arabidopsis. The Brassica image is reproduced with permission of University of Georgia Plant Pathology Archive. (c) Infection process of C. graminicola (Cg), and leaf-blight, top die-back and stalk-rot symptoms on maize. SP, spore; AP, appressorium; PH, biotrophic primary hyphae; SH, necrotrophic secondary hyphae.

Optical mapping showed that the genomes of the two species are similar in size and structure. C. graminicola has a 57.4-Mb genome that is distributed among 13 chromosomes, including three minichromosomes less than 1 Mb in size, whereas C. higginsianum has a 53.4-Mb genome comprising 12 chromosomes, including two minichromosomes (Supplementary Table 1). We sequenced the C. graminicola genome using Sanger and 454 platforms, which provided a high-quality reference assembly of 50.9 Mb. We sequenced the C. higginsianum genome using 454 and Illumina platforms, yielding an assembly of 49.3 Mb (Supplementary Table 2). Repetitive DNA comprises 12.2% of the C. graminicola genome assembly and 1.2% of the C. higginsianum assembly (Supplementary Tables 2 and 3). The repeats clustered in genomic regions with low GC content in C. graminicola (Supplementary Fig. 3), similar to the AT-rich isochores found in Leptosphaeria maculans8. Including unassembled genomic regions (mostly repeats, such as ribosomal DNA, telomeres, centromeres and transposons), repetitive DNA was estimated to total 22.3% of the C. graminicola genome and 9.1% of the C. higginsianum genome. The two Colletotrichum species diverged relatively recently (47 million years ago), after the separation of monocots and dicots 140–150 million years ago9 (Supplementary Fig. 4). Although C. graminicola and C. higginsianum belong to sister clades within the genus (Supplementary Fig. 5), only 35% of the two genomes are syntenic (Supplementary Table 4), which is less than the synteny between Botrytis cinerea and Sclerotinia sclerotiorum10. Nevertheless, an analysis of synteny between the two Colletotrichum genomes identified homologous chromosomes and revealed that major intrachromosomal rearrangements have occurred in one or both species (Fig. 2a and Supplementary Table 5). The minichromosomes do not contain homologous sequences (Fig. 2b and Supplementary Table 4), suggesting that they are lineage-specific innovations, and in C. graminicola, the minichromosomes are enriched with repetitive DNA (averaging 23%) compared to the core genome (averaging 5.5%).

Figure 2: Conservation of synteny between the genomes of C. graminicola and C. higginsianum.

(a) Dot plot showing the syntenic blocks between the 13 chromosomes (optical linkage groups) of C. graminicola (horizontal axis) and the 12 chromosomes of C. higginsianum (vertical axis). Homologies between chromosomes of each species are highlighted in red dashed boxes. Homologous sequences of C. graminicola chromosome 9, indicated between the blue dashed lines, are dispersed among many C. higginsianum chromosomes. (b) Global view of syntenic alignments between the genomes of C. graminicola and C. higginsianum. Linkage groups of C. graminicola are shown as the reference, with linkage group lengths defined by the C. graminicola optical map. For each chromosome, numbered genomic scaffolds (dark gray) positioned on the optical linkage groups are separated by scaffold breaks. The magenta blocks show syntenic mapping of the C. higginsianum sequences; notably, there is a near absence of homologous sequences among the minichromosomes.

We predicted the existence of 12,006 protein-coding genes in C. graminicola compared to 16,172 in C. higginsianum (Supplementary Table 2). Having been compiled from short-read data only, the C. higginsianum assembly is more fragmented than that of C. graminicola, resulting in some genes (5.2%) being split into two or more gene models, whereas others (4%) are truncated versions of the complete gene (Supplementary Note). After correcting for this fragmentation, the estimated gene content of C. higginsianum (15,331) is still markedly larger than that of C. graminicola. The two species share 9,795 orthologous genes. Using Markov clustering (MCL)11 to analyze the proteomes, we found that 10,077 C. higginsianum genes belong to multicopy gene clusters, compared to 5,342 genes in C. graminicola, suggesting that the greater gene content of C. higginsianum results partly from gene duplication (Supplementary Table 6). The MCL analysis also revealed that gene clusters encoding serine proteases, methyl transferases, polyketide synthases, cytochrome P450 enzymes and small-molecule efflux pumps are expanded in C. higginsianum compared to C. graminicola (Supplementary Fig. 6), which we verified by manual inspection (Supplementary Tables 7 and 8 and Supplementary Fig. 7). Clusters that are expanded in C. graminicola relative to C. higginsianum include a family of genes encoding atypical cellulases (glycoside hydrolase GH61, described below) and another encoding secreted histidine acid phosphatases, which probably mobilize phytic acid, the main form of stored phosphorus in plants12.

C. higginsianum and C. graminicola are particularly well equipped with genes encoding carbohydrate-active enzymes (CAZymes)13 that potentially degrade the plant cell wall14 (Fig. 3a and Supplementary Table 9) and modify the fungal cell wall (Supplementary Tables 10 and 11). Both species encode more CAZymes than 13 other fungal genomes we examined. These expanded CAZyme arsenals are more similar to those of other hemibiotrophic and necrotrophic pathogens than to the highly reduced set found in biotrophs such as Melampsora and Blumeria (Fig. 3a). The exceptionally large and diverse inventory of CAZymes encoded by both Colletotrichum genomes provides a rich source of enzymes for potential commercial exploitation15. C. higginsianum encodes over twice as many pectin-degrading enzymes as does C. graminicola (Fig. 3b), the majority (62%) of which are activated during necrotrophy (Supplementary Fig. 8 and Supplementary Note). Conversely, although both species encode similar numbers of cellulases and hemicellulases, C. graminicola activates many more of these genes during necrotrophy (48%) than does C. higginsianum (26%), including 22 GH61 copper-dependent oxygenases, which act in concert with classical cellulases to enhance lignocellulose hydrolysis16,17. Thus, C. graminicola and C. higginsianum use very different strategies to deconstruct plant cell walls, reflecting their host preferences: dicot cell walls are enriched with pectin (35% in dicots compared to 10% in maize), whereas the cell walls of grasses contain more hemicellulose (60% in grasses compared to 30% in dicots) and phenolics (up to 5%)18.

Figure 3: Comparison of fungal carbohydrate-active enzyme (CAZyme) repertoires.

(a) Hierarchical clustering of CAZyme classes from Colletotrichum and 13 other fungal genomes. GH, glycoside hydrolase; GT, glycosyltransferase; PL, polysaccharide lyase; CE, carbohydrate esterase; CBM, carbohydrate-binding module. The numbers of enzyme modules in each genome are shown. Overrepresented (orange to red) and underrepresented modules (pale yellow to white) are depicted as fold changes relative to the class mean. (b) Comparison of the pectin-degrading enzyme repertoires of C. higginsianum and C. graminicola shown as the number of modules in each CAZyme family11. In total, C. higginsianum encodes 86 such modules, whereas C. graminicola encodes only 42.

Many phytopathogens secrete proteins known as effectors that facilitate infection by reprogramming host cells and modulating plant immunity19. By defining candidate secreted effectors (CSEPs) as predicted extracellular proteins without any homology to proteins outside the genus Colletotrichum, we found 177 CSEP-encoding genes in C. graminicola, 85 (48%) of which were species specific. In contrast, C. higginsianum encodes twice as many CSEPs (365), including more species-specific proteins (264, or 72%) (Supplementary Fig. 9a). The CSEPs are mostly small proteins (averaging 110 residues and 175 residues in C. higginsianum and C. graminicola, respectively) and are more cysteine rich than the total proteome (Supplementary Fig. 9b). CSEP-encoding genes are randomly distributed across the chromosomes of C. graminicola, with no evidence for clustering, enrichment on particular chromosomes or localization near transposable elements or telomeres, as has been reported for some other plant pathogens8,20,21,22 (Supplementary Note). An MCL analysis revealed that relatively few Colletotrichum CSEPs (14% in both species) belong to small multigenic families with two to five members (Supplementary Fig. 9c). The larger, more diversified CSEP repertoire of C. higginsianum might be an adaptation to invade a broader range of host plants than C. graminicola, which is restricted to infection of Zea under field conditions23.

Both Colletotrichum species encode markedly more secondary metabolism enzymes (103 in C. higginsianum and 74 in C. graminicola) than other sequenced fungi (2–58 in ascomycetes24,25) (Supplementary Fig. 10a and Supplementary Table 8). In fungi, secondary metabolism genes are typically located in clusters26; we found 42 of these clusters in C. graminicola and 39 in C. higginsianum, surpassing the numbers found in most other sequenced ascomycetes (Supplementary Fig. 10b). Only 11 secondary metabolism gene clusters are shared between the two Colletotrichum species (Supplementary Table 12), and only 6 of these clusters show limited synteny (Fig. 4). This cluster diversity seems to result from gene duplication or loss and chromosomal rearrangements and may be related to the association of secondary metabolism gene clusters (71% in C. graminicola) with repetitive DNA (Supplementary Note). Because each secondary metabolism gene cluster is probably involved in the biosynthesis of a specific metabolite24, each Colletotrichum species can be expected to produce unusually large and divergent spectra of secondary metabolites, some of which may be previously unknown bioactive molecules.

Figure 4: Structure and transcription of a secondary metabolism gene cluster.

(a) Gene cluster 18 from C. graminicola (Cg) is orthologous to cluster 10 from C. higginsianum (Ch). The latter is split between four small scaffolds in the Broad Institute genome annotation (supercontigs (SCs) 37, 481, 2,277 and 4,474) and was reconstructed based on an improved genome assembly (Supplementary Note). Microsynteny is indicated by gray bars. The 14 genes highlighted in red in the C. higginsianum cluster are co-regulated. Functional annotation for the cluster genes is provided in Supplementary Table 12. (b) Visualization of RNA-Seq coverage across the C. higginsianum polyketide biosynthesis cluster. The gray curves indicate read coverage (log scale) for the four samples. Co-regulated gene models are highlighted in red. VA, in vitro appressoria; PA, in planta appressoria; BP, biotrophic phase; NP, necrotrophic phase.

To investigate how the fungal genetic program is deployed during host infection, we applied Illumina RNA sequencing to both pathosystems (Supplementary Tables 13 and 14). We collected samples from infected Arabidopsis or maize leaves at intervals corresponding to pre-penetration appressoria, the early biotrophic phase and the transition to necrotrophy and from C. higginsianum appressoria formed in vitro (Fig. 5a). Almost all the gene models were transcribed in planta (14,972 C. higginsianum genes, or 92%, and 10,812 C. graminicola genes, or 90%). However, this transcription was highly dynamic, particularly in C. higginsianum, where 7,162 genes (44%) were differentially regulated (log2 fold change >2, P < 0.05) between one or more of the infection stages (Supplementary Tables 15a and 16). Fewer genes (2,619, or 22%) were differentially regulated in C. graminicola, which may reflect the contrasting biology of this species, where biotrophic and necrotrophic growth occur simultaneously (Supplementary Tables 15b and 17). The more clearly defined infection stages of C. higginsianum provided better temporal and spatial resolution of expression changes, and we therefore highlight our results for this species.

Figure 5: Expression profiling of pathogenicity-related genes in C. higginsianum.

(a) Schematic representation of the four C. higginsianum developmental stages selected for RNA sequencing. Gray indicates polystyrene, green indicates living plant cell, and brown indicates dead plant cell. Hpi, hours post-inoculation. (b) Heatmaps of gene expression showing the 100 most highly expressed and significantly regulated genes (log2 fold change >2, P < 0.05) in five functional categories. Overrepresented (pale red to dark red) and underrepresented transcripts (pale blue to dark blue) are shown as log2 fold changes relative to the mean expression measured across all four stages. The arrow indicates the CSEP-encoding gene ChEC6 (CH063_01084). (c) The statistical significance of gene induction (y axis) in five functional categories during fungal developmental transitions (x axis). The P values were calculated using a one-sided Fisher's exact test and represent the probability of observing the number of significantly induced genes for a specific category during a transition given the total number of significantly induced genes during that transition (log2 fold change >2, P < 0.05) and the total number of genes in the category. (d) Transcriptional regulation of the effector gene ChEC6 by plant-derived signals. Confocal micrographs showing C. higginsianum expressing the mCherry reporter gene under the native ChEC6 promoter (overlays of bright-field and fluorescence channels). Appressoria (A) formed on polystyrene are unlabeled (top left), whereas those on the leaf surface (top right) have fluorescent cytoplasm. After host penetration, labeling is visible in young biotrophic hyphae (YH) but not older biotrophic hyphae (OH) (bottom). Scale bars, 10 μm. C, conidium.
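The enrichment P values in panel c can be illustrated with a short sketch. It computes a one-sided Fisher's exact test as the upper tail of the hypergeometric distribution. The function name, argument names and the toy numbers are assumptions made for illustration; they are not taken from the analysis scripts used in this study.

```python
from math import comb


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test (enrichment).

    k: induced genes that belong to the category
    n: all significantly induced genes at the transition
    K: all genes in the category
    N: all genes considered
    Returns P(X >= k) under the hypergeometric null.
    """
    upper = min(n, K)
    total = comb(N, n)
    # Sum the probability of every outcome at least as extreme as k.
    tail = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, upper + 1))
    return tail / total


# Toy example (hypothetical numbers): 10 genes, 5 in the category,
# 5 induced, all 5 induced genes fall in the category.
p_value = fisher_enrichment_p(k=5, n=5, K=5, N=10)
```

A small P value here means that the category contributes more induced genes to the transition than expected by chance.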

Five gene categories relevant to pathogenicity (encoding transcription factors, secondary metabolism enzymes, CSEPs, CAZymes and transporters) had markedly different expression patterns during infection (Fig. 5b and Supplementary Fig. 11). We distinguished three waves of gene activation corresponding to pathogenic transitions (Fig. 5c). Among the genes upregulated at the appressorial phase were those encoding CAZymes that are predicted to degrade cutin, cellulose, hemicellulose and pectin, which may contribute to initial host penetration, together with a larger set of enzymes that potentially remodel the fungal cell wall (Fig. 5b and Supplementary Fig. 8a). However, early during infection, the transcriptome of C. higginsianum was dominated by secondary metabolism genes, with 12 different secondary metabolism gene clusters being induced before penetration and during biotrophy (Fig. 5b,c and Supplementary Table 13). This indicates previously unsuspected roles for appressoria and biotrophic hyphae in synthesizing an array of small molecules for delivery to the first infected plant cells. Because these cells initially remain alive, such molecules are probably not toxins and may instead function in host manipulation, similar to protein effectors27. Remarkably, the C. higginsianum secondary metabolism gene cluster with the strongest activation at this stage was silent in C. graminicola at all the infection stages we examined (Fig. 4 and Supplementary Fig. 12), suggesting that additional metabolite diversity is generated through transcriptional regulation.

Different sets of CSEP-encoding genes were expressed at each infection stage, but the majority of these genes were strongly induced during biotrophy (Fig. 5b,c and Supplementary Table 16). This suggests Colletotrichum requires a maximum capacity for host manipulation during intracellular colonization and that biotrophic hyphae provide a major interface for effector delivery to host cells. These specialized hyphae morphologically resemble the haustoria of obligate biotrophs, which function both as platforms for effector secretion and feeding structures for the uptake of sugars and amino acids28,29. However, we found no evidence for specific transcriptional reprogramming of nutrient transporters in C. higginsianum during biotrophy (Fig. 5b,c and Supplementary Table 16), suggesting that the biotrophic hyphae of this pathogen function primarily to deliver protein effectors and secondary metabolites to the plant cell.

Transcripts encoding a vast array of lytic enzymes are induced at the transition to necrotrophy, when the pathogen uses dead and dying host cells as a nutrient source to support rapid colonization and sporulation (Fig. 5b,c and Supplementary Fig. 8). These enzymes include 44 putative secreted proteases and 146 CAZymes that potentially cleave all major polysaccharides in the host wall (Supplementary Figs. 8 and 11). Concomitantly, numerous genes encoding plasma membrane transporters that may be required for assimilating the products of this degradative activity, for example, oligopeptides, amino acids and sugars, are also induced (Fig. 5b,c). In fungi, genes encoding secreted proteases, CAZymes and permeases are often subject to pH regulation30. Consistent with this, we found evidence that necrotrophy in C. higginsianum is associated with local alkalinization of Arabidopsis tissue, probably resulting from fungal ammonia secretion31, but tissue alkalinization was less pronounced in maize colonized by C. graminicola at this stage (Supplementary Fig. 13).

Notably, although appressoria in vitro are morphologically indistinguishable from those in planta, their transcriptomes are substantially different, with 1,515 genes significantly induced by host contact (Supplementary Table 15). One of these, the CSEP-encoding gene ChEC6 (CH063_01084)32, is the most strongly and significantly induced of all C. higginsianum genes (>50,000-fold relative to appressoria in vitro). To experimentally verify this expression pattern at the cellular level, we generated transgenic C. higginsianum strains expressing a reporter gene under the control of the ChEC6 promoter (Fig. 5d). Using this method, we confirmed that the transcription of ChEC6 was plant specific, starting in the appressorium before penetration and continuing in young biotrophic hyphae, but it was switched off before the hyphae were fully expanded, indicating that its expression is transient and tightly regulated. The large-scale reprogramming of appressorial gene expression in planta shows that these specialized cells are highly responsive to host-derived cues that are perceived before penetration. Appressoria have long been regarded as organs of attachment and penetration33; our findings assign a previously unsuspected sensory function to these cells, enabling the pathogen to prepare for the subsequent invasion of living host cells.

Major hemibiotrophic plant pathogens such as Colletotrichum and the rice blast fungus Magnaporthe oryzae undergo major transformations in cell morphology and infection mode when switching from growth on the plant surface to intracellular biotrophy and from biotrophy to necrotrophy. Genome sequencing combined with high-throughput transcriptome sequencing revealed the transcriptional dynamics underlying these transitions and led us to redefine the functions of appressoria and intracellular hyphae. Despite their similar morphologies, a genomic comparison of C. higginsianum and C. graminicola uncovered major differences in their gene content. We propose that the diversification of functions required for host interaction, notably, the secretion of small-molecule and protein effectors and the degradation of plant polymers, allows C. higginsianum to colonize a wider range of plant species. In contrast, C. graminicola, a pathogen that is adapted to a narrow range of hosts, has maintained a more targeted arsenal of virulence factors.


URLs.

Broad Institute Colletotrichum Genome Database; Max Planck Institute for Plant Breeding Research Fungal Genomes Database; Fungal Transcription Factor Database; Fungal Cytochrome P450 Database; Transporter Classification Database; Saccharomyces Genome Database; NCBI Conserved Domains Database; SAMtools; Broad Institute Integrative Genomics Viewer (IGV) browser; Geneious v5.5; RepeatMasker Open-3.0; InterProScan; ARS Fungal Databases.


Sequencing and assembly.

C. graminicola strain M1.001 (M2) was collected in Missouri from infected maize (Fungal Genetics Stock Center culture 10212). C. higginsianum strain IMI349063 was isolated from Brassica campestris in Trinidad and Tobago (CABI culture collection, Wallingford, UK). The genome assemblies of C. graminicola were generated at the Broad Institute by combining data from Sanger and 454 pyrosequencing using a Newbler hybrid approach. Paired-end reads from 468,734 plasmids and 67,151 fosmids improved the continuity of the assembly (Supplementary Table 2). In the assembled genome, more than 98.5% of the sequence bases had quality scores >40. The C. higginsianum genome assembly was generated by GATC Biotech AG (Konstanz, Germany) by combining 454 GS-FLX shotgun reads and Illumina GAII mate-pair reads. Additionally, 864 fosmids were end-sequenced with Sanger technology. After removing dinucleotide repeats, the 454 reads and the fosmid end sequences were coassembled using the SeqMan NGen assembler (DNAStar Inc., USA). Contigs were then sorted into scaffolds using the paired-end information derived from an Illumina 3-kb–insert mate-pair library (2 × 36 bp reads). Scaffolds were manually edited to correct falsely joined contigs and falsely arranged scaffolds. To correct homopolymer sequencing errors in the 454 data, the Illumina GA data (76-fold coverage) were mapped to the scaffolded contigs, and the depth of coverage was used to create a final corrected consensus sequence (Supplementary Table 2).

Gene annotation.

A total of 28,424 expressed sequence tags (ESTs) from two C. graminicola complementary DNA libraries and 828,592 ESTs from six C. higginsianum libraries were used to provide a training set for the gene-calling pipeline and for validating the gene models (Supplementary Note). Protein-coding genes were annotated in C. graminicola using multiple lines of evidence from BLAST, PFAM searches and EST alignments, as described previously20. Gene structures were predicted using the Broad Institute automated gene-calling pipeline35 based on a combination of gene models predicted by the programs FGENESH (Softberry Inc., USA), GENEID36, GeneMark37, SNAP38 and Augustus39 together with EST-based and manually curated gene models. GENEID, FGENESH, SNAP and Augustus were trained using a set of high-confidence EST-based gene models generated by clustering Blat-aligned species-specific ESTs. By combining BLAST, EST and ab initio predictions, annotators manually built additional gene models that were otherwise missed by the automated annotation. C. graminicola was predicted to have 12,006 gene models, 39% of which were verified by the alignment of 13,600 Sanger EST reads. The C. higginsianum gene set was created similarly and was filtered using TBLASTN alignments from 10,661 of the C. graminicola gene models (<1 × 10^−10). Another 1,564 gene models were based on evidence from C. higginsianum ESTs, and 600 were based on EVidenceModeler (EVM) models having BLAST hits to proteins in the UniRef90 database. C. higginsianum was predicted to have 16,172 protein-coding genes, 89% of which were validated by the alignment of 135,923 ESTs from 454 sequencing.

Optical mapping.

C. graminicola and C. higginsianum protoplasts40 were lysed and prepared for optical mapping41 using MluI (with an average fragment size for both genomes of 9.2 kb). Raw datasets comprising single DNA molecule maps (Rmaps; 300× coverage per genome) were assembled into genome-wide contig maps spanning each chromosome using divide-and-conquer41 and iterative assembly strategies42. PROmer from the MUMmer package43 was used to conduct pairwise comparisons between the C. graminicola and C. higginsianum genomes (Fig. 2a and Supplementary Table 4). The synteny map (Fig. 2b) was generated using the Argo browser44.

Transposable element analysis.

Repetitive DNA elements were identified by performing a self BLASTN of each genome and processing the output with a custom Perl script (available on request), which identified multicopy sequences and organized them into nonredundant families. Consensus sequences of these families were then used to generate a custom library for RepeatMasker (see URLs) to scan both genome assemblies. The distributions of the genes, the transposable elements and the GC content were examined within a 100-kb window, sliding 10 kb across each chromosome.
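The sliding-window scan described above can be sketched as follows for GC content. This is a minimal illustration that assumes the chromosome is available as a plain string; the function name and the toy sequence are hypothetical, and the same loop could tally gene or transposable element coordinates instead of bases.

```python
def sliding_gc(seq, window=100_000, step=10_000):
    """Return (window_start, gc_fraction) for windows sliding along seq.

    Defaults mirror the 100-kb window and 10-kb step stated in the text.
    A sequence shorter than one window is reported as a single window.
    """
    seq = seq.upper()
    results = []
    last_start = max(len(seq) - window, 0)
    for start in range(0, last_start + 1, step):
        chunk = seq[start:start + window]
        if not chunk:
            break  # empty input sequence
        gc = sum(1 for base in chunk if base in "GC")
        results.append((start, gc / len(chunk)))
    return results


# Toy example with a tiny window so the output is easy to follow.
profile = sliding_gc("GGCCAATT", window=4, step=2)
# profile == [(0, 1.0), (2, 0.5), (4, 0.0)]
```

Windows with a low GC fraction in such a profile would correspond to the AT-rich, repeat-rich regions discussed for C. graminicola.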

Orthology and multigene families.

To identify differences in gene family size between C. graminicola and C. higginsianum, we clustered their proteomes using the Markov clustering program MCL11. An all-versus-all BLASTP search was performed using default parameters, followed by clustering with MCL using an inflation value of 2.0. We also included the proteomes of 13 additional fungal species (Fig. 3). Sequences were aligned using MAFFT45, and phylogenetic trees were constructed using the neighbor-joining method, followed by a bootstrap test with 100 replications. Sequence editing and alignment and phylogenetic analyses were performed using Geneious Pro (version 5.5; see URLs).
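To make the clustering step more concrete, the sketch below implements the core expansion and inflation loop of Markov clustering in plain Python. It is a toy illustration, not the MCL program used in this study: the input is a hypothetical 0/1 similarity matrix rather than BLASTP scores, the iteration count is fixed, and only the inflation value of 2.0 is taken from the text.

```python
def normalise_columns(m):
    """Scale every column of a square matrix so that it sums to 1."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(adjacency, inflation=2.0, iterations=50):
    """Toy Markov clustering; returns clusters as sorted lists of indices."""
    n = len(adjacency)
    # Add self-loops, then convert similarities to column-stochastic form.
    m = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalise_columns(m)
    for _ in range(iterations):
        m = matmul(m, m)                                   # expansion
        m = [[x ** inflation for x in row] for row in m]   # inflation
        m = normalise_columns(m)
    # Each row that retains weight is an attractor; its nonzero columns
    # are the members of one cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Hypothetical similarity graph: proteins 0-2 resemble each other,
# as do proteins 3 and 4.
graph = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
]
families = mcl(graph)
```

Higher inflation values generally split the graph into smaller, tighter clusters, which is why the inflation parameter is reported alongside the results.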

Annotation of specific gene categories.

Secretomes of both Colletotrichum species were predicted using WoLF-PSORT46. CSEPs were defined as extracellular proteins with no significant BLAST homology (expect value <1 × 10^−3) to sequences in the UniProt database (SwissProt and TrEMBL components). Homologs of proteins from outside the genus Colletotrichum were excluded. Genes encoding putative carbohydrate-active enzymes were identified using the CAZy annotation pipeline13. To identify secreted peptidase genes, sequences of predicted extracellular proteins were subjected to a MEROPS Batch BLAST analysis47. Membrane transporters were identified from BLAST searches against the Transporter Classification Database (see URLs) and Saccharomyces Genome Database (see URLs). Secondary metabolism genes were initially identified using MCL and gene family searches using the Broad Institute Colletotrichum database, BLAST searches against GenBank and InterProScan analysis (see URLs). The Secondary Metabolite Unknown Region Finder (SMURF)26 was used to predict secondary metabolism gene clusters. SMURF was applied to the Velvet assembly of C. higginsianum (Supplementary Note). Candidate genes identified using automated searches were inspected manually, including protein sequence alignments to known enzymes and searches against the NCBI Conserved Domain Database (see URLs). Further details of the secondary metabolism gene annotation are presented in the Supplementary Note. The Fungal Cytochrome P450 Database (see URLs) and Fungal Transcription Factor Database (see URLs) were used to annotate cytochrome P450 enzymes and transcription factors, respectively.
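The CSEP definition amounts to a filter over predicted secreted proteins, which can be sketched as below. The record layout, field names and example identifiers are assumptions made for illustration; only the expect-value cutoff and the exclusion of proteins with homologs outside the genus come from the text.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One BLAST hit (hypothetical, simplified record)."""
    query: str          # identifier of the secreted query protein
    subject_genus: str  # genus of the database protein that was hit
    evalue: float       # BLAST expect value


def candidate_effectors(secreted_ids, hits, genus="Colletotrichum",
                        cutoff=1e-3):
    """Keep secreted proteins with no significant hit outside the genus."""
    has_outside_homolog = {
        h.query for h in hits
        if h.subject_genus != genus and h.evalue < cutoff
    }
    return [pid for pid in secreted_ids if pid not in has_outside_homolog]


# Toy example with made-up identifiers.
secreted = ["p1", "p2", "p3"]
blast_hits = [
    Hit("p1", "Fusarium", 1e-20),        # significant outside hit: excluded
    Hit("p2", "Colletotrichum", 1e-50),  # within-genus hit only: kept
    Hit("p3", "Magnaporthe", 0.5),       # not significant: kept
]
cseps = candidate_effectors(secreted, blast_hits)
```

In this toy run only p1 is discarded, because it is the only protein with a significant match outside Colletotrichum.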

Whole-genome transcriptome profiling.

Arabidopsis leaves infected by C. higginsianum were obtained as described previously48. Sampling and RNA isolation of the pre-penetration stage (22 hpi), the early biotrophic stage (40 hpi), the switch between biotrophy and necrotrophy (60 hpi) and in vitro appressoria (22 hpi) have been described previously32,49. Each experimental repetition of the in planta stages was based on RNA extracted from 300 leaves. Maize leaf sheaths infected by C. graminicola were obtained as described previously50. Sheaths from the maize inbred line Mo940 at the V3 stage were cut into 5-cm–long segments and inoculated with two 10-μl drops of spore suspension (5 × 10^5 spores per ml). Sheaths containing mature pre-penetration appressoria (24 hpi), intracellular biotrophic hyphae (36 hpi) and necrotrophic hyphae with water-soaked lesions (60 hpi) were sampled. Each leaf sheath was trimmed to include only the inoculated area, and total RNA was extracted as described previously51 (15 maize sheaths per experimental repetition). The RNA integrity of all samples was verified on an Agilent 2100 Bioanalyzer.

Twelve C. higginsianum libraries (four developmental stages and three biological replicates) and nine C. graminicola libraries (three developmental stages and three biological replicates) were prepared with the Illumina TruSeq RNA Sample Preparation Kit and sequenced using the Illumina Genome Analyzer IIx (single reads, 100 bp for C. higginsianum and 76 bp for C. graminicola). Further details are provided in the Supplementary Note. The RNA-Seq reads were mapped to the annotated genomes with TopHat (a = 10, g = 5)52 and transformed into counts per annotated gene per sample with the 'coverageBed' function from the BEDtools suite53 and custom R scripts. Differentially expressed genes between two developmental stages were detected using the 'exactTest' function from the R package edgeR54. To calculate fold changes, the number of reads for each gene in each library was normalized by the total number of mapped reads for the library, and direct ratios (log2) were calculated between the different developmental stages. Transcripts with a significant P value (<0.05) and more than a twofold change (log2) in transcript level were considered to be differentially expressed. All P values were corrected for false discoveries resulting from multiple hypothesis testing using the Benjamini-Hochberg procedure. Heatmaps of gene expression profiles were generated with the Genesis expression analysis package55. All code for the RNA-Seq processing is available upon request. The C. higginsianum RNA-Seq data were also mapped onto the unannotated Velvet genome assembly (Supplementary Note) using Bowtie56 and visualized with SAMtools (see URLs) and the IGV browser (see URLs). RNA-Seq expression profiles were validated by quantitative RT-PCR (Supplementary Note).
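The fold-change normalization and the multiple-testing correction described above can be sketched in a few lines. This is an illustration only and does not reproduce the edgeR exact test or the custom R scripts used in this study. The function names, the optional pseudocount and the reading of the threshold as an absolute log2 fold change greater than 2 are assumptions.

```python
import math


def log2_fold_change(count_a, total_a, count_b, total_b, pseudocount=1.0):
    """log2 ratio of library-size-normalised counts (stage b over stage a).

    The pseudocount is an assumption added to avoid division by zero.
    """
    a = (count_a + pseudocount) / total_a
    b = (count_b + pseudocount) / total_b
    return math.log2(b / a)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def is_differential(log2fc, adjusted_p, fc_cutoff=2.0, p_cutoff=0.05):
    """Apply the fold-change and significance thresholds together."""
    return abs(log2fc) > fc_cutoff and adjusted_p < p_cutoff


# Toy example: a gene with 10 reads in one library and 40 in another,
# both libraries containing 1,000 mapped reads.
lfc = log2_fold_change(10, 1000, 40, 1000, pseudocount=0.0)
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Normalizing by the total mapped reads keeps libraries of different sequencing depth comparable before ratios are taken.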

Molecular phylogeny and evolutionary divergence date estimation.

A whole-genome cladogram showing the phylogenetic relationships of C. graminicola and C. higginsianum to 17 other sequenced fungi was constructed with CVTree34 (Fig. 1). A phylogeny was generated for the genus Colletotrichum based on sequencing five genes in 28 selected isolates (Supplementary Fig. 5), as described in the Supplementary Note. To estimate the evolutionary divergence date for C. graminicola and C. higginsianum, a phylogenetic analysis was performed using the 13 species shown in Supplementary Figure 4. The proteomes were clustered using MCL, and proteins in each cluster were aligned using MUSCLE. Sixty-four clusters containing only one protein from each species and having at least 80% average pairwise nucleotide identity were used for further analyses. Sequence alignments were concatenated, and a phylogenetic tree was constructed with MrBayes57 using the WAG amino acid substitution model. Date estimates were computed using the program r8s58 with the nonparametric rate smoothing (NPRS) method using date estimates by Lücking et al.59.

Fluorescent reporter gene assay.

The promoter of the CSEP-encoding gene ChEC6 (CH063_01084) was fused to mCherry60 and a transcriptional terminator by overlap fusion PCR61 using the primer pairs shown in Supplementary Table 18. The genomic region between the ChEC6 start codon and the stop codon of its upstream gene (1,198 bp) was amplified with primer pair 1. The mCherry gene was amplified with primer pair 2. The transcriptional terminator of Aspergillus nidulans trpC was amplified from the plasmid pBin-GFP-hph5 with primer pair 3. After fusion, the insert was subcloned into the plasmid pENTR/D-TOPO (Invitrogen) and verified by sequencing. The insert was cut out with BamHI and EcoRI and ligated into the plasmid pBIGDR1, providing direct repeat recombination-mediated gene targeting62. A ku70 mutant of C. higginsianum strain IMI349063 (ref. 62) was used for Agrobacterium-mediated transformation3. Confocal images of transformants were obtained using a Leica TCS SP2 confocal laser scanning microscope. Excitation for imaging mCherry fluorescence was at 563 nm, and emission was detected at 566–620 nm.

Host tissue alkalinization.

The pH of the host cells during infection was measured using the cell-permeant pH-sensitive dye 2′,7′-bis(carboxyethyl)-5(6)-carboxyfluorescein (BCECF) for analysis by epifluorescence microscopy31. Fluorescence intensity values were correlated with direct pH determinations obtained with a piercing-tip pH electrode (Eutech, Singapore). Ammonia concentrations in infected maize and Arabidopsis leaf tissues were measured using a photometric ammonium assay kit (Merck, Germany).

Accession codes.

The C. graminicola and C. higginsianum genome assemblies have been deposited in NCBI's Whole-Genome Shotgun Project with accession numbers ACOD0100000000 and CACQ0200000000, respectively. The RNA-Seq data for C. graminicola and C. higginsianum have been deposited in the NCBI Gene Expression Omnibus under GEO Series accession numbers GSE34632 and GSE33683, respectively.