The filamentous fungus Aspergillus niger is widely exploited by the fermentation industry for the production of enzymes and organic acids, particularly citric acid. We sequenced the 33.9-megabase genome of A. niger CBS 513.88, the ancestor of currently used enzyme production strains. A high level of synteny was observed with other aspergilli sequenced. Strong function predictions were made for 6,506 of the 14,165 open reading frames identified. A detailed description of the components of the protein secretion pathway was made and striking differences in the hydrolytic enzyme spectra of aspergilli were observed. A reconstructed metabolic network comprising 1,069 unique reactions illustrates the versatile metabolism of A. niger. Noteworthy is the large number of major facilitator superfamily transporters and fungal zinc binuclear cluster transcription factors, and the presence of putative gene clusters for fumonisin and ochratoxin A synthesis.
Aspergillus niger, a member of the black aspergilli, is widely used in biotechnology for the production of food ingredients, pharmaceuticals and industrial enzymes. In their natural habitat A. niger strains secrete large amounts of a wide variety of enzymes needed to release nutrients from biopolymers. This high secretory capacity is exploited by industry in both solid state and submerged fermentations1,2. A. niger has a long tradition of safe use in the production of enzymes and organic acids. Many of these products have obtained GRAS (generally recognized as safe) status3. Aspergillus enzymes are used in starch processing, baking, brewing and beverage industries, in animal feed and in the paper and pulping industry. Furthermore, A. niger is used as a host for the production of heterologous proteins4,5 and as a cell factory for the production of citric acid and gluconic acid6. A. niger exhibits a remarkably versatile metabolism, enabling growth on a wide range of substrates under various environmental conditions. Its ability to degrade a range of xenobiotics through various oxidative, hydroxylation and demethylation reactions provides potential for use in bioremediation7.
In this paper we describe the genomic DNA sequence of A. niger strain CBS 513.88, its annotation and an initial gene expression study using Affymetrix DNA microarrays. CBS 513.88 is an early ancestor of currently used enzyme production strains. Several features are compared with the recently published genomes of the aspergilli A. oryzae8, A. nidulans9 and A. fumigatus10. The availability of the genome will facilitate the development of new products, improved strains and more efficient processes.
Genome sequence and analysis
The genome of A. niger CBS 513.88 was sequenced using an ordered set of large insert Escherichia coli bacterial artificial chromosomes (BACs) in a process called BAC walking. First, a large insert (up to 150 kb) BAC library covering the genome more than 20 times was constructed and used to generate over 18,000 BAC end sequences with an average length of 500 base pairs (bp). In parallel, several of the larger BACs were sequenced using a 7.5× coverage shotgun approach. Subsequently, sequences of these fully sequenced 'seed' BACs were compared to the BAC ends, allowing the selection of a new set of BACs showing overlap with one of the sequenced BACs. The newly selected BACs were used for shotgun sequencing. A total of 505 BACs, representing a minimal tiling set covering the genome, were selected and sequenced using the BAC walking approach. The assembled genome sequence consists of 468 DNA contigs spanning a total of 33.9 million unique bp (Mbp) arranged in 19 supercontigs. Recently an 8.9× shotgun coverage draft sequence of the citric acid–producing A. niger ATCC 1015 was made available by the DOE Joint Genome Institute (http://www.jgi.doe.gov/aspergillus). Based on Blastn analysis the ∼37.2-Mbp genome sequence is 97% identical to and largely colinear with the genome sequence published here.
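The selection step of BAC walking can be illustrated with a small simulation. This is only a sketch under stated assumptions: the clone names and coordinates are hypothetical, and overlap is decided from known positions, whereas in the actual procedure overlap was inferred by comparing BAC end sequences with the sequence of an already finished BAC. The greedy rule used here (extend with the overlapping clone that reaches furthest) is one common way to approximate a minimal tiling set and is not taken from the paper.

```python
from typing import NamedTuple


class Bac(NamedTuple):
    name: str
    start: int  # hypothetical coordinate of the left end (bp)
    end: int    # hypothetical coordinate of the right end (bp)


def walk_right(seed: Bac, library: list[Bac]) -> list[Bac]:
    """Greedy walk to the right of a fully sequenced seed BAC.

    At each step, choose the clone whose left end lies inside the region
    already sequenced and whose right end extends that region the most.
    """
    path = [seed]
    covered_to = seed.end
    while True:
        candidates = [
            bac for bac in library
            if seed.start <= bac.start <= covered_to and bac.end > covered_to
        ]
        if not candidates:
            return path  # no clone overlaps the sequenced region: a gap
        best = max(candidates, key=lambda bac: bac.end)
        path.append(best)
        covered_to = best.end


# Hypothetical toy library.
seed = Bac("A", 0, 100)
library = [
    Bac("B", 50, 180),
    Bac("C", 90, 200),
    Bac("D", 190, 300),
    Bac("E", 400, 500),
]
tiling = walk_right(seed, library)
print([bac.name for bac in tiling])
```

In this toy library, clone B is skipped because clone C overlaps the seed and reaches further, and the walk stops before clone E because nothing bridges the region between D and E.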
Gene identification and annotation
Algorithms specifically trained on the basis of genes known in A. niger and related filamentous fungi were used to predict 14,165 protein coding genes (Table 1). Because automated gene-model predictions frequently fail to correctly predict all intron and exon boundaries, extensive manual curation was used to verify the predicted gene models and to assign gene functions on the basis of similarity to known proteins with an established function. In this way we improved mainly the models of genes encoding proteins with similarity to known proteins.
The specificity of the 269 tRNA genes and the codon usage are presented in Supplementary Tables 1 and 2 online and confirm the published usage11. A. oryzae has a comparable set of tRNAs (245), whereas A. nidulans and A. fumigatus contain 188 and 179 tRNAs, respectively. Whereas the observed tRNA density is higher for A. niger and A. oryzae, the codon distribution is similar for all four species. The functional catalog (FunCat) classification system12 was applied to functionally describe the A. niger predicted proteome. Of the 14,165 predicted proteins, approximately half (6,505) could be assigned to functional protein classes relating to metabolism, cellular transport and protein fate (Fig. 1). A detailed comparison was made between A. niger and eight other filamentous fungi (Fig. 1): 9,253 A. niger proteins have an ortholog in at least one of the nine queried species, 1,992 proteins have orthologs in all species, whereas 3,373 proteins are shared between the filamentous fungi. The distribution of orthologous proteins over the various FunCat classes is shown (Fig. 1). Compared to the other filamentous fungi A. niger contains a remarkably large number of unique proteins involved in C-compound, carbohydrate, lipid, fatty-acid and isoprenoid metabolism and secondary metabolism, reflecting the versatility of A. niger as a cell factory. In contrast, the number of unique proteins involved in cellular transport, protein secretion and fate is rather constant between the various filamentous fungi. Apparently A. niger is able to use its secretion machinery in a very efficient way.
Phylogenetic relationship and synteny between aspergilli
Phylogenetic relationships between filamentous fungi have often been based on ribosomal DNA sequences or single-gene families13. We selected twenty strictly orthologous sequences from A. niger and eight filamentous fungi. For A. terreus and A. clavatus no annotated proteome is available, hence translations of orthologous gene fragments were used instead. The multiple sequence alignment was manually curated and used to build a maximum likelihood tree (Supplementary Table 3 and Supplementary Fig. 1 online). In this tree A. niger is closely related to both A. terreus and A. oryzae and separate from A. fumigatus and A. clavatus, which have smaller genomes (29–30 Mb), and A. nidulans, which branches earlier.
Of the 8,695 A. niger genes that have an ortholog in A. nidulans, A. oryzae and A. fumigatus, 6,755 (78%) show conservation of neighboring orthologs (synteny) in at least one of the three other species; 4,189 genes (48%) are syntenic in all four Aspergillus species and were plotted on the proposed physical map of A. niger. Large parts of the A. niger genome show a high conservation of gene order with one of the other Aspergillus species, in line with previous observations9. However, within those regions numerous intrachromosomal rearrangements and microinversions have occurred (Fig. 2). A. nidulans centromere-flanking genes mapped without exception to either the 5′ or 3′ end of a large A. niger supercontig and for each linkage group two centromere flanking supercontig ends were found. However, no clear centromeric sequences were identified. Inferred telomeric regions show little to no synteny (Fig. 2).
Linking the A. niger physical map to the genetic map
Parasexual recombination and electrophoretic karyotyping were previously used to establish the presence of eight linkage groups in the widely studied A. niger strain CBS 120.49 (refs. 14,15). We used these methods to assign 72 characterized or newly selected genes to specific chromosomes (Supplementary Table 4 online). This positioning unambiguously links all supercontigs of CBS 513.88 to single chromosomes of CBS 120.49. Electrophoretic karyotype data indicate that chromosomes VI and VII may not be completely covered by sequence data (Table 2). For chromosome VI this most likely relates to the presence of rDNA repeats whereas for chromosome VII this may be due to a large deletion in strain CBS 513.88. Four different methods were used to establish the orientation of supercontigs on the chromosomes resulting in a proposed alignment of the 19 supercontigs of strain CBS 513.88 with the eight linkage groups of strain CBS 120.49 (Table 2 and Supplementary Data online).
Life cycle and reproduction
The majority of aspergilli, including A. niger, are only known to reproduce by asexual means, forming conidiospores16. The identified set of genes involved in signal transduction and conidiophore development is essentially the same as that of A. nidulans (Supplementary Table 5 online). Asexuality is thought to be derived from an ancestral sexual state17. We therefore screened the genome of A. niger for the presence of genes involved in incompatibility (Supplementary Table 6 online) and sexual reproduction (mating processes, signal transduction and ascomata development). A full complement of apparently functional, early sexual development genes, including a key mat-1 alpha domain mating-type gene, was identified (Supplementary Table 5 and Supplementary Data). Two genes were only partially retrieved and one (pro1) contained a premature stop codon. mat-1 and pheromone-precursor and receptor genes were not expressed during fed-batch fermentation, but this was not surprising given that mating and pheromone signaling require specific environmental conditions for induction (Supplementary Table 7 online). It has recently been suggested that the 'asexual' species A. fumigatus and A. oryzae might have sexual potential9,18. A similar situation could apply to A. niger, which would be of great value for strain improvement.
Cell wall development
The fungal cell wall determines biotechnologically relevant features such as morphology during fermentation and cell integrity. The cell wall of A. niger consists of chitin, 1,3-β-glucan, 1,6-β-glucan, 1,3-α-glucan, galactosaminogalactan and galactomannan, similar to A. fumigatus19. Genes required for the biosynthesis of these cell wall components were identified in the genome (Supplementary Table 8 online). As in other aspergilli, a high degree of redundancy is observed for chitin synthases and chitinases compared to Saccharomyces cerevisiae (Supplementary Table 9 online). A. niger contains five putative 1,3-α-glucan synthase genes, the highest number in the four published Aspergillus genomes. Specific members of each gene family were expressed during vegetative growth in a fed-batch culture on glucose and ammonium (Supplementary Table 7).
The cell wall integrity (CWI) signaling pathway is responsible for cell wall remodeling and reinforcement in response to a changing environment20. Several components of this pathway are present in the genome (Supplementary Table 8) and indeed expressed during fed-batch culture, ensuring the growth of A. niger under harsh industrial-process conditions (Supplementary Table 7 and Supplementary Fig. 2 online).
Hydrophobins are important determinants of cell morphology through their amphipathic monolayer structure21 (Supplementary Table 8). Two out of seven of the predicted hydrophobin genes are expressed during submerged fermentation (Supplementary Table 7). By modifying the expression of hydrophobin genes, the cell wall composition and hyphal morphology of A. niger may be altered, for example, to improve protein production22. Several genes involved in autolysis are present16,23. Reduced autolysis through controlled expression of these genes is relevant for industrial processes to minimize proteolysis of secreted proteins and to simplify downstream processing24.
Central metabolism and organic acid production
A novel genome-scale model of A. niger central metabolism was developed on the basis of the A. niger genome. This metabolic model was constructed by supplementing literature on A. niger metabolism with pathway databases and literature on other aspergilli25,26. The reconstructed metabolic network consists of 1,069 unique reactions, of which 733 are based directly on literature and 832 are supported by genomic data, with an overlap of 570 reactions. Reactions not supported by literature or genomic data were included to ensure connectivity of the different pathways. For central metabolism only 24 enzymes out of 785 (excluding transport) are not linked to an annotated open reading frame (ORF, 3%), indicating an excellent coverage of metabolic genes. An overview of the subset of 785 unique reactions is presented in Supplementary Table 10 online. Examining the entire metabolic model, only 65 enzymes were not found in the annotation (Supplementary Table 11 online). A more detailed discussion on A. niger central metabolism and transport characteristics is presented in Supplementary Data, Supplementary Figures 3 and 4 online.
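The reaction counts quoted above can be combined by simple inclusion-exclusion. The short calculation below is a sketch that assumes the "overlap of 570 reactions" means reactions supported by both literature and genomic data. The two derived counts (reactions with at least one kind of support, and reactions included only for connectivity) are computed here and are not stated in the text.

```python
# Figures quoted in the paragraph above.
total_reactions = 1069
literature_supported = 733
genome_supported = 832
supported_by_both = 570  # assumed meaning of the stated "overlap"

# Inclusion-exclusion: reactions with at least one kind of support.
supported_by_either = literature_supported + genome_supported - supported_by_both

# Remaining reactions, presumably those added to keep pathways connected.
connectivity_only = total_reactions - supported_by_either

# Central metabolism: enzymes without an annotated ORF.
unlinked_enzymes = 24
central_reactions = 785
unlinked_percent = 100 * unlinked_enzymes / central_reactions

print(supported_by_either, connectivity_only, round(unlinked_percent, 1))
```

Under these assumptions the stated 3% of unlinked central-metabolism enzymes is reproduced by 24/785.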
The extremely flexible metabolism and high nutritional versatility of A. niger is confirmed by the presence of various solute-transporter classes (Supplementary Table 12 online). Members of the major facilitator superfamily appear to be exceptionally abundant in A. niger with 461 genes, which is in the same range as A. oryzae (507), but greater than in A. fumigatus (275) and A. nidulans (358) (Supplementary Table 13 online). These proteins are predicted to be involved in the transport of a wide range of substrates and several may function as nutrient sensors. Aspergillus species have also undergone a significant expansion in the number of fungal zinc binuclear cluster proteins (PF00172)27 (Supplementary Table 14 online). This distribution is mirrored by a second PFAM domain (PF04082), which was originally identified as occurring in a proportion of proteins with PF00172 domain. This is most dramatic in A. niger and is not reflected in other transcription factor domains (Supplementary Table 15 online). The characterized examples of these proteins are involved in regulating diverse aspects of primary and secondary metabolism including polysaccharide degradation.
Using the reconstructed metabolic network, we investigated why A. niger is an efficient organic acid producer28. Several enzymes involved in the formation of the citrate precursor oxaloacetate were found, for example, two pyruvate carboxylases (one cytoplasmic, one mitochondrial), four malate dehydrogenases (three cytoplasmic and one mitochondrial). In addition a mitochondrial oxaloacetate transporter was found (Fig. 3). A. niger contains one cytosolic and three putative mitochondrial citrate synthases. Phylogenetic analysis of the A. niger citrate synthases in various aspergilli reveals two clusters containing one and three members, respectively (Supplementary Fig. 5). The second cluster may have originated from a duplication of an ancestral citrate synthase that was lost in all organisms except in some fungi and plants. At least one additional gene duplication event occurred (An01g09940 and An08g10920). Similar redundancy was found for aconitase, for which two putative cytoplasmic and two mitochondrial forms were found. The various gene duplications may be important in view of the efficient production of citrate by A. niger. Moreover, the necessary transport steps could be facilitated by two mitochondrial and several cytoplasmic membrane tricarboxylate transporters. A. niger, like other fungi, contains two ATP-citrate lyases and one eukaryotic beta-chain mitochondrial citrate lyase. This indicates a high degree of conservation of the 'fermentative pathway' of citrate. Because one of the ATP-citrate lyases is mitochondrial, a futile cycle of citrate formation and degradation needs to be prevented. The identification of genes involved in citric acid metabolism and transport provides excellent opportunities to study and understand the efficient production of citric acid by A. niger in much more detail.
A. niger is also able to efficiently produce gluconic acid. Genes are present that encode one intracellular and three secreted glucose oxidases. The presence of eleven ORFs encoding catalases ensures protection against hydrogen peroxide generated by glucose oxidase; two ORFs contain a signal sequence required for protein export. At least one of the four putative lactonases is located extracellularly. Two gluconate-specific kinases are present, suggesting that catabolism of gluconate proceeds via phosphorylation to 6-P-gluconate.
Oxalic acid is an undesired by-product in A. niger fermentations. One biosynthetic pathway involves oxaloacetate hydrolase29. In addition, the presence of two glycolate reductases and several lactate dehydrogenases in A. niger suggests the activity of an alternative pathway. This resembles the situation in mammalian liver cells30, in which the glyoxylate cycle operates during citrate consumption. Oxalate may be further catabolized by oxalate decarboxylase, yielding formate. Four putative oxalate/formate antiporters for formate transport were found (Supplementary Table 12).
Protein secretion in A. niger
For reasons unknown, A. niger is a far more effective natural secretor of proteins than the well-studied yeast S. cerevisiae31. In eukaryotes protein secretion involves transport via the endoplasmic reticulum (ER), Golgi apparatus and vesicles to the cell membrane (Fig. 4 and Supplementary Table 16 online). Translocation from the cytoplasm to the ER occurs in A. niger through the established signal recognition particle (SRP)-dependent and SRP-independent pathways. As in mammals, no ortholog of the essential yeast signal recognition and docking protein Srp21p was found. ER lumenal HSP70-type protein-folding chaperones, including BipA and LhsA, are present. In S. cerevisiae, Kar2p (the equivalent of BipA in A. niger) functions together with the nucleotide-exchange factor Sil1p32. Surprisingly, no ortholog of yeast Sil1p can be found in A. niger and other aspergilli, whereas a putative ortholog in Neurospora crassa could be detected. A. niger encodes three soluble lumenal and one putative membrane-bound protein disulphide isomerase (PDI). Only PdiA (ortholog of yeast Pdi1p) and the membrane-associated EpsA (Eps1p) appear to be close orthologs of known yeast proteins. Finally, in contrast to several yeast species, plants and mammals, the ER lumenal protein EroA of the sequenced aspergilli contains a predicted C-terminal ER-retention signal.
The A. niger unfolded protein response (UPR) signaling pathway is strikingly different from that in yeast31,33. In mammalian cells the protein p58 interacts with and inhibits the ER-localized eIF2alpha kinase PERK. This kinase is involved in translational regulation, attenuating the UPR during ER stress34. Orthologs of p58 and PERK are not found in yeast. The presence of a putative p58 ortholog in A. niger is remarkable as no ortholog of PERK was found.
Most components of the glycosylation machinery are readily identified by sequence similarity. Although present in A. fumigatus and A. nidulans, no ortholog of yeast Alg14p was identified in A. niger. Interestingly, the dolichol-P-mannose synthase gene is more mammalian- than yeast-like as it does not contain a C-terminal hydrophobic transmembrane region. A. niger possesses a glycosylation-dependent quality-control system that is clearly distinct from that of S. cerevisiae. A. niger contains an ortholog of the protein-folding sensor UDP-Glc:glycoprotein glucosyltransferase absent in yeast. Calnexin is more similar to its counterpart in Schizosaccharomyces pombe.
ER-associated protein degradation (ERAD) directs misfolded or unassembled proteins to the proteasome35. A. niger lacks clear orthologs of the yeast ERAD proteins Cue1p, Rad23p, Ubx2p and Yos9p and shows little homology to other ERAD components such as Der1p, Hrd1p, Doa10p and Hrd3p. Orthologs of all subunits of the yeast 26S proteasome are found, but orthologs of the regulatory proteins Rpn13p and Rpn14p36 appear to have low similarity in A. niger or to be absent (Supplementary Data). The A. niger secretion system is equipped with a well-developed machinery for glycoprotein quality-control, combined with an effective ERAD pathway, which differs from that in yeast. In conclusion, the A. niger secretory system shares some components with mammals and some with S. cerevisiae.
Aspergilli contain a wide spectrum of enzymes for polysaccharide, protein and lipid degradation. The two industrial aspergilli (A. niger and A. oryzae) contain the highest percentage of extracellular enzymes. Cellulases, hemicellulases, pectinases, amylases, inulinases, lipases and proteases are used in a range of industrial applications. Here we focus on polysaccharide-degrading enzymes and proteases.
Glycosyl hydrolase, lyase and esterase families involved in polysaccharide degradation in the aspergilli sequenced were identified using the carbohydrate-active enzymes (CAZy) classification (http://www.cazy.org/, Table 3). A detailed overview of the A. niger enzymes is presented in Supplementary Table 17 online. Specific differences between the aspergilli were noticed. In contrast to the other three aspergilli, A. niger contains only one GH10 and four GH11 endoxylanases, a xylanase family liberating larger oligosaccharides. For inulin degradation A. niger and A. fumigatus both contain invertases and endo- and exoinulinases. However, A. nidulans does not have endoinulinases, whereas A. oryzae appears to contain only invertases. This suggests that the latter two species degrade inulin extracellularly.
For the degradation of pectin a large set of enzymes is required37. A. niger has the largest set (21) of (GH28) enzymes, including seven candidate endopolygalacturonases (Supplementary Table 17). In contrast to the other aspergilli, A. niger has only a single pectate lyase (PL1). The pectinolytic machinery is completed by a single putative delta-4,5-unsaturated glucuronyl hydrolase (GH88), two putative delta-4,5-unsaturated rhamnogalacturonyl hydrolases (GH105), three putative pectin methylesterases (CE8) and one candidate rhamnogalacturonan acetyl esterase (CE12) (Table 3). The single pectate lyase is consistent with the acidifying properties of A. niger, given that pectate lyases show little activity at acidic pH. Moreover, A. niger does not contain exoarabinanases, indicating that A. niger relies on the combined action of endoarabinanase and α-L-arabinofuranosidase for the hydrolysis of the arabinan side-chains of pectin.
The glycoside hydrolase family GH13 involved in starch degradation is larger in aspergilli compared to other fungi. The GH13 family contains three separate groups of amylase-type enzymes in A. niger based on phylogenetic clustering (Supplementary Fig. 6). A. niger contains four putative extracellular α-amylases (Supplementary Table 17). A second group consists of enzymes with relatively low homology to known alpha-amylases, which do not contain signal sequences. A third group consists of glycosyl phosphatidylinositol–anchored enzymes, which may play a role in the maintenance of the cell walls containing α-glucan38. The A. niger genome contains 42 enzymes classified as CAZy family members without a signal sequence. The majority of these are likely to be involved in the degradation of disaccharides or glycosides imported by the fungus.
A. niger has a full set of enzymes to degrade polypeptides (Supplementary Table 18 online). Secreted proteases are applied in detergents, food applications and as biocatalysts in the production of fine chemicals. The A. niger genome encodes 198 proteins involved in proteolytic degradation including a variety of secreted aspartyl endoproteases (9), serine carboxypeptidases (10) and di- and tri-peptidylaminopeptidases (9). Compared to A. nidulans and A. oryzae, the number of putative secreted aminopeptidases (3) is low. The abundance of aspartyl endoproteases and carboxypeptidases (mostly active at low pH) and the low number of aminopeptidases (mostly active at neutral or high pH) matches the acidifying properties of A. niger as previously noted for A. oryzae8. Unlike the aminopeptidases in the other sequenced aspergilli, seven aminopeptidases of A. niger are predicted to be intracellular, indicating that at least part of the amino-terminal degradation of external proteins is taken care of by secreted di- and tri-peptidylpeptidases.
Secondary metabolism and safety
Among the secondary metabolites produced by filamentous fungi, mycotoxins are most relevant from a safety point of view39. A. niger contains several secondary metabolite clusters (Supplementary Table 19 online). The genome contains 17 nonribosomal peptide synthase (NRPS) and 34 polyketide synthase (PKS)-encoding genes, most of which are located in clusters. A. niger contains seven hybrid PKS-NRPS synthases whereas other sequenced filamentous fungi contain a single hybrid PKS-NRPS. The sometimes unusual domain patterns for these NRPS, PKS and hybrid genes are given in Supplementary Table 20 online. Of particular interest is the presence of a gene cluster that shares a number of genes with the Gibberella moniliformis gene cluster, which encodes the mycotoxin fumonisin (Supplementary Fig. 7)40. This cluster is absent from the genomes of other sequenced aspergilli such as A. fumigatus, A. oryzae and A. nidulans8,9,10.
A putative ochratoxin cluster was identified on the basis of a PKS fragment of A. ochraceus involved in ochratoxin biosynthesis41. Some A. niger strains have been reported to synthesize this toxin but little is known about the biosynthetic pathway42. In A. ochraceus, genes encoding chloroperoxidase, reductase, esterase, dehydratase and NRPS may be involved. Of these only an NRPS-like gene with the unusual domain structure ACPA is found in A. niger. Neither gene is expressed at a detectable level (Supplementary Table 7).
Siderophores are secondary metabolites involved in iron assimilation and storage43 and protection against oxidative stress. One clear example of a siderophore cluster was identified in the genome (Supplementary Table 19). The vast majority of the remaining nonribosomal peptide and polyketide synthases do not share orthologs with the other fungal genomes sequenced to date. In addition, no orthologs of genes for the biosynthesis of penicillin, prenylated alkaloids (fumitremorgins), clavines, gliotoxin, aflatoxin or known fungal terpenes (aristolochene and trichodiene) were found.
Genome sequencing and assembly.
A library with ∼20.7 × coverage of the A. niger CBS 513.88 genome was constructed in pBeloBAC 11 and characterized by end-sequencing and restriction digestion. Insert sizes of BAC clones ranged from <50 to >150 kb per clone. A total of 11,136 BAC clones were generated with an average insert size of 68 kb. We generated 18,014 BAC end-sequences with 504 bases average read length (phred20). The BAC end-sequencing success rate was 80.9%. Using the BAC walking approach, we selected 595 BAC clones for analysis, and 505 of those, representing the minimal tiling path, for shotgun sequencing. The minimal tiling set of sequenced BACs was confirmed by the high density map of BAC end-sequences (1 end per 2 kb). Repeated sequencing of the estimated 100 rDNA copies was avoided by not sequencing BACs that have end sequences matching rDNA. The 505 BACs selected have an average insert size of 76.8 kb and have been sequenced with a 7.5× coverage. The assembled BACs contained on average 2.14 contigs per BAC, corresponding with an average of 1.14 sequence gaps per BAC. Shotgun sequences were assembled with Phrap and edited after import into Gap4. BAC assemblies and raw data were visualized and edited using the STADEN package. The genome was assembled and logically joined using BAC clones physically bridging known gaps to form supercontigs with a total unique sequence size of 33.9 Mb. The estimated 5% of the genome not sequenced includes telomeric regions, additional rDNA repeats and small gaps. Full details of the genome sequencing and assembly can be found in Supplementary Methods online. CBS 513.88 can be obtained from Centraal Bureau Schimmelcultures (http://www.cbs.knaw.nl/).
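Several of the library and assembly figures above can be cross-checked with simple arithmetic. The sketch below assumes that one read was attempted from each of the two ends of every BAC clone, and it derives a genome-size estimate from the stated library coverage. That estimate is a derived value for illustration, not a number given in the text.

```python
# Figures quoted in the paragraph above.
bac_clones = 11_136
mean_insert_kb = 68
library_coverage = 20.7
end_reads = 18_014
unique_sequence_mb = 33.9
tiling_bacs = 505
tiling_mean_insert_kb = 76.8
contigs_per_bac = 2.14

# Assumption: two end reads attempted per clone.
end_success_rate = end_reads / (2 * bac_clones)

# Average spacing of BAC end-sequences along the unique sequence.
bp_per_end = unique_sequence_mb * 1e6 / end_reads

# Genome size implied by the library coverage (derived, not stated).
library_insert_mb = bac_clones * mean_insert_kb / 1000
implied_genome_mb = library_insert_mb / library_coverage

# n contigs within one BAC are separated by n - 1 gaps.
gaps_per_bac = contigs_per_bac - 1

# Total insert length of the tiling set relative to the unique sequence.
tiling_insert_mb = tiling_bacs * tiling_mean_insert_kb / 1000
tiling_redundancy = tiling_insert_mb / unique_sequence_mb

print(f"end-sequencing success: {100 * end_success_rate:.1f}%")
print(f"one BAC end per {bp_per_end / 1000:.2f} kb")
print(f"implied genome size: {implied_genome_mb:.1f} Mb")
print(f"gaps per BAC: {gaps_per_bac:.2f}")
print(f"tiling-set insert length / unique sequence: {tiling_redundancy:.2f}")
```

Under the two-reads-per-clone assumption, the computed success rate agrees with the reported 80.9%, and the end-sequence spacing agrees with the stated density of about one end per 2 kb. The tiling-set ratio above 1 is consistent with neighboring BACs in the tiling path overlapping one another.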
Analysis and annotation of the genomic sequences of A. niger was performed with a combined automatic and manual approach. ORFs were predicted by a version of FGENESH44 trained on known A. niger and related organism sequences as well as other gene prediction algorithms (Supplementary Methods). ORFs were named after the organism (An), supercontig number (two digits) followed by g (gene) and a five-digit number matching the order of the ORFs on the contig. tRNA genes were identified as described in Supplementary Methods. For all ORFs identified, exhaustive automatic bioinformatic analysis with respect to function and structure of the respective protein was carried out using the PEDANT-Pro™ software45. For each ORF the automatically predicted functional features were manually verified and the following features were manually annotated: gene model (based on Blast comparisons with known proteins), title, functional categories according to the MIPS functional catalog12 and EC numbers.
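The ORF naming convention described above is regular enough to parse mechanically. The following sketch uses hypothetical helper names (`parse_orf_name`, `format_orf_name`) and a regular expression written from the description in the text. The example identifiers An01g09940 and An08g10920 are the two ORF names that appear elsewhere in this article.

```python
import re

# "An" + two-digit supercontig number + "g" + five-digit ORF order number.
ORF_NAME = re.compile(r"An(\d{2})g(\d{5})")


def parse_orf_name(name: str) -> tuple[int, int]:
    """Return (supercontig number, ORF order number) for a name like An01g09940."""
    match = ORF_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not a valid A. niger ORF name: {name!r}")
    return int(match.group(1)), int(match.group(2))


def format_orf_name(supercontig: int, number: int) -> str:
    """Build an ORF name from its supercontig and order number."""
    return f"An{supercontig:02d}g{number:05d}"


print(parse_orf_name("An01g09940"))
print(format_orf_name(8, 10920))
```

Because the order number is zero-padded to a fixed width, sorting the names as plain strings also sorts them by supercontig and then by ORF number.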
The manual annotation of the title involved (i) selection of a similar protein through Blast comparison and (ii) a description of the degree of similarity with this protein. If possible a similar protein with published experimental data was selected. The degree of similarity was described with one of the following labels: questionable ORF (no Blast hit and questionable gene structure); no similarity (no Blast hit); weak similarity (Blast e-value >1e-05); similarity (Blast e-value <1e-05 and >1e-20); strong similarity (Blast e-value <1e-20); known protein (identical to known A. niger gene).
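The similarity labels above amount to a small decision rule on the Blast e-value. The function below is a minimal sketch of that rule, not the annotation pipeline itself: the function and parameter names are hypothetical, and because the text does not say how e-values exactly equal to a threshold were treated, the sketch assigns them to the weaker class as an explicit assumption.

```python
from typing import Optional


def similarity_label(
    evalue: Optional[float],
    identical_to_known: bool = False,
    questionable_structure: bool = False,
) -> str:
    """Map the best Blast hit of an ORF to a similarity label.

    evalue is None when there is no Blast hit at all.
    """
    if evalue is None:
        return "questionable ORF" if questionable_structure else "no similarity"
    if identical_to_known:
        return "known protein"
    if evalue < 1e-20:
        return "strong similarity"
    if evalue < 1e-5:
        # Assumption: an e-value of exactly 1e-20 falls in this class.
        return "similarity"
    # Assumption: an e-value of exactly 1e-5 falls in this class.
    return "weak similarity"


for e in (None, 1e-3, 1e-12, 1e-45):
    print(e, "->", similarity_label(e))
```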
Manual FunCat and EC number assignments were based on the described function of the selected similar protein and took into account the degree of similarity as illustrated by the following example. An A. niger protein matching a well-known transcriptional activator was annotated in the manual FunCat as follows: weak similarity, 04.05 mRNA transcription; similarity, 04.05.01 mRNA synthesis; strong similarity, 04.05.01.04 transcriptional control; known protein, 04.05.01.04.01 transcriptional activator.
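The example above suggests that stronger similarity licenses a deeper level of the FunCat hierarchy. The sketch below generalizes from that single example and assumes that the depth (two, three, four or five levels) applies uniformly to all categories, which the text does not state. The names `FUNCAT_DEPTH` and `funcat_for` are hypothetical.

```python
from typing import Optional

# Number of FunCat levels assigned per similarity label, inferred from the
# transcriptional-activator example only (an assumption for other categories).
FUNCAT_DEPTH = {
    "weak similarity": 2,
    "similarity": 3,
    "strong similarity": 4,
    "known protein": 5,
}


def funcat_for(label: str, full_code: str) -> Optional[str]:
    """Truncate the FunCat code of the matched protein to the allowed depth.

    Returns None for labels that carry no functional assignment.
    """
    depth = FUNCAT_DEPTH.get(label)
    if depth is None:
        return None
    return ".".join(full_code.split(".")[:depth])


activator = "04.05.01.04.01"  # transcriptional activator, from the example
for label in FUNCAT_DEPTH:
    print(label, "->", funcat_for(label, activator))
```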
Orthologous genes were grouped using all-against-all, pairwise Blast similarity searches at the level of predicted proteins keeping reciprocally best-matching genes. Twenty fungal panorthologous genes encoding housekeeping functions and showing reciprocal best matches between all selected organisms were selected (Supplementary Table 3) and the predicted protein sequences were aligned with ClustalW46. Bidirectional Blast similarity searching using the Blastx and tBlastn algorithms was applied to find additional orthologs in the genome of A. clavatus. After deletion of ambiguously aligned regions, the protein sequences were concatenated (7,767 amino acid sites). Maximum likelihood phylogenetic analysis was performed with Tree Puzzle47 using the VT model48 and a gamma model of rate heterogeneity with alpha = 0.65. Reliable bootstrap values were obtained for all nodes of the tree except for the terminal nodes linking A. niger, A. oryzae and A. terreus, indicating their close relationship.
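The reciprocal best-match criterion used for ortholog grouping can be sketched as follows. This is an illustration with hypothetical gene identifiers and made-up e-values. It assumes that hits are ranked by e-value (lowest is best), which the text does not specify, and it takes the Blast results as simple (query, subject, e-value) tuples rather than parsing real Blast output.

```python
def best_hits(hits):
    """Return {query: subject} keeping the lowest e-value hit per query."""
    best = {}
    for query, subject, evalue in hits:
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is the best hit of a and a is the best hit of b."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical toy data.
a_vs_b = [
    ("a1", "b1", 1e-50),
    ("a1", "b2", 1e-20),
    ("a2", "b2", 1e-30),
    ("a3", "b1", 1e-10),
]
b_vs_a = [
    ("b1", "a1", 1e-48),
    ("b2", "a2", 1e-31),
    ("b2", "a1", 1e-19),
]
pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

In the toy data, a3 has b1 as its best hit, but b1 points back to a1, so a3 is not placed in an ortholog pair.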
For phylogenetic analysis of protein families, the protein sequences were aligned first with ClustalW or ClustalX. Phylogenetic analyses were carried out in MEGA 2.1 or 3.1 using Maximum Likelihood, Neighbor Joining and Minimum Evolution49. Stability of clades was evaluated by 500 to 1,000 bootstrap rearrangements. Genome sequences used were obtained from the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/Genomes/) except for A. clavatus (http://msc.tigr.org/aspergillus/aspergillus_clavatus_nrrl_1/index.shtml) and A. terreus (http://www.broad.mit.edu/annotation/genome/aspergillus_terreus/).
Ortholog detection and synteny analysis.
To detect conservation of gene order, we obtained pairwise lists of orthologs using bidirectional Blast searching of the A. niger predicted coding sequences (CDS) against the predicted CDS of each of A. fumigatus, A. nidulans and A. oryzae. Sequences with a length ≤100 amino acids were omitted to obtain high-confidence e-values. Bidirectional best hits having an e-value <1e-10 in both directions were considered orthologous gene pairs. Conservation of synteny was determined by comparing adjacent orthologous gene pairs, allowing for inversions in gene order between species. If there was a match, the two orthologous gene pairs were considered to be part of a syntenous region. Such regions were extended by directional scanning along the A. niger genome. Gene order and inversions were also recorded.
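The two steps described above (bidirectional best hits, then extension of syntenous regions) can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline actually used: Blast hits are assumed to be already parsed into (query, subject, e-value) tuples, gene positions in the second genome are assumed to be given as (contig, index), genes without an ortholog are skipped rather than ending a region, and all function and variable names are hypothetical.

```python
from typing import Dict, List, Tuple

Hit = Tuple[str, str, float]  # (query gene, subject gene, e-value)


def best_hits(hits: List[Hit], max_evalue: float = 1e-10) -> Dict[str, str]:
    """Lowest-e-value subject per query, kept only if below the cutoff."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, evalue in hits:
        if evalue < max_evalue and (query not in best or evalue < best[query][1]):
            best[query] = (subject, evalue)
    return {q: s for q, (s, _) in best.items()}


def bidirectional_best_hits(forward: List[Hit], reverse: List[Hit],
                            max_evalue: float = 1e-10) -> Dict[str, str]:
    """Pairs that are each other's best hit in both search directions."""
    fwd = best_hits(forward, max_evalue)
    rev = best_hits(reverse, max_evalue)
    return {a: b for a, b in fwd.items() if rev.get(b) == a}


def syntenic_blocks(niger_order: List[str], orthologs: Dict[str, str],
                    other_pos: Dict[str, Tuple[str, int]],
                    max_gap: int = 1) -> List[List[str]]:
    """Scan along the A. niger gene order and group consecutive orthologs
    whose partners are neighbours on the same contig in the other genome.
    The absolute index difference allows either orientation (inversions)."""
    blocks: List[List[str]] = []
    current: List[str] = []
    for gene in niger_order:
        partner = orthologs.get(gene)
        if partner is None or partner not in other_pos:
            continue  # assumption: genes without an ortholog are skipped
        if current:
            prev_contig, prev_idx = other_pos[orthologs[current[-1]]]
            contig, idx = other_pos[partner]
            if contig == prev_contig and 0 < abs(idx - prev_idx) <= max_gap:
                current.append(gene)
                continue
            if len(current) >= 2:
                blocks.append(current)
        current = [gene]
    if len(current) >= 2:
        blocks.append(current)
    return blocks


if __name__ == "__main__":
    # Invented toy data: n1/n2 pair with f1/f2, which lie in inverted order.
    forward = [("n1", "f1", 1e-50), ("n1", "f9", 1e-20), ("n2", "f2", 1e-30),
               ("n3", "f3", 1e-5), ("n4", "f7", 1e-40)]
    reverse = [("f1", "n1", 1e-48), ("f2", "n2", 1e-33), ("f3", "n3", 1e-5),
               ("f7", "n4", 1e-41), ("f9", "n5", 1e-60)]
    pairs = bidirectional_best_hits(forward, reverse)
    positions = {"f1": ("c1", 2), "f2": ("c1", 1), "f7": ("c2", 0)}
    print(pairs)
    print(syntenic_blocks(["n1", "n2", "n3", "n4"], pairs, positions))
```

The `max_gap` parameter is an illustrative knob; a value of 1 corresponds to strictly adjacent genes in the second genome.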
A. niger genetic and physical map.
The genetic location of cloned genes was established using pulsed field electrophoresis followed by Southern blot analysis15. Parasexual analysis14 was used to determine the genetic location of cpcA by linkage of the phleomycin resistance to hisD4. The methods used to establish the physical map including supercontig orientation on chromosomes are described in the Supplementary Data.
Fermentation and transcriptional profiling.
A. niger was grown on defined medium with glucose as a carbon source and ammonia as a nitrogen source using 20-liter submerged stirred (Rushton turbines) fermenters at a controlled pH of 4.5 at 35 °C. Glucose feeding started after 24 h. Biomass samples for mRNA analysis were taken after 72 and 120 h. Sample treatment and Affymetrix array (DSM proprietary GeneChips) analysis are described in Supplementary Methods.
Reconstruction and analysis of the metabolic network.
A comprehensive literature search was carried out to identify reactions and/or enzymes that are present in A. niger, and this compilation was supplemented by information present in the enzyme databases BRENDA (http://www.brenda.uni-koeln.de/) and SwissProt (http://www.expasy.org/). The KEGG database (http://www.genome.ad.jp/kegg/metabolism.html) and the reconstructed metabolic network of S. cerevisiae26 were used to fill gaps in incomplete pathways. Finally, the information obtained through the annotation of all ORFs (see above) was compared to this metabolic framework and as many ORFs as possible were assigned to reactions. PSORT II (http://psort.ims.u-tokyo.ac.jp/form2.html) and TargetP (http://www.cbs.dtu.dk/services/TargetP/) were used for analysis of theoretical subcellular localization. Prediction of signal peptide cleavage sites was carried out with SignalP 2.0 (http://www.cbs.dtu.dk/services/SignalP/). Protein sequences were also analyzed with InterProScan (http://www.ebi.ac.uk/InterProScan/).
Carbohydrate-active enzyme classification.
The search for carbohydrate-active enzymes was performed according to the routine update strategy of the CAZy database (http://afmb.cnrs-mrs.fr/CAZY/). Sequences of the proteins in CAZy were cut into their constitutive modules (catalytic modules, carbohydrate-binding modules (CBMs) and other noncatalytic modules) and the resulting fragments were assembled in a sequence library for Blast searches50. Each A. niger protein model was compared by Blast analysis against the library of around 50,000 individual modules. Models with an e-value <0.1 were manually analyzed to predict their function based on multiple sequence alignment using ClustalW and a search for conserved signatures/motifs characteristic of each family, including the presence of the catalytic machinery. SignalP (http://www.cbs.dtu.dk/services/SignalP/) was used to detect the presence of possible signal sequences.
Genome data and availability.
Requests for materials.
A scalable vector graphic that plots all syntenous regions on the A. niger chromosomes, including relative orientation of the conserved synteny and blocks representing each gene, is available on request from H.J.K. (firstname.lastname@example.org).
Note: Supplementary information is available on the Nature Biotechnology website.
E.G.J.D., P.M.C., B.H. and F.M.K. wish to acknowledge the financial support from the European Commission (STREP FungWall grant, contract: LSHB-CT-2004-511952) and the French Ministry of Research (program ACI-BCMS, Enzywall). The work of X.L., U.R., J.S. and A.-P.Z. was partly funded by the Sonderforschungsbereich 578 (SFB578) of the Deutsche Forschungsgemeinschaft, Germany. We acknowledge the Department of Energy, Joint Genome Institute, TIGR and the Broad Institute for allowing some comparative genome analysis. Part of this work was supported by SENTER (BTS project BTS00010, TSGE 3012). Array hybridizations were performed at the MicroArray Department (MAD) in Amsterdam. Ulrike Jacobi is acknowledged for assistance in transcriptome analysis. We thank Bea den Dekker for excellent organizational support.