Comparative genomics of Alexander Fleming’s original Penicillium isolate (IMI 15378) reveals sequence divergence of penicillin synthesis genes

Antibiotics were derived originally from wild organisms and therefore understanding how these compounds evolve among different lineages might help with the design of new antimicrobial drugs. We report the draft genome sequence of Alexander Fleming’s original fungal isolate behind the discovery of penicillin, now classified as Penicillium rubens Biourge (1923) (IMI 15378). We compare the structure of the genome and genes involved in penicillin synthesis with those in two ‘high producing’ industrial strains of P. rubens and the closely related species P. nalgiovense. The main effector genes for producing penicillin G (pcbAB, pcbC and penDE) show amino acid divergence between the Fleming strain and both industrial strains, whereas a suite of regulatory genes are conserved. Homologs of penicillin N effector genes cefD1 and cefD2 were also found and the latter displayed amino acid divergence between the Fleming strain and industrial strains. The draft assemblies contain several partial duplications of penicillin-pathway genes in all three P. rubens strains, to differing degrees, which we hypothesise might be involved in regulation of the pathway. The two industrial strains are identical in sequence across all effector and regulatory genes but differ in duplication of the pcbAB–pcbC–penDE complex and partial duplication of fragments of regulatory genes. We conclude that evolution in the wild encompassed both sequence changes of the effector genes and gene duplication, whereas human-mediated changes through mutagenesis and artificial selection led to duplication of the penicillin pathway genes.


Scientific Reports | (2020) 10:15705 | https://doi.org/10.1038/s41598-020-72584-5 | www.nature.com/scientificreports/

beta-lactam producing fungi Acremonium chrysogenum, the genes cefD1 and cefD2 encode the isopenicillin N epimerase system and provide an alternative biosynthetic pathway to produce penicillin N from isopenicillin N, and cephalosporin C from penicillin N [30][31][32]. Several genes have been identified that play a role in regulating the pathway leading to penicillin G production, and evolved within beta-lactam producing fungi, described in more detail below 24,33. We hypothesized that selection on antibiotic production should most likely result in changes in the coding sequence or copy number of effector genes at later stages of the pathway (i.e. penDE, which is unique to the production of penicillin G as opposed to other beta-lactams), or of the regulatory genes, rather than the upstream effector genes that generate precursors used by multiple antibiotics.

Materials and methods
Culturing, DNA extraction and sequencing. The fungus Penicillium rubens Biourge (1923) 34 IMI 15378 (= ATCC 8537; NRRL 824; CBS 205.57) was obtained from the CABI IMI culture collection. As part of a separate experiment (not reported here), replicates of the fungus were grown for 11 weeks at 20 °C on Petri dishes with LB Lennox agar medium supplemented with 20 g/L of sucrose. Fungus from each of six treatments was then cultured in LB Lennox broth with 20 g/L of sucrose at room temperature for a week prior to DNA extraction. For each of the six treatments, around 100 mg of washed mycelium was ground under liquid nitrogen and DNA was extracted using the DNeasy Plant Mini Kit (Qiagen). DNA libraries were prepared with an Illumina TruSeq PCR-free kit at the Department of Biochemistry, University of Cambridge, and sequenced with Illumina MiSeq v2 technology with 2 × 150 paired-end sequencing and 350 bp insert size. Separate library preparations and sequencing were performed for the six extractions, but subsequent results indicated that no changes had accrued among the treatments, so reads were pooled for the assembly and analyses presented here.
Penicillin pathway genes. We searched for the pcbAB, pcbC and penDE genes in each genome using BLAST, and for a paralog of penDE that was first discovered in the Wisconsin 54-1255 genome 17,38 and functionally characterized 39. Query sequences are listed in Table S2. We also searched for the cefD1 and cefD2 genes, which catalyse an alternative pathway converting isopenicillin N to penicillin N rather than penicillin G, and which were previously discovered in the Wisconsin 54-1255 genome 17 and shown to be expressed. In addition, we searched for a suite of genes identified as playing a regulatory role in penicillin production: anBH1, the three subunits of the transcription factor ancF (hapB, hapC and hapE), pacC, and veA. Functions of these genes are summarised in Fig. 1. In brief, PacC is a wide-domain, pH- and carbon-source-dependent regulator, which upregulates the pcbAB and pcbC genes in P. chrysogenum in an alkaline environment and/or when the fungus is grown on a depleted carbon source 40,41. VeA is a wide-domain, light-dependent regulator in P. chrysogenum, A. chrysogenum and Aspergillus nidulans. It is involved in upregulation of pcbAB and downregulation of pcbC 26,42,43. The transcription factor ancF consists of three subunits (hapB, hapC and hapE) and is responsible for the downregulation of pcbAB and the upregulation of pcbC and penDE in P. chrysogenum [44][45][46]. The anBH1 gene produces the basic-region helix-loop-helix (bHLH) protein that binds to the promoter region upstream of penDE and downregulates its transcription 26,47.
Table 1. Voucher and repository information for the strains used in this study. a Originally misnamed as P. chrysogenum but confirmed by molecular data to belong to the P. rubens clade 3,4.

For each gene, we generated an alignment (including multiple copies where present) using MAFFT 1.3.6 48 and reconstructed a phylogenetic tree by maximum likelihood using the GTR + invgamma model in PHYML 2.2.3 49, implemented in Geneious 9.1.8 (Biomatters Ltd, Auckland, New Zealand, https://www.geneious.com). We tested for evidence of positive selection among strains by running codon models in PAML 4 50 for genes displaying variation: the null model of a single dN/dS ratio across codons, referred to as ω; the neutral model with a fraction p1 of codons that are under purifying selection (dN/dS < 1; ω1) and a fraction p2 evolving neutrally (dN/dS = 1; ω2); and the positive selection model including an additional fraction p3 of codons evolving under positive selection (dN/dS > 1; ω3). We used the Akaike Information Criterion to select the best model while penalizing for differences in the number of free parameters (null = 1, neutral = 2, positive = 4). The test is conservative because it requires substantial changes in amino acids to detect positive selection, whereas in reality a single amino acid change could underlie functional divergence. In addition to comparing best models, we plotted the average dN/dS ratio across the genome (ω) as a measure of degree of amino acid conservation among strains, with low values indicating stronger overall purifying selection.
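The model comparison described above can be illustrated with a minimal sketch. It assumes the standard definition AIC = 2k − 2 ln L, where k is the number of free parameters and ln L is the maximised log-likelihood reported for each codon model. Only the parameter counts (1, 2 and 4) come from the text; the function names and the log-likelihood values are hypothetical placeholders, not results from this study.

```python
# Free-parameter counts for the three codon models, as stated in the text.
N_PARAMS = {"null": 1, "neutral": 2, "positive": 4}


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def select_best_model(log_likelihoods):
    """Return the model name with the lowest AIC and all AIC scores."""
    scores = {
        model: aic(lnl, N_PARAMS[model])
        for model, lnl in log_likelihoods.items()
    }
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical log-likelihoods for one gene (illustrative values only).
example_lnl = {"null": -1520.0, "neutral": -1510.0, "positive": -1509.5}
best_model, aic_scores = select_best_model(example_lnl)
print(best_model, aic_scores)
```

With these placeholder values the positive selection model has the highest likelihood, but the gain is too small to offset its two extra parameters, so the neutral model is preferred. This is the sense in which the AIC penalises model complexity.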

Results
Genome assembly of P. rubens (IMI 15378). A total of 2.86 Gb trimmed data (82.3% of raw data; Table S1) were used to generate an initial assembly comprising 274 scaffolds spanning 30.51 Mb. Screens for potential contamination resulted in the removal of 6 scaffolds (15.7 kb) that were not marked as from the genus Penicillium based on sequence similarity to NCBI 'nt' and UniRef90 public repositories (Fig. S1). Scaffolds less than 500 bp in length were also discarded, resulting in a final draft assembly of 101 scaffolds spanning 30.46 Mb total length (~94X coverage), with an N50 scaffold length of 1.62 Mb. Assembly quality, based on the presence of core eukaryotic and fungal genes (BUSCO), indicated a high level of gene completeness (99.0% and 99.3% respectively) and a low level of duplication (0.3% and 1.0% respectively), suggesting a haploid genome assembly in common with the other strains (Table 2). The assembly is marginally smaller than the assemblies of the two industrial strains (32.4 and 32.2 Mb respectively) and has similar GC content (48.9% vs 49% for both industrial strains). The final genome assembly for P. rubens IMI 15378 (nPRUBv1) has been deposited at DDBJ/ENA/GenBank under the accession GCA_902636305.1 (CACPRF010000001-CACPRF010000101).
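The coverage and N50 figures quoted above follow from simple arithmetic, sketched below. Coverage is taken here as total trimmed bases divided by assembly length, and N50 as the scaffold length at which the cumulative sum of scaffolds, sorted from longest to shortest, first reaches half of the assembly. The function names and the toy scaffold lengths are assumptions for illustration; the actual scaffold lengths are not given in the text.

```python
def mean_coverage(total_bases, assembly_length):
    """Average sequencing depth: sequenced bases per assembled base."""
    return total_bases / assembly_length


def n50(lengths):
    """Length of the scaffold at which the cumulative length of scaffolds,
    taken from longest to shortest, first reaches half the total."""
    if not lengths:
        raise ValueError("no scaffold lengths supplied")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Values reported in the text: 2.86 Gb of trimmed data, 30.46 Mb assembly.
coverage = mean_coverage(2.86e9, 30.46e6)
print(f"coverage ~{coverage:.1f}X")

# Toy scaffold lengths (hypothetical), only to show how N50 is computed.
toy_n50 = n50([10, 8, 6, 4, 2])
print("toy N50:", toy_n50)
```

The computed depth rounds to the ~94X reported for the final assembly.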
Structural comparison among Penicillium genomes. The genome of Fleming's P. rubens (IMI 15378) is broadly colinear with the P2niaD18 genome that was assembled to whole chromosome level, with relatively few cases of translocation or transversions (Fig. 2). More rearrangements are apparent between the Wisconsin 54-1255 strain and the P2niaD18 genome, perhaps indicative of structural mutations caused by mutagenesis during the improvement process as previously reported 20.

The structure of the penicillin effector genes is conserved across species and always falls into the well characterized cluster of pcbAB, pcbC and penDE genes (Fig. 3). The P2niaD18 genome alone has a single tandem duplication of the whole cluster, rather than the multiple complete or partial tandem duplications of the cluster present in other industrial Penicillium strains 20,51,52. In addition to the main loci, a partial duplicate exhibiting a match to the final 123 bp of pcbAB but with three amino acid substitutions is found in a non-coding region in the two industrial strains, 2704 bp downstream of pcbAB (Fig. 3, Table S3). This fragment, labelled B1 in the previous analysis of Wisconsin 54-1255 by Fierro et al. 52, is found in both tandem duplicates in P2niaD18, but is absent from the Fleming genome (as confirmed by mapping raw reads of the Fleming strain onto the Wisconsin 54-1255 genome, Fig. S2). We speculate that this fragment might play a functional role in the region, for example in regulating expression of pcbAB, but it might simply be a neutral or deleterious side-effect of the mutagenesis during improvement of those strains.
Other putative beta-lactam effector genes were found in the genomes of all four strains compared. All four strains contained the paralog of penDE first identified in the Wisconsin 54-1255 genome 17. The cefD1 region

Table 2. Genome assembly metrics for P. rubens (IMI 15378) and the published genomes. BUSCO notation: C, complete BUSCOs; S, complete and single-copy BUSCOs; D, complete and duplicated BUSCOs; F, fragmented BUSCOs; M, missing BUSCOs.

BLAST confirmed that these are not repeated domains or regions found elsewhere in the genome but represent single partial duplications similar to those observed in pcbAB. Two of these duplicate fragments were found only in the three P. rubens strains and two only in P. nalgiovense. The cefD2 region was not duplicated but recovered as two long sections and one short section in all three genomes, indicating the absence of a match across the full-length region found in the P. arizonense gene used for the query. The genes involved in regulation of penicillin production were scattered across the genome of each strain (Table S3). The hapB gene in the ancF transcription factor complex displays a partial duplicated fragment of 95 bp in the two industrial strains, which is lacking in the other two genomes. A clear hapE match was missing for P. nalgiovense. All other regulatory genes are present in single copy in all four genomes.

Sequence divergence of penicillin effector and regulatory genes. The two industrial strains, P2niaD18 and Wisconsin 54-1255, were identical at the sequence level for all the focal genes and therefore for subsequent analyses only sequences from Wisconsin 54-1255 were used to represent the American isolate of P. rubens. In contrast, penicillin-pathway genes have diverged in amino acid sequence between the Fleming strain and the US strains. All three effector genes encoding enzymes in the penicillin G pathway have diverged, but pcbAB and penDE showed the highest rates of amino acid divergence relative to silent changes whereas pcbC was strongly conserved (Fig. 4, Table S4). The level of divergence in pcbAB is unexpected since this gene functions to produce the initial precursor in the pathway, which is shared in the production of other beta-lactams. The homologs of the cefD1 and cefD2 genes were found in all genomes of P. rubens strains. This was unexpected as these genes are involved in the synthesis of the cephalosporin intermediate penicillin N in A. chrysogenum and are not known to have a functional role in P. rubens 17,53,54. The penicillin N effector gene cefD2 also showed a high level of amino acid divergence whereas cefD1 was more strongly conserved than other effector genes. The best sequence model for the effector genes plus hapB was a model with most codons being under constraint (dN/dS << 1) but with a significant proportion of codons being unconstrained (dN/dS = 1). There was no sequence divergence between American and British P. rubens isolates in the penDE paralog, pacC, ancF (hapB, C and E) or veA. To further investigate possible regulatory changes, we looked for sequence variation within transcription factor binding sites within the intergenic region between pcbAB and pcbC, which is a bidirectional promoter region for these genes. Among 28 binding sites previously identified in P. chrysogenum 55, all were found in the Fleming genome, and just one site was lost in both industrial strains (GATA to GGTA mutation, Table S5). Thus, the divergence of known binding sites is low, similar to that seen for regulatory proteins.
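The distinction drawn above between amino acid changes and silent changes can be sketched as a simple codon-by-codon comparison of two aligned coding sequences. This is a raw count under the standard genetic code, not the maximum likelihood dN/dS estimation performed in PAML, which additionally normalises by the numbers of synonymous and nonsynonymous sites. The function name and the two short sequences are hypothetical and are not taken from the genes analysed here.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_codon_differences(seq_a, seq_b):
    """Count (synonymous, nonsynonymous) codon differences between two
    aligned, in-frame coding sequences of equal length. Codons containing
    gaps or ambiguous bases are skipped."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    synonymous = nonsynonymous = 0
    for i in range(0, len(seq_a) - len(seq_a) % 3, 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue
        if codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            continue  # gap or ambiguous base
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1
        else:
            nonsynonymous += 1
    return synonymous, nonsynonymous


# Hypothetical aligned fragments: M-A-D-K versus M-A-E-K.
counts = count_codon_differences("ATGGCTGATAAA", "ATGGCCGAAAAA")
print(counts)
```

In the toy pair, GCT to GCC leaves alanine unchanged (a silent change), whereas GAT to GAA replaces aspartate with glutamate (an amino acid change).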

Discussion
Nearly a century since Alexander Fleming discovered the action of penicillin in bacterial cultures contaminated by P. rubens, we report the first draft genome sequence of his original strain. Very soon after the original discovery and isolation of penicillin, a second wild isolate of P. rubens from the USA was employed for future industrial manufacture owing to its greater rate of penicillin production 15. Consequently, two of the strains derived from this isolate have been the focus of previous whole genome sequencing within the P. rubens clade 16,17,20. We compared these genomes with each other and a third, more distantly related genome of P. nalgiovense 56.
Comparison of the two USA strains provides insights into the industrial mutagenesis and artificial selection process 16, which was originally performed by selecting phenotypically useful mutants without knowledge of the underlying genomic basis 15. There were no amino acid differences at any genes encoding the enzymes in the penicillin pathway and regulatory genes. Instead, there was evidence for structural rearrangement across the genome, including tandem duplication of the pcbAB-pcbC-penDE cluster in P2niaD18, which has previously been studied in these and other industrial strains 51,52. This fits with the type of mutagenesis and artificial selection used for this process. Experimental work showed that tandem duplication of the pcbAB-pcbC-penDE cluster does not directly increase penicillin production over short time periods: a strain of P2niaD18 that was modified to lose one copy did not produce significantly less penicillin over a 96-h assay period 57. Substantial copy number multiplication of the region among industrial strains still seems to implicate gene duplication in penicillin production, but perhaps only under specific growth conditions or over longer periods 18,51. Another plausible source of variation would be changes in regulatory regions, but experimental evidence indicates that such variation is unlikely to contribute to increased penicillin production 18.

Comparison between the UK and US genomes sheds light on both evolved differences between the wild progenitors of the strains, and potential initial changes in the domestication steps prior to the divergence of P2niaD18 and Wisconsin 54-1255. One structural difference shared by the US genomes was the partial duplication of the final portion of the pcbAB gene. Read mapping confirmed that this region is missing from the Fleming genome and not just absent due to assembly artefacts (Fig. S2). Partial duplication and inversion have been documented previously at the ends of the amplified region containing the penicillin synthesis genes for Wisconsin 54-1255 52. Furthermore, partial duplication has previously been found to play a role in generating novel diversity, e.g., in the case of pathogen resistance in barley 58, and could play a role in gene regulation. Without further sequencing, we cannot be certain whether this change occurred in the wild progenitor of the US strains or during initial stages of domestication. Because of the nature of these changes in relation to the predicted effects of mutagenesis, however, and the fact that further such differences arose between P2niaD18 and Wisconsin 54-1255, it seems plausible that shared structural differences of the two industrial strains from the Fleming genome occurred during their initial shared history of mutagenesis prior to their separation. No sequence divergence was observed between the two US strains in any of the genes involved in penicillin G production and regulation of the pathway: mutagenesis and selection for improved function resulted in major structural changes but no substitutions at these loci.
In contrast, penicillin-pathway enzymes have diverged in amino acid sequence between the Fleming strain and the US strains, especially pcbAB, penDE and cefD2. While it is possible in principle that these changes were caused by mutagenesis during domestication of the US strains, we think that this is unlikely: subsequent rounds of the same process led to no sequence divergence between the US strains, and the numbers of substitutions involved would seem more commensurate with longer periods of time elapsing. Instead, these differences are likely to have accrued during evolutionary divergence of the UK and US strains of P. rubens in the wild.
Although the level of divergence did not meet the statistical criteria for detecting significant evidence of positive selection, a low level of constraint on protein sequence of these genes could still indicate a history of divergent selection at a subset of codons. Alternatively, it could indicate that the function of these proteins is less dependent on amino acid identity at several sites than is the case for the other genes. In A. nidulans, the aatA gene (an ortholog of penDE) encodes the enzyme isopenicillin N acyltransferase 26,59. It has been found that disruption of this gene does not disrupt penicillin production in A. nidulans. A paralog of aatA, aatB, compensates for this as it encodes a homolog of isopenicillin N acyltransferase 59. It should be noted that the isopenicillin N acyltransferase encoded by aatA is only 55.2% similar to its homolog encoded by aatB and the two genes themselves are only 58% similar 59. Additionally, the liquid chromatography-mass spectrometry (LC-MS) data for penicillin compounds synthesized by either of the genes indicate unexplained significant peaks in proximity to the peaks representing standard penicillin V or penicillin G compounds synthesized by these genes. These unexplained peaks could represent penicillin analogues synthesized by aatB and aatA. It would be worthwhile to investigate further how the differences in penicillin effector genes translate into altered function of the enzymes encoded, such as variation in the substrate specificity or efficiency of the enzymes 60. Such variation in specificity of the enzymes could result in synthesis of penicillin G analogues. Furthermore, the presence of a penDE paralog, and of cefD1 and cefD2 homologs, in all the genomes compared in this study suggests the possibility that these genes encode homologs of isopenicillin N acyltransferase and isopenicillin N epimerase respectively 17,39,53,61. These enzymes could potentially synthesize analogues of penicillin G and penicillin N. Other beta-lactam gene variants such as homologs of the gene encoding the 7-alpha-cephem-methoxylase subunit, cmcJ, have also been identified in the genome of P. chrysogenum 17. Studies suggest that many of these gene variants are expressed but further work is needed to elucidate the functional importance of these genes, which is currently unclear 17,39,54.
The biosynthesis of penicillin G in P. chrysogenum and P. rubens consists of a simple three-gene pathway, but in certain bacteria such as S. clavuligerus, as many as twelve genes can be involved in the synthesis of beta-lactams such as cephamycin C 26,62. Much of what is known regarding the evolution of diversity of natural antibiotics stems from the concept of rearrangement of genes in an existing biosynthetic gene cluster, or by addition of novel genes to existing clusters via processes such as horizontal gene transfer 12,63. Our analyses indicate that individual genes of beta-lactam biosynthetic pathways can themselves vary between species. Evidence indicates that many penicillin producing species such as P. chrysogenum are genetically diverse, and allelic variation within wild P. chrysogenum populations can impact penicillin production within these populations 64,65. Thus, it is plausible that sequence variation in the genomes that we describe could account for the production of novel penicillin analogues. Subtle variation in chemical structure of antibiotics has been identified for other antibiotics such as antimycins produced by Streptomyces 63,66,67. Future work to sample variation more widely in P. rubens and measure the impacts of variation on chemical structure of penicillin compounds is needed to distinguish these alternatives.
In conclusion, our results provide preliminary evidence that genes involved in the production of penicillin display relatively high rates of amino acid divergence between populations, as predicted if antibiotics evolve in an arms race with antagonistic microbes. Moreover, the results indicate that natural changes involving point mutation and amino acid substitutions were not fully explored by the classical industrial mutagenesis approach, which instead produced larger structural rearrangements. Thus, the mutagenesis approach employed previously may have missed some solutions for optimizing penicillin design compared to natural selection in the wild, especially in the context of robustness to evolving antibiotic resistance. Future approaches could use solutions explored by nature as a template for the development of novel antibiotic varieties.