Main

Coronatine 3 (COR) is an important phytotoxin produced by bacterial plant pathogens; it is composed of the polyketide coronafacic acid 1 (CFA), conjugated via an amide bond to coronamic acid 2 (CMA), an unusual cyclopropyl amino acid (Fig. 1a)1,2. COR is a structural mimic of JA-Ile (6), a ubiquitous plant hormone that is essential for plant development and defence3. The biologically active stereoisomer is (3R,7S)-JA-Ile, but the C7 stereocentre rapidly epimerizes at physiological pH to the more stable but inactive trans (3R,7R) diastereoisomer, which modulates its activity4. In contrast, COR is configurationally stable, endowing it with increased potency and longevity. The conjugation of JA and l-Ile in plants is catalysed by the adenosine triphosphate (ATP)-dependent ligase Jar1 (Fig. 1b), a member of the ANL superfamily5. ANL enzymes generate acyl-adenylate (acyl-AMP) intermediates that undergo substitution with various nucleophiles. For example, acyl-CoA synthetases (ACSs) are common ANL enzymes generating thioesters, which can be coupled to an amine by a secondary N-acyltransferase. Jar1 is an amide bond synthetase (ABS), a member of a rarer subclass of the ANL superfamily whose enzymes accept amine nucleophiles directly without requiring an additional partner enzyme (Supplementary Fig. 1)5,6.

Fig. 1: Biosynthesis of coronatine (COR) and jasmonyl-l-isoleucine (JA-Ile).

a, Bacterial CfaL enzymes are predicted to ligate coronafacic acid (CFA) 1 with coronamic acid (CMA) 2 or l-amino acids to generate phytotoxins, including COR 3 and CFA-Ile 4. b, In plants, the enzyme Jar1 ligates jasmonic acid (JA) and l-isoleucine to produce JA-Ile epimers (3R,7S)-6 and (3R,7R)-6. c, Polyketide synthase (PKS) assembly of 1 from succinic semialdehyde. PKS consists of acyl-carrier protein (ACP), acyl-transferase (AT), keto-synthase (KS), dehydratase (DH), enoyl-reductase (ER), keto-reductase (KR) and thioesterase (TE) domains. d, Nonribosomal peptide synthetase (NRPS)-mediated biosynthesis of 2 in P. syringae. NRPS consists of adenylation (A), thiolation (T) and thioesterase (TE) domains.

CFA is assembled by a type I polyketide synthase via the cyclization of a β-ketothioester intermediate (Fig. 1c)1,7,8,9. The biosynthesis of CMA occurs via a nonribosomal peptide synthetase (NRPS)-mediated cryptic chlorination and cyclization (Fig. 1d)10,11,12. A putative ligase (CfaL) within the Pseudomonas syringae COR biosynthetic gene cluster is predicted to couple CFA and CMA to form COR (Fig. 1a, Supplementary Fig. 2a and Supplementary Table 1). Previous attempts to characterize this ligase have been unsuccessful13. Other plant pathogens possess a COR-like biosynthetic gene cluster14,15, including Streptomyces scabies which has a putative CfaL and CFA biosynthetic genes, but no CMA pathway (Supplementary Fig. 2b and Supplementary Table 1)15. Consequently, this strain produces predominantly CFA-l-Ile 4, along with smaller quantities of l-Val and l-allo-Ile adducts, which have also been detected from P. syringae16,17,18. As CFA must be coupled to an amino acid to elicit biological activity, we sought to characterize the key CfaL enzymes to enable new routes to COR-like phytotoxins as potential herbicides19. We were also interested in exploring the relationship between the bacterial CfaL and the functionally related plant ligase Jar1.

As well as being fundamental in nature, amide formation is one of the most widely used synthetic transformations. Although coupling acids and amines is relatively simple, it often requires three steps, protect–couple–deprotect, to install each amide. Stoichiometric quantities of expensive and deleterious coupling reagents are typically required and purification can be problematic20. While some progress has been made in the development of chemocatalytic methods for amide synthesis, these have not been widely adopted21,22,23,24,25,26,27. Consequently, there is interest in the development of enzymatic alternatives6,20,28,29,30,31. In this work, we characterize CfaL ligases and demonstrate how they are highly versatile biocatalysts for the synthesis of ubiquitous amides. Additionally, using structure-guided mutagenesis, we generate improved ligases providing more sustainable, alternative routes for production of pharmaceuticals, agrochemicals and other valuable materials.

Characterization of the CfaL family

Overproduction of the CfaL from P. syringae (PsCfaL) in Escherichia coli resulted in only trace amounts of active enzyme. However, assays demonstrated that PsCfaL catalyses the ATP-dependent coupling of l-isoleucine and CFA, obtained from acid hydrolysis of coronatine, confirming the function of CfaL for the first time (Supplementary Figs. 3 and 4). The low quantity of PsCfaL available prevented full characterization, and so alternative CfaL homologues were explored. In addition to the putative S. scabies ligase (SsCfaL), other candidates located within putative COR-like clusters were selected from BLAST analysis (Supplementary Fig. 2). Of these, PbCfaL from Pectobacterium brasiliense and AlCfaL from Azospirillum lipoferum were chosen for characterization, as they are predicted to be more amenable to crystallization (Supplementary Table 2). While P. brasiliense is a well-known plant pathogen32,33, A. lipoferum is a root-dwelling, nitrogen-fixing plant symbiont that is not known to produce coronatine34.

SsCfaL was overproduced in E. coli (Supplementary Fig. 5) and assays with synthetic (±)-CFA19, l-isoleucine and ATP showed the direct formation of CFA-l-Ile 4 via a CFA-AMP intermediate (Supplementary Fig. 6), confirming CfaL is an ABS enzyme. In addition, SsCfaL also accepted the aromatic CFA variant 7 which when coupled to l-Ile forms coronalone, a simplified synthetic COR analogue with promising herbicidal activity35. Given adenylation occurs in the absence of amine substrate, the rate of adenylation can be measured in isolation (Extended Data Table 1 and Supplementary Fig. 7). The rate of adenylation (kcat) was greatest with (±)-CFA but the aromatic analogue 7 was found to have a lower Michaelis constant, Km. Both (3R,7R)- and (3S,7S)-enantiomers of trans-JA 5 were also accepted, albeit at a lower level than CFA or 7, with a preference towards the natural (3R,7R)-5 stereoisomer. This indicates that, despite low amino acid sequence similarity (16%), the bacterial SsCfaL and plant Jar1 both catalyse the ligation of JA with l-Ile (Supplementary Fig. 8). Reactions with deactivated enzyme confirmed that the adenylation of 5 and the subsequent reaction with l-Ile are both enzyme-catalysed. Samples of trans-JA (Extended Data Table 1) contain a minor amount of the less stable cis epimer, owing to facile C7-epimerization (Fig. 1b)4. To explore the stereoselectivity of SsCfaL further, all four stereoisomers of configurationally stable 7-methyl-jasmonic acid were synthesized (Supplementary Fig. 9). Initially the trans and cis diastereoisomers of 7-methyl-jasmonic acid were separated and tested as a racemic mixture. However, these were poor substrates for SsCfaL, hence the resolution of all four stereoisomers was not carried out. The selectivity of SsCfaL was further tested by incubating CFA 1 or (±)-jasmonic acid 5 with 21 proteinogenic amino acids, resulting in a wide range of amino acid conjugates (Supplementary Figs. 10 and 11). 
Hydrophobic amino acids such as l-isoleucine and l-valine were preferred by SsCfaL which reflects the COR-like metabolites isolated from S. scabies16. No activity was seen with d-amino acids, or with primary amines and dipeptides (Supplementary Fig. 12). AlCfaL and PbCfaL, expressed from codon optimized synthetic genes, were seen to be functionally similar to SsCfaL, although with lower activities (Supplementary Fig. 13).
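The trade-off noted above, where (±)-CFA shows the higher kcat but analogue 7 the lower Km, can be illustrated with a minimal Michaelis–Menten sketch. The function names and all numerical values below are hypothetical and chosen only for illustration; they are not the values reported in Extended Data Table 1.

```python
def mm_rate(kcat, km, substrate, enzyme):
    """Michaelis-Menten rate v = kcat * [E] * [S] / (Km + [S])."""
    return kcat * enzyme * substrate / (km + substrate)


def catalytic_efficiency(kcat, km):
    """Specificity constant kcat / Km."""
    return kcat / km


# Hypothetical parameters (NOT from the paper): kcat in s^-1, Km in uM.
fast_weak_binder = {"kcat": 10.0, "km": 100.0}   # higher kcat, higher Km
slow_tight_binder = {"kcat": 4.0, "km": 10.0}    # lower kcat, lower Km

enzyme_uM = 1.0
for s in (1.0, 10_000.0):  # low and saturating substrate concentrations (uM)
    v_fast = mm_rate(fast_weak_binder["kcat"], fast_weak_binder["km"], s, enzyme_uM)
    v_tight = mm_rate(slow_tight_binder["kcat"], slow_tight_binder["km"], s, enzyme_uM)
    print(f"[S] = {s:>8.0f} uM: high-kcat substrate {v_fast:.3f}, low-Km substrate {v_tight:.3f} uM/s")
```

Under these assumed numbers the low-Km substrate is turned over faster at low substrate concentration, whereas the high-kcat substrate wins at saturation, which is why kcat and Km are reported separately.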

The structure of a CfaL ligase

Crystallography trials revealed that only PbCfaL yielded crystals of sufficient quality for structural studies. A PbCfaL structure in the adenylation conformation was solved to 2 Å resolution (Fig. 2), and its fold is consistent with the ANL superfamily. Despite sharing low sequence identity (<20%), PbCfaL showed substantial structural similarity (>70%) to several other ANL ligases from a variety of organisms, including bacterial benzoate CoA ligases and firefly luciferases (Fig. 2b, Supplementary Table 3). By contrast, PbCfaL shares very little structural similarity with the catalytically equivalent Jar1 (30%), suggesting that the two enzymes evolved independently (Supplementary Fig. 14). Plants are known to possess other ACSs that do share high structural homology with PbCfaL (Supplementary Table 3). However, Jar1 and related plant acyl-AMP forming enzymes that conjugate salicylate or indole-3-acetic acid (IAA) with amino acids appear to have evolved separately and specifically for plant hormone signalling36.

Fig. 2: Structure of PbCfaL.

a, Main image, X-ray crystal structure of PbCfaL (2 Å) in the ‘open’ or ‘adenylation’ conformation (PDB ID, 7A9I). PbCfaL has a large N-terminal region (residues 1–403), shown in blue, and a flexible C-terminal region (residues 403–516), shown in red. The boxed region and magnified inset highlight the active site region, which is shown with 7 co-crystallized; labelled residues are conserved between all four CfaLs in this study. 7 lies 3.6 Å from a conserved tryptophan residue (W220), which probably helps to align the substrate via π–π stacking interactions. b, PbCfaL superimposed onto McbA (gold; PDB ID, 6SQ8)29, the closest structural homologue to PbCfaL. In the ‘adenylation’ state, the two structures show high levels of similarity. In this state the carboxylic acid binding site (shown) is located between the N-terminal and the flexible C-terminal regions and is solvent accessible. c, PbCfaL superimposed on McbA in the ‘closed’ state (also referred to as the ‘thiolation’ state in related ACS enzymes). Like all ANLs, the C-terminal region of McbA undergoes a large rotation (direction indicated by dashed red arrow) to lie on top of the carboxylic acid binding site, trapping the adenylated intermediate before amine attack. The more rigid N-terminal region does not substantially change conformation. We would expect the C-terminal region of PbCfaL to undergo a similar rotation during catalysis. Structural alignment was performed with Chimera (version 1.14) MatchMaker.

The structure of PbCfaL is composed of a large N-terminal domain (residues 1–403) and a smaller, flexible C-terminal domain (residues 403–516, Fig. 2). As with other members of the ANL superfamily, it is likely that the C-terminal domain undergoes a large rotation following acyl-adenylate formation to close off the acyl binding pocket and form the amino acid binding site (closed conformation) (Fig. 2). Co-crystallography of PbCfaL with 7 revealed that the extremely solvent-accessible acyl binding pocket lies between the two domains (Fig. 2, Supplementary Fig. 15). Sequence alignment between the CfaLs in this study showed a small number of conserved residues (Supplementary Fig. 16), with only W220 likely to make direct contact with 7, probably aligning the carboxylic acid via π–π stacking, explaining the higher binding affinity of 7 versus CFA (Fig. 2). Other conserved residues around 7 probably define the width and depth of the binding pocket. When aligned with the most structurally similar proteins from the Protein Data Bank (PDB; Supplementary Fig. 17) there are few conserved sequences, the most similarity occurring in the ATP binding SSGTTG motif (residues 168–173)37. Despite many attempts, determination of a PbCfaL structure in the closed conformation with AMP and amino acid bound could not be achieved.

CfaL substrate scope and engineering

We next explored if the synthetic scope of CfaL enzymes could be extended towards other amide targets. The CfaL enzymes were found to possess extremely broad substrate tolerance, accepting a variety of aryl and heteroaryl carboxylic acids 8–29 as well as aliphatic carboxylic acids 30–46, including several chiral compounds 39–46 (Fig. 3, Extended Data Table 2). Several acyl-donor substrates possessed other reactive functionalities, such as electrophilic ketones (11, 35, 43, 45), alkenes (33), as well as nucleophilic alcohol (40, 44) or amine groups (13, 17, 18, 19, 46), which would require protecting for traditional coupling chemistries, but do not interfere with the enzymatic ligation to amino acid acceptor substrates. In addition to proteinogenic amino acids (Fig. 4a, Extended Data Table 3), CfaL enzymes also accept a wide range of non-proteinogenic amino acids 47–61, including common pharmaceutical building blocks, with a preference for hydrophobic amino acids (Fig. 4b, c, Extended Data Table 4). Although polar, particularly charged, amino acids are not well accepted by CfaL, both l-2,4-diaminobutyrate 47 and l-ornithine 48 can be selectively acylated at the α-amino group, obviating the need for protection of the side-chain amino group (Fig. 4c, Supplementary Fig. 17).

Fig. 3: Carboxylic acid substrate scope.

a, Diverse structures of carboxylic acid (donor) substrates assayed with l-Ile and CfaL enzymes. b, Percentage conversion for ligation of carboxylic acids 8–46 with l-Ile catalysed by CfaL enzymes (column headings). Assays were carried out with wild-type and engineered CfaL enzymes (25 μM), carboxylic acids 8–46 (2 mM) and l-Ile (5 mM). Conversion to amide products was determined by HPLC analysis following 20 h incubation. Actual conversion values and errors can be found in Extended Data Table 2.

Fig. 4: Amino acid substrate scope.

a, Percentage conversion for ligation of carboxylic acid 9 with acceptor proteinogenic amino acids (rows). b, Percentage conversion for ligation of 9 with acceptor non-proteinogenic amino acids. a-Ile = allo-isoleucine. c, Structures of non-proteinogenic amino acids. d, Reversed-phase (RP)-HPLC trace of ligation product of l-Dab 47 (green) and 9 (m-methylbenzoate, red) catalysed by SsCfaL, compared to HPLC traces of synthesized standards of the two possible products 62 and 63. Product of the enzymatic reaction (bottom trace) shows selective acylation of the α-amino group to give amide 62. All assays were carried out with wild-type and engineered CfaL enzymes (5 μM), carboxylic acid 9 (1 mM) and amino acids (2 mM). Conversion to amide products was determined by RP-HPLC analysis following 20 h incubation. Actual conversion values and errors can be found in Extended Data Tables 3 and 4.

In general, SsCfaL and AlCfaL both performed better than PbCfaL (Figs. 3 and 4, Extended Data Tables 2–4), which was found to be less thermally stable and frequently precipitated during the reaction timescale (Extended Data Fig. 1). On the basis of our crystallographic studies, we sought to improve the activity and stability of PbCfaL via rational, structure-guided mutagenesis. Sequence comparison between the four CfaL enzymes (Supplementary Fig. 16) identified few obvious distinctions. However, one noticeable difference was found on the flexible hinge region linking the N- and C-terminal domains (at position 395, Extended Data Fig. 2a). This position is solvent-exposed and likely to be involved in the conformational changes required to shift between the adenylation (open) and amidation (closed) states of CfaL. The large, charged arginine residue that is located in this position of PbCfaL is orientated out from the enzyme, while the same position in the other CfaLs and ANLs included in the sequence alignment (Supplementary Fig. 16) is occupied by a small and uncharged glycine. A PbCfaL(R395G) mutant showed increased activity against the panel of carboxylic acids and some amino acid substrates (Figs. 3 and 4, Extended Data Tables 2–4). An X-ray crystal structure of this mutant was determined, which revealed no overall structural changes (Extended Data Fig. 2a). However, the melting temperature (Tm) of PbCfaL(R395G) increased by 5 °C relative to the wild type, suggesting that the replacement of this solvent-accessible, charged R395 is beneficial for stability (Extended Data Fig. 1).

A subsequent double mutant, PbCfaL(R395G/A294P), showed a further increased Tm and slightly improved activity (Figs. 3 and 4, Extended Data Tables 2–4). The location of this second mutation is within a highly conserved ATP binding loop (G289–L297) that is significantly larger in PbCfaL than in other related structures, and which may partially occlude the ATP binding site (Extended Data Fig. 2b). The proline found at this location in other CfaLs, such as SsCfaL, and in several other structurally similar ligases may aid in rotating this loop out of the binding site (Supplementary Fig. 16). Using these two PbCfaL mutants, which no longer precipitate during the reaction, we were able to substantially improve the conversions of both the panel of carboxylic acid and amino acid substrates (Figs. 3 and 4), demonstrating that minimal structure-guided mutagenesis can be used to engineer improved CfaL variants.

Synthetic applications of CfaL

To demonstrate the synthetic utility of CfaL, we sought to establish preparative-scale ligation reactions. Accordingly, conditions were optimized for the ligation of carboxylic acid 10 and l-Ile (Fig. 5a). Reaction of 10 (at 15 mM concentration) with PbCfaL(R395G/A294P) cell free lysate afforded amide 64 in near quantitative conversion as determined by high-performance liquid chromatography (HPLC). The reaction mixture was subjected to a simple solvent extraction, providing 1.48 g of crude 64 from 400 ml of reaction mixture, which would be sufficiently pure (>92% purity by NMR, Extended Data Fig. 3) for further synthetic derivatization. Purification of the extract by column chromatography provided 1.37 g of pure 64 in 87% isolated yield (Fig. 5a). To avoid the use of stoichiometric quantities of the expensive co-factor ATP, we repeated the ligation of 10 and l-Ile at the same scale, omitting ATP and instead introducing an ATP recycling system consisting of a polyphosphate (PolyP) kinase enzyme (CHU)38 and an inexpensive PolyP phosphate donor (Extended Data Fig. 3). Although the isolated yield was lower in this case (52%), there is further scope for optimization. While CfaL cell lysate shows good activity for up to 12 h, we sought to improve enzyme stability/longevity through immobilization of CfaL in the form of a cross-linked enzyme aggregate (CLEA)39. PbCfaL(R395G/A294P) CLEAs were shown to retain activity over an extended period of five days, and could be isolated and recycled in five sequential ligation reactions (Extended Data Fig. 4a). Purified PbCfaL(R395G/A294P) was also shown to tolerate several solvents, including MeOH, ethylene glycol and the widely used ‘green’ solvent 2-methylTHF (Extended Data Fig. 4b).
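The preparative-scale figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below assumes that acid 10 is the limiting reagent (l-Ile and ATP are used in excess according to the Fig. 5 footnotes); the molar mass it prints is back-calculated from the reported mass and yield and is not a value stated in the text.

```python
# Reported quantities from the preparative-scale ligation of acid 10 with l-Ile.
acid_conc_mM = 15.0      # concentration of carboxylic acid 10
volume_mL = 400.0        # reaction volume
crude_mass_g = 1.48      # crude amide 64 after solvent extraction
pure_mass_g = 1.37       # amide 64 after column chromatography
isolated_yield = 0.87    # reported isolated yield (fraction)

# Amount of limiting acid: mM * mL / 1000 = mmol.
acid_mmol = acid_conc_mM * volume_mL / 1000.0

# Mass recovered on purification, as a fraction of the crude extract.
recovery = pure_mass_g / crude_mass_g

# Molar mass implied by the reported yield (back-calculated, an inference only).
product_mmol = acid_mmol * isolated_yield
implied_molar_mass = pure_mass_g * 1000.0 / product_mmol  # mg / mmol = g/mol

print(f"Limiting acid: {acid_mmol:.1f} mmol")
print(f"Recovery from crude: {recovery:.1%}")
print(f"Implied molar mass of 64: {implied_molar_mass:.0f} g/mol")
```

The recovery from the crude extract (about 93%) is in line with the >92% crude purity estimated by NMR.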

Fig. 5: Synthetic potential of CfaL enzymes.

a, Amides including pharmaceutical scaffolds synthesized by CfaL enzymes. b, Comparison of percentage conversion for CfaL enzymes in the synthesis of 65–70. c, Kinetic resolution of racemic carboxylic acids (donor). Absolute configuration and diastereoisomeric ratios (d.r.) were determined by RP-HPLC using synthetic standards. Inset, the HPLC chromatogram for (S)-71 formed in the kinetic resolution of racemic ibuprofen (41) with l-Ile and AlCfaL (E = 94). d, Comparison of the enantioselectivities of the different enzymes. Values of E = 15–30 are considered moderate–good, E > 30 are excellent43. For conversions <30% the calculation of E is unreliable, so the values were not determined (ND). e, Kinetic resolution of racemic amino acid (acceptor) 57 (2 mM) with acid 9 (1 mM) and AlCfaL (5 μM) (E > 200) following 20 h incubation. The yield reported is based on 9 which equates to a yield of 33% based on 57 (2 equiv. used). Inset, chiral HPLC analysis of the amide product showing a single enantiomer, (S)-77. Enantiomeric ratio (e.r.) values determined by chiral HPLC. ᵃIsolated yield for the preparative-scale synthesis of 64 with PbCfaL(R395G/A294P) lysate, 10 (15 mM), l-Ile (45 mM) and ATP (36 mM) incubated for 24 h. ᵇIsolated yields of about 100-mg-scale reactions catalysed by SsCfaL cell lysate with carboxylic acid (5 mM), amine (15 mM) and ATP (15 mM) following 24 h incubation. ᶜConversions determined from HPLC peak area ratios, following assays including CfaL enzymes (25 μM), carboxylic acids (2 mM) and amino acid (6 mM) incubated for 20 h. ᵈE value was calculated from average d.r. or e.r. values, as described previously43. Percentage conversions and d.r. values represent means where n = 3, error denotes s.d.

To further demonstrate the synthetic potential of CfaL, a series of pharmaceutically relevant scaffolds were prepared in excellent yields (Fig. 5a, b and Supplementary Fig. 18). For example, amides 65 and 67 were prepared in >70% isolated yields at around 100 mg scale. Furthermore, ligations of cinnamic acid 33 and indole carboxylic acids 26 (Fig. 3) with cyclopropyl amino acids 56 and l-Leu, respectively (Fig. 4), produced amides 68 and 69, which are precursors for the manufacture of promising SARS-CoV-2 protease inhibitors, including PF-07304814 (Pfizer; in phase I clinical trials) (Fig. 5a, b and Supplementary Fig. 18)40,41. Similarly, ligation of the thiazole carboxylic acid 29 and O-methyl-l-serine 49 provided amide 70; this is a key component of oprozomib, which is in phase II clinical trials for treatment of multiple myeloma42. Probing the limits of potential CfaL-reaction scope, we found that the mutant PbCfaL(R395G/A294P) also allowed the generation of amide precursors required for the synthesis of the antiviral telaprevir and the anti-cancer agent bortezomib (Extended Data Fig. 5). These products were only produced in low quantities, but with further engineering it may be possible to synthesize these precursors at higher levels. Overall, these reactions (Fig. 5) clearly demonstrate how structurally diverse carboxylic acids can be combined with proteinogenic and synthetic amino acids to produce pharmaceutically important compounds.

The potential of CfaL for use in kinetic resolution of racemic synthetic carboxylic acids was also investigated (Fig. 5c, d). Notably, racemic ibuprofen could be resolved, with excellent enantioselectivity (E = 94)43 leading to the biologically active (S)-ibuprofen-l-Ile amide 71. Amide conjugates of ibuprofen and related NSAIDs with amino acids have been explored extensively for applications as prodrugs and/or hydrogel-based nanomedicine44,45. Five other racemic acids were subjected to kinetic resolution, affording amides 72–76, with modest E values (Fig. 5c, d and Supplementary Fig. 19). In the case of amide 73, the mutant PbCfaL(R395G) was superior to any of the wild-type CfaLs, illustrating how protein engineering could be used to achieve more effective kinetic resolutions. Finally, we sought to exploit the high selectivity of CfaL for l-amino acids to effect the kinetic resolution of racemic amino acids. Amino acid 57 was selected as this is a common pharmaceutical building block, which would normally require multi-step asymmetric synthesis or laborious resolution and protection before acylation or peptide coupling. As anticipated, the reaction between carboxylic acid 9 and racemic amino acid 57 proceeds with excellent enantioselectivity (E > 200), with none of the R-configured enantiomer evident in chiral HPLC when SsCfaL or AlCfaL was used (Fig. 5e, Supplementary Fig. 20). This demonstrates how racemic carboxylic acids and racemic amino acids can be resolved during amide bond synthesis, using CfaL, avoiding more laborious asymmetric synthesis or traditional resolution procedures, and the need for protective group manipulations.
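For readers unfamiliar with E values, the following minimal sketch shows one commonly used way to estimate the enantiomeric ratio E of a kinetic resolution from the conversion and the stereochemical excess of the product, E = ln[1 − c(1 + ee)] / ln[1 − c(1 − ee)]. This is an assumption about the calculation: the text states only that E was calculated as described previously43, and the input numbers below are illustrative rather than taken from the paper.

```python
import math


def excess_from_ratio(major, minor):
    """Convert a d.r. or e.r. given as major:minor into a fractional excess."""
    return (major - minor) / (major + minor)


def e_value(conversion, product_excess):
    """Enantiomeric ratio E from conversion c and product excess ee (both fractions).

    Uses the product-based expression
        E = ln[1 - c(1 + ee)] / ln[1 - c(1 - ee)],
    which is assumed here and may differ from the procedure used in the paper.
    """
    if not 0.0 < conversion < 1.0:
        raise ValueError("conversion must lie strictly between 0 and 1")
    if not 0.0 <= product_excess < 1.0:
        raise ValueError("product excess must lie in [0, 1)")
    numerator_arg = 1.0 - conversion * (1.0 + product_excess)
    if numerator_arg <= 0.0:
        raise ValueError("conversion and excess are mutually inconsistent")
    denominator_arg = 1.0 - conversion * (1.0 - product_excess)
    return math.log(numerator_arg) / math.log(denominator_arg)


# Illustrative inputs only: 40% conversion and a 97.5:2.5 product ratio.
ee = excess_from_ratio(97.5, 2.5)
print(f"excess = {ee:.2f}, E = {e_value(0.40, ee):.0f}")
```

Because the product excess depends strongly on conversion, E is more informative than d.r. or e.r. alone when comparing enzymes; the expression also becomes very sensitive to measurement error at low conversion, which is consistent with the figure legend leaving E undetermined below 30% conversion.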

Discussion

The results presented here demonstrate the role of CfaL enzymes in biosynthesis of the important coronatine family of phytotoxins. BLAST analysis reveals that CfaL-like ligases appear in a large number of distinct COR-like clusters from across a broad range of microorganisms, including bacteria where COR-like phytotoxins have not been observed, suggesting that CfaLs and the biosynthesis of COR-like phytotoxins are widespread. CfaLs can also catalyse ligation of JA with Ile to generate the plant hormone JA-Ile in an identical fashion to the plant ligase Jar1. The lack of sequence and structural similarity between CfaL and Jar1 suggests that the two enzymes have evolved largely independently in bacteria and in plants to perform very similar reactions. In addition to potential agrochemical applications, the CfaL family of enzymes can be used to produce a wide range of pharmaceutically relevant amides. Used in combination with improving ATP recycling techniques46,47, these enzymes could become powerful synthetic tools offering major advantages over other biocatalysts developed for amide synthesis. For example, the combination of ACS and N-acyltransferase enzymes has been investigated for amide synthesis48. However, large numbers of ACS and N-acyltransferase enzymes had to be screened to find pairs of enzymes with matching selectivity48. In addition to low substrate scope, this system also requires use of two expensive co-factors as well as engineering of two enzymes for further optimization, rather than just one. Other reports describe the use of standalone NRPS adenylation domains to synthesize amides30,49. In these examples only the carboxylic acid activation step is directly enzyme catalysed; the subsequent amidation proceeds spontaneously, requiring a large excess (about 100 equiv.) of the amine, which is not viable for many syntheses. CfaLs directly catalyse both steps and can therefore utilize acids and amines with more efficient stoichiometry.
Taken together, our results show that CfaLs have potential for the synthesis of a diverse range of important amide products, offering clear advantages over traditional synthetic methods and other biocatalytic approaches.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this paper.