Despite continual advances in antimicrobial therapies, infectious diseases pose a serious threat to global health owing to the ability of microbes to acquire resistance against antibiotics and transform themselves into superbugs.1 Infectious diseases are an even more serious threat to health during immunocompromised conditions such as HIV/AIDS, diabetes and old age.2 The recent increase in multidrug-resistant pathogens emphasizes the urgent need for the discovery of therapeutically active novel antimicrobial agents.1

Lysobacter species are widely distributed in nature and characterized by their ability to lyse other organisms by producing a broad range of proteases and antibiotics.3, 4 Lysobacter sp. RH2180-5 was isolated from soil in Japan while screening for novel therapeutically active antibiotics using the silkworm infection model. Lysocin E, a therapeutically active novel cyclic peptide produced by Lysobacter sp. RH2180-5, has antimicrobial activity against clinical isolates of methicillin-susceptible and methicillin-resistant Staphylococcus aureus, as well as S. simulans, S. haemolyticus, S. pseudintermedius, Bacillus subtilis, B. cereus and Listeria monocytogenes, with MICs ranging from 1 to 4 μg ml−1 (ref. 5). Lysocin E exhibits a novel mode of action that involves binding to menaquinone in the cell membrane. In addition to lysocin E, Lysobacter sp. RH2180-5 produces at least eight more derivatives, most of which have antimicrobial activity. To further improve the production of lysocin E, analyze the derivatives and prepare novel derivatives with improved activity, genetic analysis of the lysocin biosynthetic gene cluster is required.

The critical features of the lysocin E chemical structure are its 12 amino acids, including one N-methylphenylalanine, and one fatty acid side chain. On the basis of this structure, we speculated that lysocin E could be synthesized by non-ribosomal peptide synthetases (NRPSs). In this work, we sequenced the whole genome of RH2180-5 and analyzed the putative lysocin biosynthetic gene cluster. The 71 kb DNA region contained two large multimodular NRPSs, named lysocin E synthetase (Les) A and LesB, with a 1.7 MDa core peptide. A total of 12 modules and 43 domains were detected; LesA contained a loading condensation domain and LesB a terminal thioesterase domain. The specificities of the adenylation domains suggested a linear mode of lysocin biosynthesis, and supported the association between the cluster and lysocin biosynthesis.

Results and Discussion

Genome sequence and mining of lysocin biosynthetic gene cluster

Genomic DNA was extracted with a Qiagen DNA-blood Mini Kit (Qiagen, Hilden, Germany), with lysozyme used for bacterial lysis. DNA libraries were prepared using the Illumina TruSeq DNA Sample Preparation Kit (Illumina, San Diego, CA, USA) for paired-end reads, and the quality and quantity of the constructed libraries were confirmed using a BioAnalyzer (Agilent Technologies, Santa Clara, CA, USA). Sequencing on the Illumina HiSeq2000 platform yielded 4 million paired-end reads. Subsequent data processing was performed with CLC Genomics Workbench ver. 8 (CLC bio, Aarhus, Denmark), resulting in 224 contigs with a combined length of 5.8 Mb and a G+C content of 70%. The longest contig was 289 kb, and the N50 length, defined as the largest length L such that 50% of all nucleotides are contained in contigs with a size of at least L,6 was 76.6 kb. Genome annotation using MiGAP (ref. 7) revealed a total of 4814 putative protein-coding sequences, and genes for 1 ribosomal RNA, 53 transfer RNAs and 38 miscellaneous RNAs. Genome analysis using antiSMASH (ref. 8) revealed the presence of multiple secondary metabolite gene clusters encoding non-ribosomal peptides, polyketides, arylpolyenes, bacteriocins and lanthipeptides.
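The N50 definition used above can be illustrated with a minimal Python sketch. The function name `n50` and the toy contig lengths are assumptions introduced here for illustration; they are not the RH2180-5 contigs, and the assembly software may handle ties or rounding differently.

```python
def n50(contig_lengths):
    """Return the largest length L such that contigs of length >= L
    together contain at least 50% of all assembled nucleotides."""
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig downwards, accumulating bases,
    # and stop at the first contig that brings the sum to half the total.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy assembly (lengths in bp), for illustration only.
toy_contigs = [100, 80, 60, 40, 20]
print(n50(toy_contigs))  # 80: contigs >= 80 bp hold 180 of 300 bp
```

Applied to the 224 contig lengths of the actual assembly, the same procedure would be expected to return the reported value of 76.6 kb.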

We detected 20 regions containing NRPS genes in the genome. On the basis of the module and domain organization of each NRPS, we speculated that the 12 NRPS modules and 43 domains in LesA and LesB were involved in lysocin E biosynthesis, and focused our analysis on the 71 kb region of the genome (Figure 1a; Supplementary Table 1). LesA and LesB were estimated to be 998 kDa and 705 kDa, respectively, giving a core peptide of ~1.7 MDa, one of the largest NRPSs reported to date. We found 7 modules and 25 domains in LesA, and 5 modules and 18 domains in LesB (Figure 1b), and genes associated with transport and regulation were detected in proximity to LesA and LesB. An MbtH-like protein was found downstream of LesB. Although MbtH-like proteins do not directly act as catalysts, biochemical studies have revealed that these proteins bind tightly to NRPS proteins containing an adenylation domain, where they stimulate the adenylation reaction.9 The final cyclization step, formation of the lactone ring, is carried out by the LesB thioesterase domain; thioesterase domains, in general, account for the majority of cyclization reactions within NRPSs.10

Figure 1

NRPS involved in lysocin biosynthesis. (a) Genetic organization of the lysocin biosynthetic gene cluster from Lysobacter sp. RH2180-5. The annotation of the cluster is shown in Supplementary Table 1. (b) Deduced module and domain organization of LesA and LesB. Bars indicate the positions of modules within the protein and each circle represents an individual domain: A, adenylation domain; C, condensation domain; E, epimerization domain; MT, methyltransferase domain; PCP, peptidyl carrier protein domain; TE, thioesterase domain. (c) Chemical structures of the nine lysocin derivatives identified so far. Les, lysocin E synthetase; NRPS, non-ribosomal peptide synthetase.

Specificities of adenylation domains in LesA and LesB

The adenylation domain is responsible for selecting and activating the amino acid to be incorporated into the growing peptide chain.11 The adenylation domain sequences of both LesA and LesB were extracted using available bioinformatics tools.8, 12 The substrate specificity of each of the 12 domains was predicted on the basis of sequence alignments of the signature code of pocket residues, as defined by Stachelhaus et al.,11 and the nearest neighbor predicted by NRPSpredictor2 (Table 1).12 Consistent with the lysocin structure, A1, A2, A3, A4, A5, A6, A7, A9 and A12 were predicted to activate Thr, Arg, Ser, Gly, Phe, Leu, Arg, Gln and Thr, respectively. A11 was predicted to activate Val, which is present in lysocin B, whereas lysocin E harbors Ile (Figure 1c). This suggests that A11 may activate both amino acids. The lysocin A8 and A10 domains were predicted to activate Ser and Val, respectively, but no lysocin derivatives with Ser or Val at these positions have been isolated so far, so these domains require further analysis. Alignment of lysocin A8 and A10 with domains from characterized gene clusters showed that they share a high degree of similarity, and nearly identical specificity-conferring codes, with A8 and A10, respectively, of the biosynthetic gene cluster for WAP-8294A, a cyclic depsipeptide containing 12 amino acids.13 Because A8 and A10 from the WAP-8294A biosynthetic gene cluster activate Glu and Trp, respectively,13 we speculated that lysocin A8 and A10 would have similar substrate specificities.
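The comparison between predicted substrates and the lysocin E sequence can be made explicit with a short Python sketch. The predicted list follows the text above. In the lysocin E list, residues 8 (Glu) and 10 (Trp) are assumptions inferred from the WAP-8294A comparison rather than stated outright in this section. The helper name `mismatches` is hypothetical, and chirality and N-methylation are deliberately ignored.

```python
# Predicted substrates of A1..A12, as listed in the text.
predicted = ["Thr", "Arg", "Ser", "Gly", "Phe", "Leu",
             "Arg", "Ser", "Gln", "Val", "Val", "Thr"]

# Residues 1..12 of lysocin E. Positions 8 and 10 are ASSUMED here
# (Glu and Trp, by analogy with WAP-8294A); position 11 (Ile) is from the text.
lysocin_e = ["Thr", "Arg", "Ser", "Gly", "Phe", "Leu",
             "Arg", "Glu", "Gln", "Trp", "Ile", "Thr"]


def mismatches(pred, observed):
    """Return (position, predicted, observed) for every disagreement.
    Positions are 1-based to match the A1..A12 numbering."""
    if len(pred) != len(observed):
        raise ValueError("sequences must have the same length")
    return [(i, p, o)
            for i, (p, o) in enumerate(zip(pred, observed), start=1)
            if p != o]


for pos, p, o in mismatches(predicted, lysocin_e):
    print(f"A{pos}: predicted {p}, lysocin E has {o}")
```

Under these assumptions, only A8, A10 and A11 disagree with the structure, which matches the three domains singled out for further analysis.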

Table 1 Analysis of adenylation and condensation domains in LesA and LesB

Domains conferring stereochemistry of amino acids

During NRPS biosynthesis, condensation (C) domains catalyze the elongation reaction in which the peptidyl chain tethered to the phosphopantetheinyl arm of the upstream peptidyl carrier protein (PCP) domain is transferred to the amino acid bound to the downstream PCP domain.14 The C domains that couple long-chain fatty acids to the PCP-tethered aminoacyl acceptor substrate are referred to as starter (ref. 14) or FCL domains.15, 16 These FCLs catalyze the lipoinitiation process during lipopeptide biosynthesis. A DCL domain catalyzes the condensation reaction between the d-residue in the upstream peptidyl donor and the l-residue in the downstream aminoacyl acceptor, whereas an LCL is responsible for the condensation between two l-residues.14 Before a DCL can act, the chirality of the last amino acid in the growing peptide must be inverted, which is carried out by the epimerization domain. LesA and LesB harbor a total of 12 condensation domains and 5 epimerization domains. Lysocins harbor relatively long β-hydroxy fatty acid side chains attached to the peptide ring, and therefore, we expected that the first C domain of LesA would cluster with known FCL domains. To decipher the activity of the lysocin C domains, we first prepared a phylogenetic tree using the reported sequences,14, 15, 16, 17 and found that the 12 C domains could be categorized into three clades, FCL, DCL and LCL, consisting of 1, 5 and 6 domains, respectively (Supplementary Figure 1). Each class of condensation domains differs in the amino-acid sequences of the conserved residues in motifs 1–7 (ref. 14). Sequence alignment showed a clear distinction between these three classes of condensation domains (Figure 2). The presence of epimerization domains upstream of these five DCL domains suggested the presence of five D-amino acids in lysocin E, which was in complete agreement with the number and position of the DCL domains in the lysocin cluster (Table 1).
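The relationship between epimerization domains, C-domain subtypes and domain counts can be checked with a small Python sketch. The placement of the E domains (modules 2, 5, 7, 9 and 10) is an assumption chosen to be consistent with the counts reported in the text, not a reading of Table 1 or Figure 1b. The order of domains within each module is illustrative, and all names are hypothetical.

```python
from collections import Counter

# ASSUMED positions of the five epimerization domains (not taken from Table 1).
E_MODULES = {2, 5, 7, 9, 10}
MT_MODULE = 5                    # N-methyltransferase in module 5 (from the text)
LESA_MODULES = range(1, 8)       # modules 1-7
LESB_MODULES = range(8, 13)      # modules 8-12


def build_module(n):
    """Domain list for module n; the order within a module is illustrative."""
    domains = ["C", "A"]
    if n == MT_MODULE:
        domains.append("MT")
    domains.append("PCP")
    if n in E_MODULES:
        domains.append("E")
    if n == 12:
        domains.append("TE")     # terminal thioesterase on LesB
    return domains


def c_domain_type(n):
    """Module 1 carries the starter (FCL) domain; a C domain that directly
    follows an E-containing module accepts a D-configured donor (DCL)."""
    if n == 1:
        return "FCL"
    return "DCL" if (n - 1) in E_MODULES else "LCL"


layout = {n: build_module(n) for n in range(1, 13)}
lesa_domains = sum(len(layout[n]) for n in LESA_MODULES)
lesb_domains = sum(len(layout[n]) for n in LESB_MODULES)
c_types = Counter(c_domain_type(n) for n in range(1, 13))
# A DCL in module n implies a D-residue at position n - 1.
d_residues = sorted(n - 1 for n in range(2, 13) if c_domain_type(n) == "DCL")

print(lesa_domains, lesb_domains)   # 25 18
print(dict(c_types))
print(d_residues)
```

With this assumed layout, the sketch reproduces the reported totals of 25 and 18 domains, the 1/5/6 split of FCL, DCL and LCL domains, and five D-residues, one of which falls at position 5 where the text reports D-N-methylphenylalanine.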

Figure 2

Amino-acid sequence alignment of motifs C1–C7 (ref. 14) of the three subtypes of condensation domains found in the lysocin biosynthetic gene cluster. The FCL domain was aligned with FCLs from the daptomycin (ref. 15) and A54145 (ref. 17) gene clusters.

Methylation and lipoinitiation of lysocin

N- or C-methylation of amino-acid residues makes the peptide less susceptible to proteolytic breakdown.18 The presence of D-N-methylphenylalanine in the lysocin E structure led us to search for a methyltransferase domain. In addition, of the nine isolated lysocin derivatives, six were methylated at Phe5 (Figure 1c). Within the whole lysocin biosynthetic gene cluster, we found an N-methyltransferase domain in module 5 of the lesA gene. As the A domain of module 5 is predicted to activate Phe, the presence of methylated phenylalanine in lysocin E is in accordance with this finding.

The presence of a lipid side chain attached to the first amino acid, Thr, in lysocin E adds to the diversity of the lysocin derivatives. Biochemical analysis of lipoinitiation during the biosynthesis of daptomycin has been reported. DptE, an acyl ligase (AL), activates the fatty acid, which is then transferred to DptF, an acyl carrier protein (ACP), before condensation by the FCL of DptA.16, 19 This AL-ACP mode of activation is also predicted for A54145, where a fused AL-ACP (LptEF) carries out both steps, and homologs of DptE and DptF are observed in multiple other gene clusters.16 In the absence of the AL-ACP mechanism, activation of the fatty acid for lipoinitiation is carried out by an acyl-CoA ligase (ACL), which converts the fatty acid to a fatty acyl-CoA thioester that is then used by the FCL, as in surfactin20 or WAP-8294A.21 ACLs also activate fatty acids for polyketide biosynthesis.22, 23 Our analysis of the open reading frames of the predicted gene cluster did not reveal genes encoding either the AL-ACP or the ACL mode of activation, suggesting the involvement of a mechanism outside the gene cluster.

In summary, we performed a genome sequence analysis of Lysobacter sp. RH2180-5, identified the lysocin E biosynthetic gene cluster and proposed a linear logic of biosynthesis. Our bioinformatic analysis will facilitate further genetic and biochemical analyses of lysocin biosynthesis, which will enhance our understanding of its molecular mechanisms, thus paving the way for the generation of new analogs with better antimicrobial properties. As Lysobacter species have recently drawn attention as producers of bioactive secondary metabolites,3, 4 the RH2180-5 genome sequence provides the opportunity to elucidate the metabolic potential of this species. As an enormous range of medicinally important compounds is synthesized by bacterial NRPSs, our findings will facilitate genetic manipulation of the gene clusters involved in the biosynthesis of lysocin and other non-ribosomal peptides for the generation of novel (un)natural analogs with improved bioactivity.

Nucleotide sequence accession number

The lysocin E biosynthetic gene cluster has been deposited at DDBJ under the accession number LC128664.