GPAHex-A synthetic biology platform for Type IV–V glycopeptide antibiotic production and discovery

Glycopeptide antibiotics (GPAs) are essential for the treatment of severe infectious diseases caused by Gram-positive bacteria. The emergence and spread of GPA resistance have propelled the search for more effective GPAs. Given their structural complexity, genetic intractability, and low titer, expansion of GPA chemical diversity using synthetic or medicinal chemistry remains challenging. Here we describe a synthetic biology platform, GPAHex (GPA Heterologous expression), which exploits the genes required for the specialized GPA building blocks, regulation, antibiotic transport, and resistance for the heterologous production of GPAs. Application of the GPAHex platform results in: (1) a 19-fold increase of corbomycin titer compared to the parental strain, (2) the discovery of a teicoplanin-class GPA from an Amycolatopsis isolate, and (3) the overproduction and characterization of a cryptic nonapeptide GPA. GPAHex provides a platform for GPA production and mining of uncharacterized GPAs and provides a blueprint for chassis design for other natural product classes.

to acquire due to poor fermentation titers. These challenges restrict the opportunities of securing GPAs for exploration as drug candidates 12,13 . While most GPAs target the D-Ala-D-Ala terminus of peptidoglycan, we recently established that a subgroup, termed Type V GPAs 12 , binds to the cell wall in a distinct manner 14 . Type V GPAs are characterized by a tryptophan (Trp) linked to the central 4-hydroxyphenylglycine (Hpg) and are not generally glycosylated. Because they do not bind D-Ala-D-Ala, they are not susceptible to the canonical D-Ala-D-Lac-mediated GPA resistance and offer a class of promising antibiotics.
GPA core peptide scaffolds are synthesized through large multisubunit non-ribosomal peptide synthetases (NRPSs) 15,16 . These are arrayed along the genomes of actinobacteria in biosynthetic gene clusters (BGCs) that include genes required for the chemical modification of the peptide, the production of constituent elements, such as precursor amino acids, acyl chains, and sugars, in addition to genes for regulation, transport, and resistance 17 . The adenylation (A) domains of the NRPS modules recognize the amino acid components, many of which are nonproteinogenic. These include Hpg, 3,5-dihydroxyphenylglycine (Dpg), β-hydroxytyrosine (Bht), in addition to tyrosine (Tyr), Trp, leucine, and others. The amino acids are loaded onto the peptidyl carrier protein (PCP) domain of the NRPS for elongation by the condensation domain that catalyzes amide bond formation. During the process of loading, aryl group halogenation and β-hydroxylation on Tyr may also occur, introducing chlorinated amino acids and Bht into GPAs 18,19 . Bht can also be provided by a mini-NRPS pathway comprised of BpsD, OxyD, and Bhp 20 . Upon reaching the last module of the NRPS assembly line, the full-length peptidyl chain tethered to the last PCP domain can be crosslinked through intramolecular biaryl and diphenyl ether bonds. These are formed by cluster-associated cytochrome P450s, each recruited by the characteristic X domain present in the last NRPS module 21 . Crosslinks in the GPA scaffolds confer on these antibiotics a cup-like 3D conformation that can tightly bind to peptidoglycan and its precursors 12 . Following peptide release from the NRPS, additional modifications such as glycosylation, acylation, and sulfation can occur, contributing to GPA chemical diversity 17 .
As a consequence of next-generation genome sequencing technologies, BGCs that may encode GPA scaffolds can be easily identified in actinomycete genomes. The associated BGCs represent an untapped resource for GPA discovery and development 22 . Many of these BGCs are 'cryptic', i.e., not expressed under laboratory conditions, or yield only small amounts of compound upon fermentation. Consequently, accessing GPAs remains challenging. Synthetic biology strategies such as heterologous expression in an accommodating chassis offer potential solutions to these difficulties 13 . There are two reports of heterologous expression of GPAs, complestatin and A47934, where the yield of complestatin was only 0.24 mg/L in Streptomyces lividans TK24 23,24 . An additional obstacle is the large size of GPA BGCs (often >70 kb), which makes them difficult to clone 15,25 , and the limited options for surrogate chassis strains.
Here, we describe the GPAHex platform based on an engineered GPA production chassis, S. coelicolor M1154/pAMX4, and an optimized transformation-associated recombination (TAR) system for the production of known GPAs and mining of cryptic GPAs. GPAHex offers a synthetic biology platform for titer improvement and, importantly, for accessing the GPA chemical 'dark matter' 26 embedded in microbial genomes.

Results
Construction of the GPAHex platform. We chose to use S. coelicolor as the chassis for the GPAHex platform as it is a well-documented surrogate host for actinomycete gene expression, and we had previously shown that it supports good production of the GPA A47934 (~100 mg/L) 24 . Using the P1-phage artificial chromosome (PAC) plasmid pA47934 24 , which contains the A47934 BGC (~67 kbp) and ~70 kbp of unrelated DNA, we removed the NRPS scaffold, the tailoring genes, and the unrelated region using λ-red mediated PCR-targeting 27 . We next mobilized the remaining plasmid into S. coelicolor to establish the GPAHex platform (Fig. 1b and Supplementary Fig. 1). Hpg, Tyr, and Trp are all synthesized via the shikimate pathway 17 (Fig. 1c), which is highly regulated by its intermediates. We therefore introduced genes encoding the gatekeepers of the shikimate pathway, 3-deoxy-D-arabino-heptulosonate 7-phosphate (DAHP) synthase and chorismate mutase, from S. toyocaensis NRRL15009, downstream from the remaining A47934 biosynthetic genes, resulting in plasmid pAMX4 (Fig. 1b and Supplementary Fig. 1). pAMX4 was delivered into S. coelicolor M1154 28 and integrated into the attB φC31 site to generate the GPAHex chassis, S. coelicolor M1154/pAMX4.
Refactoring of the vector for TAR cloning of large NP BGCs. GPAs are synthesized from large (50-90 kb) BGCs, which are challenging to manipulate using standard DNA cloning methods. Two examples of GPA BGC cloning have been reported: (1) PAC library-based random cloning of the A47934 BGC 24 and (2) λ-red mediated stitching of two cosmids to reassemble the complestatin BGC 23 . Both methods required labor-intensive and time-consuming genomic library construction and screening. Moore's group has pioneered the TAR system for targeted capture of NP BGCs using the in vivo homologous recombination system in yeast 29,30 . However, this method suffers from low capture efficiency and instability in maintaining large and repetitive BGCs such as those of GPAs 31 .
We developed a protocol for cloning of large NP BGCs by constructing vector pCGW, which replaces the SuperCos I backbone of pCAP03-aac(3)IV with the copy number control replicon bearing the 'oriV-ori2-repE-sopABC' cassette from pBAC-lacZ. pCGW can be maintained at 1-2 copies/cell when supplemented with 0.2% D-glucose; however, it can be conditionally induced to ~100 copies/cell by adding 1 mM L-arabinose in trfA+ E. coli strains such as E. coli EPI300, which are critical for stable maintenance of large exogenous DNA 32 . Significant differences in the concentration of pCGW with and without L-arabinose induction are readily apparent (Supplementary Fig. 2a, b).
Since pCGW was derived from pCAP03-aac(3)IV, we retained the use of the ura3 counter-selection marker by the insertion of capturing 'hook sequences', which are DNA sequences homologous to the two ends of target DNA region, between its TATA box and the ATG start codon 30 . Despite this counter-selection system, we observed high levels of background colonies due to insufficient expression of ura3, likely caused by the absence of transcriptional initiation sites (TIS) between the TATA box and start codon. Therefore, when designing the hooks, we included additional TIS sequences (CAAG, CAAA, TAAA, TAAT, or TAAG) to ensure transcription initiation of ura3 33 . The Schizosaccharomyces pombe pADH promoter has a transcription initiation window of 55-125 bps (75-115 bps is optimal) downstream of the TATA box 33 , so scanning of target BGCs to include the TIS sequences is critical for designing capture hooks. Since transcription of the pADH promoter can be initiated at different TISs, multiple TISs are preferred to maximize ura3 transcription to reduce background colonies generated by non-homologous end joining, consequently improving the capture efficiency.
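The hook-design rule described above (place TIS motifs inside the 55-125 bp initiation window downstream of the TATA box) can be illustrated with a short scan. This is only a sketch under stated assumptions: the function name `find_tis`, the convention that offsets are counted from the first base after the TATA box, and the demo sequence are all hypothetical and are not taken from the authors' workflow.

```python
# TIS motifs listed in the text
TIS_MOTIFS = ("CAAG", "CAAA", "TAAA", "TAAT", "TAAG")


def find_tis(seq, tata_end, window=(55, 125)):
    """Return (offset, motif) for every TIS motif starting inside the window.

    seq      -- DNA sequence containing the promoter and the hook region
    tata_end -- 0-based index of the first base after the TATA box (assumed convention)
    window   -- inclusive offset range, in bp downstream of tata_end
    """
    seq = seq.upper()
    low, high = window
    hits = []
    for offset in range(low, high + 1):
        start = tata_end + offset
        motif = seq[start:start + 4]
        if motif in TIS_MOTIFS:
            hits.append((offset, motif))
    return hits


# Hypothetical demo sequence: a TATA box, 80 filler bases, one TIS, more filler.
demo = "TATAAA" + "G" * 80 + "CAAG" + "G" * 60
print(find_tis(demo, tata_end=6))
# Restricting the scan to the window reported as optimal (75-115 bp):
print(find_tis(demo, tata_end=6, window=(75, 115)))
```

A scan of this kind could be run over candidate boundary regions of a target BGC to choose hooks that carry several motifs in the window, which the text identifies as preferable for reducing background colonies.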
GPAHex production of corbomycin. Corbomycin (Fig. 2a) is a Type V nonapeptide GPA discovered through phylogeny-guided genome mining 14 . While corbomycin shows clinical promise, it has a relatively low yield in the producing strain Streptomyces sp. WAC01529 (<4 mg/L), limiting its further development. Given the genetic intractability of S. sp. WAC01529, heterologous expression offers an alternative approach to improve the titer and provide a platform for synthetic biology development of corbomycin. The corbomycin peptide scaffold includes three Hpg (Hpg1, Hpg4, and Hpg5), three Dpg (Dpg2, Dpg8, and Dpg9), two Tyr (Tyr6 and Tyr7), and one Trp (Trp3) residue. These shikimate pathway or type III polyketide synthase (PKS) 17 derived aromatic amino acids provide an ideal pressure test for our GPAHex platform.
Genome sequencing of S. sp. WAC01529 revealed a ~70 kb crb BGC covering 30 genes (Supplementary Table 2). We searched the boundary regions of the crb BGC and designed a pair of capture hooks bearing five TISs (six TISs in total, including one introduced by the PmeI site) (Supplementary Table 1) for cloning a region of 76 kb from the chromosome of S. sp. WAC01529 into pCGW. Seven out of ten yeast colonies showed a positive signal by PCR screening, a 70% capture rate (Supplementary Fig. 3a). The plasmid bearing the crb BGC, pGP1529 (Fig. 2b), was mobilized into S. coelicolor M1154 and S. coelicolor M1154/pAMX4 for corbomycin production through E. coli-Streptomyces intergeneric tri-parental mating 34 .
As noted above, pAMX4 was inserted into the attB φC31 site located at position SCO3798 on the chromosome of S. coelicolor M1154. The pCGW plasmids include the same integration system and are predicted to integrate into the chromosome at pseudo-attB φC31 sites through site-specific recombination or at the pAMX4 site through homologous recombination. There are three pseudo-attB φC31 sites present in S. coelicolor located in SCO3398, SCO3792-SCO3793 intergenic region, and SCO4645 35 . Using whole-genome sequencing and PCR, pGP1529 was found to be integrated into S. coelicolor M1154/pAMX4 at the attL-int site downstream of SCO3797 through homologous recombination between the 2,064 bp attP-int region from the two plasmid backbones (Supplementary Fig. 5a, b). Advantageously, homologous recombination integration of pGP1529 positions the crb BGC adjacent to the GPA biosynthetic cassettes of pAMX4. The presence of neo and tsr genes upstream and downstream of the attL-int homologous recombination site provides stable maintenance of the GPA BGC with selection by kanamycin and thiostrepton.

… Fig. 1 with the addition of red (NRPS scaffold and mbtH genes) and purple (P450 genes). c Heterologous expression of corbomycin using GPAHex. The inverted triangle represents the small corbomycin peak produced by the parental strain S. sp. WAC01529 (i). No corbomycin was detected in S. coelicolor M1154 (ii), while a large corbomycin peak was evident in the GPAHex production chassis' trace (iii). d Quantification of corbomycin production. Production of corbomycin in the GPAHex production chassis (iii) was increased 19-fold compared to the parental strain S. sp. WAC01529 (ii). Mean with error bar showing s.d. of three biological replicates (n = 3) is plotted. Significance was tested to *P = 0.0131 by an unpaired two-sided Student's t-test. Similar results (d) were obtained from two independent experiments.
Discovery of GP1416 from Amycolatopsis sp. WAC01416. Amycolatopsis strains are more common GPA producers in comparison with other actinomycetes 12 ; however, there have been no reports of heterologous expression of an Amycolatopsis GPA BGC in a Streptomyces surrogate host. We previously isolated a series of soil actinomycetes that show resistance to vancomycin, providing an enriched pool of putative GPA producers 36 . Genome sequencing of these isolates revealed dozens of GPA BGCs. Among these strains, Amycolatopsis sp. WAC01416 harbors a ~67 kb teicoplanin-like GPA BGC. A domain analysis 37 predicted a heptapeptide scaffold of 'Hpg-Bht(or Tyr)-Dpg-Hpg-Hpg-Bht-Dpg', similar to the 'Hpg-Tyr-Dpg-Hpg-Hpg-Bht-Dpg' heptapeptide of teicoplanin, a Type IV lipoglycopeptide. Based on the sequence information (Supplementary Table 3 and 'F-O-G' crosslinks, N-acylated-glucosamine on Hpg4, N-acetylglucosamine on Bht6, and α-D-mannose on Dpg7 for GP1416. However, we were unable to isolate GPAs from Amycolatopsis sp. WAC01416, and so turned to our GPAHex platform to capture and express the GP1416 BGC. Two capture hooks bearing four TISs were designed to clone the GP1416 BGC into pCGW (Supplementary Table 1). Three out of twelve screened yeast colonies showed a positive signal in PCR screening, a 25% capture rate (Supplementary Fig. 3b). The verified plasmid, pGP1416 (Fig. 3a), was mobilized into S. coelicolor M1154 and S. coelicolor M1154/pAMX4 for GP1416 production through intergeneric tri-parental mating. An antibacterial activity assay against B. subtilis 168 revealed no growth inhibition from extracts of Amycolatopsis sp. WAC01416 or S. coelicolor M1154/pGP1416; however, S. coelicolor M1154/pAMX4/pGP1416 extracts showed excellent activity (Supplementary Fig. 6). pGP1416 integrates into the same site as pGP1529 on the S. coelicolor M1154/pAMX4 chromosome (Supplementary Fig. 7).
HPLC analysis of D-Ala-D-Ala affinity column partially purified extracts revealed the presence of a series of peaks in S. coelicolor M1154/pAMX4/pGP1416 chromatograms compared to the controls Amycolatopsis sp. WAC01416 and S. coelicolor M1154/pGP1416 (Fig. 3b). HRESI-QTOF-MS analysis detected a series of mass signals in the partially purified extract, which are distinct from those of teicoplanin (Supplementary Fig. 17). Given the number of closely related analogs produced by S. coelicolor M1154/pAMX4/pGP1416, it proved difficult to purify a specific single compound for NMR analysis. Most of the analogs of Type IV lipoGPAs are generated by loading various acyl side chains on the N2′ position of the glucosamine attached to Hpg4. We therefore deleted the acyltransferase coding gene (orf22) on pGP1416 to simplify the metabolic profile and facilitate the purification of deacyl-GP1416 for structural determination ( Table 5). Dechlorinated, dichlorinated, and hydroxylated deacyl-GP1416 analogs were also observed in the mass spectra (Supplementary Fig. 16). Based on the structure of deacyl-GP1416 and the mass spectra of GP1416, predicted structures of GP1416 are shown in Supplementary Fig. 17. The key differences between GP1416 and teicoplanin are in the chlorination of Tyr2 and Bht6, hydroxylation on Tyr2, and the acyl chain on glucosamine attached to Hpg4, which contribute to the complex metabolic profile of GP1416.
No GP1416 compounds were detected in Amycolatopsis sp. WAC01416, suggesting that the GP1416 BGC is a cryptic cluster. This hypothesis was confirmed using reverse transcription-polymerase chain reaction (RT-PCR) analysis. The GP1416 BGC was transcriptionally inactive in Amycolatopsis sp. WAC01416 and S. coelicolor M1154/pGP1416 but actively transcribed in the GPAHex production host, S. coelicolor M1154/pAMX4 (Supplementary Fig. 18).
Discovery of a cryptic Type V GPA, GP6738. In our phylogenetic analysis of GPA biosynthesis and resistance 22 , we discovered a clade of Streptomyces strains harboring an almost identical BGC encoding a Type V GPA (Supplementary Fig. 19). A domain analysis 37 predicted a nonapeptide scaffold of 'Dpg-Dpg-Val-Trp-Dpg-Hpg-Dpg-Tyr-Dpg'. Analysis of the two P450 enzymes in the BGC revealed that they are closely related to ComI and ComJ in the complestatin BGC (Supplementary Fig. 20), suggesting two possible crosslinks, between Trp4 and Hpg6 and between Hpg6 and Tyr8. Unfortunately, when we attempted to isolate the compound from the wild-type strains, Streptomyces sp. WAC06738, Streptomyces sp. CNQ329, Streptomyces sp. CNQ509, Streptomyces sp. CNQ525, Streptomyces sp. CNQ865, Streptomyces sp. CNT371, and Streptomyces sp. CNY243, no related compounds were detected after testing a panel of different media, consistent with the 'cryptic' designation of GP6738. All of these strains are slow-growing and genetically intractable, making GP6738 a candidate for our GPAHex platform.
The GP6738 BGC was cloned through TAR as described above, resulting in plasmid pGP6738 (Fig. 4a), which was then introduced into S. coelicolor M1154 and S. coelicolor M1154/pAMX4 for expression. pGP6738 integrates into the same site as pGP1529 and pGP1416 on the S. coelicolor M1154/pAMX4 chromosome (Supplementary Fig. 21). An antibacterial activity assay against B. subtilis 168 revealed no growth inhibition from the S. sp. WAC06738 extracts; in contrast, robust growth inhibition was observed from the S. coelicolor M1154/pGP6738 and S. coelicolor M1154/pAMX4/pGP6738 extracts (Supplementary Fig. 22). HPLC analysis also revealed a distinct peak with characteristic GPA UV absorbance at 280 nm (Fig. 4b). HRESI-QTOF-MS analysis of the differential peak revealed the accurate mass  Table 6). The structure of GP6738 was further confirmed through tandem MS/MS spectrometry (Supplementary Fig. 23b). GP6738 is a Type V GPA bearing a 'Dpg-Dpg-Val-Trp-Dpg-Hpg-Dpg-Tyr-Dpg' nonapeptide scaffold as predicted, which is modified through a biaryl crosslink between Trp4 and Hpg6, and a diphenyl ether crosslink between Hpg6 and Tyr8 (Fig. 4c). The two intramolecular crosslinks are identical to those of the Type V GPAs complestatin, kistamicin, and corbomycin 12,14 . Unlike all other reported Type V GPAs to date, which are composed solely of aromatic amino acids, GP6738 includes an aliphatic Val3 residue.
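The scaffold comparison made here can be checked with a small parser over the hyphen-separated residue strings. This is an illustrative sketch only: the helper names are hypothetical, the set of residues treated as aromatic is an assumption drawn from the residues named in this paper, and the corbomycin string is assembled from the residue positions given in the text (Hpg1, Dpg2, Trp3, Hpg4, Hpg5, Tyr6, Tyr7, Dpg8, Dpg9).

```python
# Residues treated as aromatic in this sketch (assumption based on the text)
AROMATIC = {"Hpg", "Dpg", "Tyr", "Trp", "Bht"}


def parse_scaffold(scaffold):
    """Split a hyphen-separated scaffold string into a list of residue names."""
    return scaffold.split("-")


def non_aromatic_positions(scaffold):
    """Return (position, residue) pairs, 1-based, for residues not in AROMATIC."""
    residues = parse_scaffold(scaffold)
    return [(i, r) for i, r in enumerate(residues, start=1) if r not in AROMATIC]


def residue_at(scaffold, position):
    """Return the residue at a 1-based position."""
    return parse_scaffold(scaffold)[position - 1]


GP6738 = "Dpg-Dpg-Val-Trp-Dpg-Hpg-Dpg-Tyr-Dpg"
CORBOMYCIN = "Hpg-Dpg-Trp-Hpg-Hpg-Tyr-Tyr-Dpg-Dpg"

print(non_aromatic_positions(GP6738))      # the aliphatic Val at position 3
print(non_aromatic_positions(CORBOMYCIN))  # no non-aromatic residues
# Residues involved in the two crosslinks reported for GP6738 (positions 4, 6, 8)
print([residue_at(GP6738, p) for p in (4, 6, 8)])
```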
Since GP6738 is cryptic in S. sp. WAC06738 and there are two predicted positive regulator genes orthologous to strR and lmbU present in its BGC, we hypothesized that overexpression of these regulators might further increase the expression of the BGC and increase the yield of GP6738 38,39 . We constructed a pIJ10257-derived plasmid bearing orf10 (lmbU) and orf11 (strR) from the GP6738 BGC under the control of the strong constitutive Streptomyces promoter ermEp*. This construct was mobilized into S. coelicolor M1154/pGP6738 and S. coelicolor M1154/pAMX4/pGP6738, and inserted site-specifically at the attB φBT1 site. The yields of GP6738 in S. coelicolor M1154/pGP6738, S. coelicolor M1154/pAMX4/pGP6738, S. coelicolor M1154/pGP6738/pIJ10257-sl, and S. coelicolor M1154/pAMX4/pGP6738/pIJ10257-sl were 98 mg/L, 126 mg/L, 643 mg/L, and 591 mg/L, respectively (Fig. 4b, d). Overexpression of the positive regulators from the GP6738 BGC increased the yield of GP6738 by 6.5- and 4.7-fold in S. coelicolor M1154/pGP6738 and S. coelicolor M1154/pAMX4/pGP6738, respectively. Transcriptional analysis revealed that the GP6738 BGC was transcriptionally inactive in S. sp. WAC06738 but actively transcribed in the heterologous expression strains S. coelicolor M1154/pGP6738 and S. coelicolor M1154/pAMX4/pGP6738 (Supplementary Fig. 29). Overexpression of the pathway-situated regulators strR and lmbU further increased the expression of the BGC, contributing to a higher yield of GP6738.
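The fold changes follow directly from the four reported mean yields. The short calculation below is a sketch using only those rounded means; the dictionary keys are shortened, hypothetical labels for the four strains. With the rounded values the ratios come out at roughly 6.56 and 4.69, in line with the reported 6.5- and 4.7-fold increases (small differences are expected when working from rounded means rather than the underlying replicate data).

```python
# Mean GP6738 yields in mg/L as reported in the text (labels are shorthand)
yields_mg_per_l = {
    "M1154/pGP6738": 98,
    "M1154/pAMX4/pGP6738": 126,
    "M1154/pGP6738/pIJ10257-sl": 643,
    "M1154/pAMX4/pGP6738/pIJ10257-sl": 591,
}


def fold_change(with_regulators, without_regulators):
    """Ratio of the yield with regulator overexpression to the yield without it."""
    return yields_mg_per_l[with_regulators] / yields_mg_per_l[without_regulators]


fold_m1154 = fold_change("M1154/pGP6738/pIJ10257-sl", "M1154/pGP6738")
fold_gpahex = fold_change("M1154/pAMX4/pGP6738/pIJ10257-sl", "M1154/pAMX4/pGP6738")

print(f"S. coelicolor M1154 background: {fold_m1154:.2f}-fold")
print(f"GPAHex chassis background:      {fold_gpahex:.2f}-fold")
```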
Antibiotic activity of GP6738. Given that GP6738 is a cryptic GPA discovered in this work, we were interested in its bioactivity and mode of action (MOA). MIC determination against a panel of indicator strains revealed a 2-8- and 2-16-fold higher MIC of GP6738 compared to the other Type V GPAs complestatin and corbomycin, respectively (Supplementary Table 7). Complestatin and corbomycin inhibit bacterial growth by blocking the action of autolysins, which are essential peptidoglycan hydrolases required for cell growth and division 14,40 . To test whether GP6738 retained the same MOA, we grew B. subtilis 168 with sub-MIC of GP6738, resulting in the characteristic elongated cell morphology of autolysin inhibition (Fig. 5a). Further incubation of B. subtilis 168 at a concentration of 10-fold MIC of GP6738 showed a bacteriostatic phenotype (Fig. 5b), and GP6738 was able to block cell lysis induced with various agents (fosfomycin, ampicillin, and sodium azide). Like complestatin and corbomycin, GP6738 blocks cell wall digestion by the peptidoglycan hydrolases mutanolysin and LytD in vitro (Fig. 5c- … blocking the action of autolysins by binding to peptidoglycan. Additional studies are required to decipher the precise binding site of GP6738 on peptidoglycan.

Fig. 3 Discovery of the cryptic GP1416 from Amycolatopsis sp. WAC01416 using GPAHex. a pGP1416 bearing GP1416 BGC captured through TAR. b Heterologous expression of GP1416 using GPAHex. GP1416 analogs can only be detected in the GPAHex host (iii) but not in the parental strain Amycolatopsis sp. WAC01416 (i) or in S. coelicolor M1154/pGP1416 (ii). Peaks 1-14 are labeled and their associated MS spectra and proposed structures are presented in Supplementary Fig. 12. Peaks labeled with asterisks are also GP1416 analogs showing characteristic UV absorption of GPAs. c Heterologous expression of deacyl-GP1416 using GPAHex. Acyltransferase deletion mutant S. coelicolor M1154/pAMX4/pGP1416Δorf22 shows less complex GPA production (compare to trace (iii) in (b)). Deacyl-GP1416, labeled with an inverted triangle, is purified and characterized as shown in (d). d Chemical structure of deacyl-GP1416. Peptide scaffold of GP1416 varies by the presence of a Tyr, Bht, m-Cl-Tyr, or m-Cl-Bht residue at position AA2 and a Bht or m-Cl-Bht residue at position AA6. Putative chlorination (green) and hydroxylation (cyan) on AA2 are highlighted.

Discussion
In contrast to many other classes of antibiotics where multiple 'generations' of compounds have been developed over the years to address resistance and improve drug-like qualities, only two NP GPAs, vancomycin and teicoplanin, have dominated clinical use since the discovery of vancomycin in the early 1950s 12 . In recent years, three second-generation semi-synthetic GPAs, telavancin, dalbavancin, and oritavancin, have been introduced that overcome inducible vancomycin resistance and modulate pharmacodynamic parameters of the class 11 . These five compounds represent the complete portfolio of clinically approved GPAs. Reasons for the comparative sparsity of GPA analogs include the chemical complexity of GPAs, which makes total synthesis challenging on a production scale (though creative strategies by the Boger lab have begun to address this limitation) 41 , the rarity of reports of NP GPAs, and the generally poor yield of such compounds obtained by fermentation even once they have been discovered.
We developed the GPAHex platform to address many of these drawbacks that plague the discovery and development of GPAs. Using the well-established S. coelicolor M1154, we added genes for the supply of specialized biosynthetic precursors (Hpg, Dpg, Bht, Tyr, and Trp), GPA resistance, a cluster-situated positive regulator, and transporters from the A47934 BGC. The chassis S. coelicolor M1154/pAMX4 serves as a common stage for the expression of GPA BGCs. A further advancement is the optimization of the TAR cloning method for cloning large specialized metabolite (SM) BGCs using an engineered copy number controlled capture vector pCGW.
Our results show that using the pCGW vector combined with introducing TISs into the cluster capture hooks, the cloning efficiency can reach as high as 70% for large (76 kb) BGCs like crb. The introduction of extra TISs helps to increase the transcription of the counter-selection marker gene ura3 thereby decreasing background colonies from re-circularized plasmid in Saccharomyces cerevisiae. The pCGW-based TAR cloning system should also be applicable to other BGCs, accelerating genome mining of cryptic NPs.
The application of GPAHex in the cloning and expression of the Type V GPA corbomycin increased production >19-fold over … Fig. 30).

Fig. 4 Discovery of the cryptic GP6738 from S. sp. WAC06738 using GPAHex. a pGP6738 bearing GP6738 BGC captured through TAR. Genes are color-coded as in Figs. 1 and 2. b Heterologous expression of GP6738 using GPAHex. No GP6738 was detected in the parent S. sp. WAC06738 (i). A prominent GP6738 peak was detected in all heterologous expression strains: S. coelicolor M1154/pGP6738 (ii), S. coelicolor M1154/pAMX4/pGP6738 (iii), S. coelicolor M1154/pGP6738/pIJ10257-sl (iv), and S. coelicolor M1154/pAMX4/pGP6738/pIJ10257-sl (v). Overexpression of the pathway-situated regulators substantially increased GP6738 production. c Chemical structure of GP6738. GP6738 bears a rare nonapeptide scaffold harboring a Val3 residue and the characteristic Tyr-Hpg-Trp dual crosslinks of Type V GPAs. d Quantitation of GP6738 production. Overexpression of strR-lmbU regulators increased the yield of GP6738 by 6.5-fold in S. coelicolor M1154 and 4.7-fold in the GPAHex production chassis. Mean with error bars showing s.d. of three biological replicates (n = 3) is plotted. Multiple comparison significance was tested to ***P = 0.0003 or **P = 0.0005 by one-way ANOVA with Tukey's post hoc analysis. NS, not significant. Similar results (d) were obtained from two independent experiments.
In addition to the success of the production of corbomycin and discovery of GP1416, application of GPAHex to S. sp. WAC06738 identified a cryptic Type V GPA bearing a nonapeptide scaffold with a unique Val3 residue. Production of GP6738 reaches 125 mg/L in the GPAHex production chassis, which was further improved by >4-fold by overexpression of cluster-situated positive regulators. MOA studies revealed that, like other recently discovered Type V GPAs 14 , GP6738 interrupts the cell wall degradation process by indirectly inhibiting autolysins, which is a promising target for antibiotic drug development.
The GPAHex platform offers a blueprint for further NP research. For example, phenylglycines (Hpg and Dpg) are present not only in GPAs but in a variety of peptide NPs, including nocardicin, feglymycin, enduracidin, ramoplanin, arylomycin, and others 42 ; GPAHex may therefore be applied for the production and discovery of other NPs. We note that additional synthetic biology tools can be integrated into the GPAHex platform due to the reserved integration sites (ΦBT1, pSAM2, SV1, and R4) [43][44][45][46] in the chromosome for developing novel GPAs. The GPAHex described here provides a synthetic biology platform for the titer improvement of GPAs and for the discovery of cryptic GPAs in actinomycetes. Success in the discovery and production of GP1416 from Amycolatopsis sp. WAC01416 shows that the genetically tractable Streptomyces model system is also suitable for the mining of NPs from Amycolatopsis strains, which, together with Streptomyces, cover ~80% of known GPA BGCs 12,22 . Notably, the GPAHex platform includes most general precursor supply, resistance, transport, and regulation genes for GPA production; however, the offline Bht biosynthetic cassette (bpsD-oxyD-bhp) and the amino sugar biosynthetic cassette (evaABCDE) 17 , which are necessary for the aglycone scaffold biosynthesis and post-aglycone modification of Type I, II, and III GPAs, were not included in this version of the platform. Future implementation of these gene cassettes in the GPAHex platform would make the platform more general to all GPAs. The modified TAR cloning system provides a significant addition to the synthetic biology toolbox for manipulating large NP BGCs, and the chassis developed here also provides an important tool for targeting phenylglycine-containing NPs.
We predict that this strategy for the development of NP production chassis, in which genes necessary for precursor production, regulation, resistance, and transport are integrated into the chromosome, should generally apply to other important NPs such as polyketides and terpenoids, obviating the need to work with specific production strains that tend to be limited by genetic tools and inaccessibility.

Methods
Strains and plasmids. All strains and plasmids used in this study are listed in Supplementary Data 1.
Oligonucleotides and reagents. Oligonucleotides used in this study are listed in Supplementary Data 2. gBlocks designed for capturing of GPA BGCs are listed in Supplementary Table 1. Oligonucleotides and gBlocks were ordered from Integrated DNA Technologies (Coralville, IA, USA). Sanger and genome sequencing were performed at the MOBIX Lab Central Facility (McMaster University). PCR reactions were performed using DreamTaq Green PCR Master Mix (2×) and Phusion High-Fidelity DNA polymerase (Thermo Fisher Scientific). Plasmids were purified using the GeneJet Plasmid Miniprep Kit (Thermo Fisher Scientific) except for pGP1529, pGP1416, and pGP6738, which were purified using the alkaline lysis method. Restriction enzymes were purchased from Thermo Fisher Scientific and T4 DNA ligase was purchased from New England Biolabs.
Construction of the GPAHex production chassis. The boundary sequence of PAC pA47934 24 was determined by terminal sequencing using the pESAC13-sF primer, and included a region of 70,139 bp downstream of the A47934 BGC unrelated to A47934 biosynthesis. This region was deleted using λ-red mediated PCR targeting, and a DAHP synthase and chorismate mutase (DAHP-CM) dual gene cassette amplified from S. toyocaensis NRRL15009 was introduced in situ. To remove the non-GPA region and introduce the DAHP-CM dual cassette, three amplicons were generated. pA4-DAHPs-F/R primers and pA4-CM-F/R primers were used to amplify the DAHP synthase gene and the chorismate mutase gene from the S. toyocaensis NRRL15009 genome, respectively, and pA4-cat-F/R primers were used to amplify the cat gene from pKD3. The three amplicons were stitched together through overlap-extension PCR, resulting in a DAHP-CM-cat cassette with PmeI and SwaI sites flanking the cat gene. The DAHP-CM-cat cassette was transformed into the E. coli BW25113/pKD46/pA47934 strain through electroporation, resulting in pAMX1. Error-free pAMX1 plasmid was digested with PmeI and SwaI to remove the cat gene, and then self-ligated at 20°C overnight using T4 DNA ligase, resulting in pAMX2. The A47934 NRPS and crosslinking genes (staA-staL) were deleted using an identical procedure. pA4-Δnrps-F/R primers were used to amplify the cat gene from pKD3, followed by transformation into the E. coli BW25113/pKD46/pAMX2 strain through electroporation, resulting in pAMX3. pAMX3 was digested with PmeI and SwaI to remove the cat gene, and then self-ligated at 20°C overnight using T4 DNA ligase, resulting in pAMX4. pAMX4 was transformed into E. coli ET12567 and then mobilized into S. coelicolor M1154 through E. coli-Streptomyces tri-parental mating 34 using E. coli ET12567/pR9406 47 as a helper strain to generate the GPAHex production chassis, S. coelicolor M1154/pAMX4.
Construction of the copy number control vector pCGW. Based on the original pCAP03-aac(3)IV capture vector 30 , pCGW was designed by replacing the SuperCos I derived region with a copy number control region (oriV-ori2-repE-incC-sopABC) from pBAC-lacZ. Primers strep-F/R and yeast-F/R were used to amplify the Streptomyces and yeast elements from pCAP03-aac(3)IV, respectively. Since there is an NdeI site in the repE gene in the copy number control cassette, overlap-extension PCR was used to mutate this NdeI site. Primers oriVS-sop-F and repE-mNdeI-R were used to amplify one half of the copy number control cassette from pBAC-lacZ, while primers repE-mNdeI-F and oriVS-sop-R were used to amplify the other half. Then primers oriVS-sop-F/R were used to stitch the two fragments, resulting in the mutated copy number control cassette, which was further assembled into pCGW with previously amplified Streptomyces and yeast elements through Gibson assembly 48 .
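The reason for mutating the internal NdeI site can be illustrated with a simple check for recognition sequences in a cassette. The following is a sketch, not the authors' procedure: the fragment is a made-up placeholder rather than repE sequence, the helper names are hypothetical, and the sketch does not check whether a substitution preserves the encoded protein, which a real design would need to do. NdeI recognizes the palindromic sequence CATATG, so scanning one strand is sufficient.

```python
NDEI_SITE = "CATATG"  # NdeI recognition sequence (palindromic)


def find_sites(seq, site=NDEI_SITE):
    """Return 0-based start positions of every occurrence of site in seq."""
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)  # step by one to catch overlapping hits
    return positions


def substitute(seq, index, base):
    """Return seq with the base at a 0-based index replaced."""
    return seq[:index] + base + seq[index + 1:]


# Hypothetical placeholder fragment containing a single NdeI site
fragment = "GGATCCATATGAAGCTT"
sites_before = find_sites(fragment)

# Change one base inside the site (CATATG -> CATCTG) and re-check
mutated = substitute(fragment, sites_before[0] + 3, "C")
sites_after = find_sites(mutated)

print(sites_before, sites_after)
```

Running the same check on the final assembled vector sequence would confirm that the NdeI site intended for inserting the capture hooks is the only one remaining.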
TAR cloning and heterologous expression of the GPA BGCs. GPA BGCs were identified by analyzing the genome sequences using antiSMASH 5.0 before TAR cloning. TAR cloning was performed by following the standard protocol for direct cloning of NP BGCs 49 using pCGW instead of pCAP03-aac(3)IV. pCGW was treated with NdeI and XhoI, and gBlocks bearing capture hooks were then introduced into pCGW by Gibson assembly. gBlocks for capturing the corbomycin, GP1416, and GP6738 BGCs are listed in Supplementary Table 1. pCGW derivatives bearing capture hooks were linearized with PmeI and used to transform yeast spheroplasts. Genomic DNA from Streptomyces sp. WAC01529, Amycolatopsis sp. WAC01416, and Streptomyces sp. WAC06738 was isolated using the salting out procedure followed by RNase A treatment 34 , and then digested with BstZ17I/HpaI, BstZ17I, and HindIII, respectively. Linearized pCGW-derived capture plasmids (500 ng) and digested genomic DNA (2 μg) were mixed and co-transformed into S. cerevisiae VL6-48N spheroplasts, which were plated onto SD-Trp-5-FOA selection medium and grown for 3-5 d 49 . Plasmid DNA was extracted from yeast colonies using the alkaline lysis method for PCR screening. Positive hits were re-transformed into E. coli EPI300 cells using electroporation and then confirmed by restriction mapping. Sequence-confirmed constructs were isolated, transformed into E. coli ET12567, and conjugated into S. coelicolor M1154 and S. coelicolor M1154/pAMX4 through tri-parental mating as described above.
Determination of the integration site of GPA BGCs. S. coelicolor M1154/pAMX4/pGP1416, S. coelicolor M1154/pAMX4/pGP1529, and S. coelicolor M1154/pAMX4/pGP6738 strains were grown at 30°C, 250 rpm, in TSBY medium (supplemented with 50 μg/mL kanamycin and 25 μg/mL thiostrepton) to mid-log phase, and then harvested for genomic DNA preparation using the salting out procedure 34 . Illumina MiSeq sequencing (300 bp, paired-end reads) was performed by the McMaster Genomics Facility in the Farncombe Institute at McMaster University. The high complexity of the crb and GP6738 BGCs prevented resolution of the chromosomal integration loci through genome sequencing, so PCR diagnostics were used instead for these cases. crb-up-dF/dR, crb-m-dF/dR, and crb-dw-dF/dR primers were used to confirm the integration of the crb BGC into the chromosome through homologous recombination between the attL-int regions. Similarly, GP6738-up-dF/dR, GP6738-m-dF/dR, and GP6738-dw-dF/dR primers were used to confirm the integration of the GP6738 BGC into the chromosome at the same site.
Purification of deacyl-GP1416. Streptomyces strains were grown as described above. The culture supernatant was treated with 5% (w/v) Diaion HP-20 resin (Sigma-Aldrich), followed by elution with 10% methanol and 50% acetonitrile (ACN) using column chromatography. The 50% ACN fraction was applied to a Sephadex LH20 (Sigma-Aldrich) column using 50% ACN as the running solvent. Active fractions were collected and further purified by HPLC on an Agilent semi-prep column (Zorbax Extend C-18, 21.2 × 100 mm, 5 μm) using a linear gradient from 5 to 10% solvent B (solvent A, 0.1% v/v formic acid in water; solvent B, 0.1% v/v formic acid in ACN) over 10 min to yield pure compound.
Overexpression of the regulators in GP6738 BGC. strR6738-F/R and lmbU6738-F/R primers were used to amplify the strR and lmbU genes from Streptomyces sp. WAC06738. The two amplicons were then assembled into NdeI/AvrII-treated pIJ10257 50 plasmid using Gibson assembly, resulting in pIJ10257-sl. Error-free pIJ10257-sl plasmid was conjugated into S. coelicolor M1154/pGP6738 and S. coelicolor M1154/pAMX4/pGP6738 through tri-parental mating as described above.
Purification of GP6738. S. coelicolor M1154/pAMX4/pGP6738/pIJ10257-sl was inoculated in TSBY medium (supplemented with 50 μg/mL kanamycin, 25 μg/mL thiostrepton, and 50 μg/mL hygromycin) and grown at 30°C, 250 rpm for 48-60 h. Then 1 mL of the seed culture was inoculated into 50 mL SAM medium in 250 mL flasks, and incubated at 30°C, 250 rpm for 6 days. A total of 150 mL of SAM culture was collected and lyophilized. The dried material was washed with 50% MeOH (4×) and extracted with DMSO (3×), resulting in ~70 mg pure GP6738. GP6738 was dissolved in d6-DMSO (20 mg/mL) and analyzed by 1D and 2D NMR recorded on a Bruker AVIII 700 MHz instrument equipped with a cryoprobe. GP6738 was analyzed by high-resolution mass spectrometry (HRMS) using an Agilent 6550 iFunnel Q-TOF mass spectrometer with an inline Agilent 1290 HPLC system, using electrospray ionization in positive mode and a mass scan range of 50-2000 Da.
RNA isolation and RT-PCR. Streptomyces strains were inoculated into 50 mL SAM medium in 250 mL flasks using 1% (v/v) inoculum from TSBY seed cultures, and mycelia were harvested at early-stationary phase (20 h) and late-stationary phase (48 h). Cells were lysed by bead beating with 4 mm glass beads in 5 mL TRIzol reagent (Invitrogen), and RNA was extracted and purified using the PureLink RNA Mini Kit (Invitrogen) according to the manufacturer's recommendations. cDNA synthesis was performed using Maxima H Minus First Strand cDNA synthesis Kit (Thermo Fisher Scientific) after dsDNase (Thermo Fisher Scientific) treatment, followed by RT-PCR quantification on a BioRad CFX96 real time system using PowerUp SYBR Green Master Mix (Applied Biosystems). Primers targeting genes of interest were designed (Supplementary Data 2) and 90-100% efficiency was confirmed before quantitation. Analysis was performed on biological triplicates and the fold change of gene expression was calculated by normalizing to hrdB using the ΔCt method.
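As an illustration of the ΔCt normalization described above, the following minimal sketch computes expression of a target gene relative to a reference gene such as hrdB, and estimates primer efficiency from the slope of a standard curve. The function names and all Ct values are hypothetical, and the 2^-ΔCt form assumes close to 100% amplification efficiency, which is consistent with the 90-100% efficiency check but is an assumption about the exact calculation used.

```python
from statistics import mean, stdev


def relative_expression(ct_target: float, ct_reference: float) -> float:
    """Expression of the target relative to the reference gene: 2 ** -(dCt)."""
    delta_ct = ct_target - ct_reference
    return 2.0 ** -delta_ct


def efficiency_from_slope(slope: float) -> float:
    """Amplification efficiency from a Ct vs. log10(template) standard curve.

    A slope of about -3.32 corresponds to an efficiency of 1.0 (100%).
    """
    return 10.0 ** (-1.0 / slope) - 1.0


if __name__ == "__main__":
    # Hypothetical Ct values for three biological replicates.
    target_ct = [22.1, 21.8, 22.4]
    hrdb_ct = [18.0, 17.9, 18.2]
    values = [relative_expression(t, r) for t, r in zip(target_ct, hrdb_ct)]
    print(f"relative expression: {mean(values):.4f} +/- {stdev(values):.4f}")
    print(f"efficiency at slope -3.40: {efficiency_from_slope(-3.40):.2%}")
```

Fold change between two strains or time points can then be taken as the ratio of their mean relative expression values.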
Phylogenetic analysis of P450s. Protein sequences were obtained from the NCBI database and aligned with Muscle using default settings in MEGA X. A maximum likelihood phylogeny was created in MEGA X 51 with 100 bootstrap replicates using the WAG substitution model with empirical amino acid frequencies (+F), gamma-distributed rates with invariant sites (G + I), complete deletion of gaps, and the Nearest-Neighbor-Interchange (NNI) tree search method. The consensus tree was re-rooted to preserve the monophyletic OxyB.
Cell lysis assay. Early exponential phase B. subtilis 168 (OD600 ~0.25) grown in LB media was dispensed at 100 µL per well into a round bottom 96 well plate. Antibiotics were supplemented to the appropriate concentration, and where appropriate lytic agents were added (50 µg/mL fosfomycin, 100 µg/mL ampicillin, 100 mM sodium azide). The plate was covered with a clear, breathable film and OD600 was monitored on a Tecan Sunrise microplate reader at 37°C with shaking.
Peptidoglycan autolysin digestion. The muramidase mutanolysin was purchased from Sigma. The glucosaminidase domain from LytD was cloned from B. subtilis 168 into pET28 between the NcoI and XhoI restriction sites to introduce a C-terminal 6×His tag. The construct was expressed in E. coli BL21(DE3) pLysS by inoculating 1 L of LB media 1:50 from an overnight culture, then growing at 37°C until OD600 reached 0.6, at which time the cells were induced with 1 mM IPTG and grown for 18 h at 16°C. Native LytD-6×His was purified by Ni-NTA affinity chromatography and cation exchange chromatography 14 . Protein purity was >95% as assessed by SDS-PAGE analysis.
B. subtilis peptidoglycan for binding and digestion assays was prepared by harvesting the cells (OD600 0.6-0.7), boiling them in 4% SDS, washing and sonicating to break up sacculi, treating with α-amylase and DNase, and digesting with pronase overnight 14 . The peptidoglycan was boiled in 2% SDS again and washed, and teichoic acids were hydrolysed with 1 M HCl. The peptidoglycan was washed with water to pH 6.0 and lyophilized before use. Peptidoglycan was digested under optimized conditions with either mutanolysin (20 µg/mL, 1 mg/mL PG, 20 mM sodium acetate, pH 6.5) or LytD (10 µg/mL, 1 mg/mL PG, 200 mM NaCl, 50 mM MES-NaOH, pH 5.5). Peptidoglycan (100 µL) in the indicated buffer was preincubated with either GP6738 or a corresponding volume of DMSO with shaking for 30 min, then dispensed into a round bottom 96 well plate, and enzyme was added. The plate was covered with a clear, breathable film to prevent evaporation, and OD600 was tracked on a Tecan Sunrise microplate reader with shaking at 37°C. Results show the average and standard deviation of triplicate wells.
Statistical analysis. Statistical analysis of compound production quantified by LC-MS and HPLC and of gene expression quantified by RT-PCR was performed using GraphPad Prism v. 7 in every case. For corbomycin and GP6738 quantitation, statistical significance was assessed by unpaired two-sided Student's t-test (n = 3) or one-way ANOVA with Tukey's post hoc analysis (n = 3) as described in the figure legends. For gene expression analysis, two-way ANOVA with Tukey's post hoc analysis (n = 3) was performed as described in the figure legends. P values <0.05 were considered statistically significant. All results are representative of two independent experiments.
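For readers who want to reproduce the simplest of these comparisons outside GraphPad Prism, the sketch below computes the unpaired Student's t statistic with pooled variance for two groups of n = 3 and compares it with the two-sided 5% critical value for 4 degrees of freedom (2.776). It is an illustration using only the Python standard library, not the analysis pipeline used here, and the titer values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

# Two-sided 5% critical value of Student's t distribution for 4 degrees
# of freedom (two groups of n = 3).
T_CRIT_DF4 = 2.776


def student_t(a: list[float], b: list[float]) -> tuple[float, int]:
    """Unpaired Student's t statistic (equal variances assumed) and its df."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Pooled sample variance across both groups.
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, df


if __name__ == "__main__":
    # Hypothetical titers (arbitrary units), three biological replicates each.
    chassis = [10.0, 11.0, 12.0]
    parent = [1.0, 2.0, 3.0]
    t, df = student_t(chassis, parent)
    verdict = "significant" if abs(t) > T_CRIT_DF4 else "not significant"
    print(f"t = {t:.3f}, df = {df}, {verdict} at P < 0.05")
```

An exact P value requires the t distribution's cumulative distribution function, which the standard library does not provide, so the sketch reports only whether the 0.05 threshold is crossed.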