Improvement of a synthetic live bacterial therapeutic for phenylketonuria with biosensor-enabled enzyme engineering

In phenylketonuria (PKU) patients, a genetic defect in the enzyme phenylalanine hydroxylase (PAH) leads to elevated systemic phenylalanine (Phe), which can result in severe neurological impairment. As a treatment for PKU, Escherichia coli Nissle (EcN) strain SYNB1618 was developed under Synlogic’s Synthetic Biotic™ platform to degrade Phe from within the gastrointestinal (GI) tract. This clinical-stage engineered strain expresses the Phe-metabolizing enzyme phenylalanine ammonia lyase (PAL), catalyzing the deamination of Phe to the non-toxic product trans-cinnamate (TCA). In the present work, we generate a more potent EcN-based PKU strain through optimization of whole cell PAL activity, using biosensor-based high-throughput screening of mutant PAL libraries. A lead enzyme candidate from this screen is used in the construction of SYNB1934, a chromosomally integrated strain containing the additional Phe-metabolizing and biosafety features found in SYNB1618. Head-to-head, SYNB1934 demonstrates an approximate two-fold increase in in vivo PAL activity compared to SYNB1618.

EcN strains were engineered to express PAL. The engineered strain series includes two strains, each with a single chromosomal insertion of stlA (encoding PAL) at a different location (1 copy PAL), a strain containing both stlA gene copies in the same background (2 copy PAL), a strain with a low copy plasmid expressing stlA (Low Copy PAL (pSC101)), and a strain with a high copy plasmid expressing stlA (High Copy PAL (pUC)). In each of these strains, the same stlA coding sequence, ribosome binding site, and anhydrotetracycline-inducible promoter were used so that the only difference between strains was the location of chromosomal insertion and/or stlA copy number. a, The blue bars represent whole cell [...]
Sample labels: EcN (Wild type), 1 copy PAL, 1 copy PAL, 2 copy PAL, Low Copy PAL (pSC101), High Copy PAL (pUC), Ladder.

Pop 'n' sort methodology
Pop 'n' sort offers the diffusion mitigation of water-in-oil encapsulation with the ease, throughput, and commercial availability of fluorescence-activated cell sorters (FACS). The benefit of using encapsulation during simultaneous growth, production, and sensing is highlighted below in a proof-of-concept sorting experiment performed on an ~300-member library of PAL variants (see description in "Engineering rounds" below, Round 2 [...] were never encapsulated, and instead were simply incubated with shaking following the washing and resuspension step. Following enrichment, colonies were assayed using the plate-based activated biomass protocol described in the Methods. For each of the two sorting strategies, n=4 wild-type PAL biological replicates (blue), n=44 independent colonies from a mock-mock sort (orange), and n=44 independent colonies from a top 1%-top 1% sort (green) were assayed in parallel (one 96-well plate for the liquid sort, and one 96-well plate for the pop 'n' sort).
Boxplots with accompanying clonal data points, showing FOWT of TCA produced in 4 h. Mock-mock: two sequential mock sorts (sorted based on cell size and doublet exclusion without GFP-based gating). Top 1%-top 1%: two sequential top 1% sorts (sorted based on cell size and doublet exclusion as well as the top 1% brightest events). Boxes extend from the first to third quartile, with a center line indicating the median. Upper and lower whiskers extend to the maximum and minimum point, respectively, within 1.5 times the interquartile range of the bounds of the box. Data points outside of the whiskers are considered outliers. A single low activity outlier was excluded from the mock-mock population of panel b for improved data visualization, and can be found in the accompanying Source Data.
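The boxplot convention stated in the caption (box from first to third quartile, whiskers at the most extreme points within 1.5 times the interquartile range of the box, everything beyond as outliers) can be sketched as follows. This is a minimal illustration, not the plotting code used for the figure: the quartile interpolation method and the example FOWT values are assumptions, and different plotting libraries may compute quartiles slightly differently.

```python
from statistics import quantiles


def box_stats(values):
    """Quartiles, whisker ends, and outliers using the 1.5 x IQR rule."""
    data = sorted(values)
    # Assumption: linear interpolation between data points ("inclusive").
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers end at the most extreme points still inside the fences.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Hypothetical FOWT values for eight colonies, including one low-activity clone.
example_fowt = [0.2, 0.9, 1.0, 1.0, 1.1, 1.1, 1.2, 1.3]
stats = box_stats(example_fowt)
print(stats)
```

In this made-up example the 0.2 clone falls below the lower fence and is reported as an outlier, while the whiskers stop at 0.9 and 1.3.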

Engineering rounds
The focus of this Article is on a single round of PAL engineering, the enrichment of a >1 million-member library, and the identification of an improved PAL variant that was further developed for use in a clinical strain. As alluded to in the discussion of Fig. 1b as well as the >1 million-member library Methods and discussion, early efforts were made to engineer the enzyme ahead of the successful screening of a large library. Supplementary Fig. 3 below shows the activity relative to wild-type (fold over wild-type, FOWT) of the 411 unique variants that were identified throughout the course of the Synlogic-Zymergen collaboration, separated into multiple rounds of engineering.
Supplementary Fig. 3. Unique variants identified throughout project. Fold over wild-type (FOWT) 4 h whole cell activity of 411 unique PAL variants identified throughout the Synlogic-Zymergen collaboration (n=48 independent strains in Round 1, n=107 independent strains in Round 2, n=105 independent strains in Round 3, n=151 independent strains in Round 4). Boxes extend from the first to third quartile, with a center line indicating the median. Upper and lower whiskers extend to the maximum and minimum point, respectively, within 1.5 times the interquartile range of the bounds of the box. Data points outside of the whiskers are considered outliers. Each data point represents the activity level of a unique sequenced variant. Where applicable, replicates of the same variant are averaged together, such that a single point appears per variant. FOWT was calculated by normalizing by a wild-type control or controls in the same batch/plate. For much of the data in this plot, replicates were not performed, as we prioritized promising hits over thoroughly characterizing less improved variants.
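The FOWT normalization described in the caption (divide each variant's activity by the wild-type control or controls on the same plate, then average replicates so each variant yields one point) can be sketched as below. This is an illustrative sketch under assumptions: the tuple layout, the function name `fold_over_wild_type`, the variant labels, and all TCA values are hypothetical, and the use of the plate mean when several wild-type controls are present is an assumption rather than the documented procedure.

```python
from collections import defaultdict
from statistics import mean


def fold_over_wild_type(measurements, wt_name="WT"):
    """Return {variant: mean FOWT} from (plate, variant, tca) tuples."""
    # Collect the wild-type control readings for each plate.
    wt_by_plate = defaultdict(list)
    for plate, variant, tca in measurements:
        if variant == wt_name:
            wt_by_plate[plate].append(tca)

    # Normalize each variant reading by the wild-type mean of its own plate.
    fowt_by_variant = defaultdict(list)
    for plate, variant, tca in measurements:
        if variant == wt_name:
            continue
        if plate not in wt_by_plate:
            raise ValueError(f"no wild-type control on plate {plate!r}")
        fowt_by_variant[variant].append(tca / mean(wt_by_plate[plate]))

    # Average replicates so that a single value remains per variant.
    return {variant: mean(vals) for variant, vals in fowt_by_variant.items()}


# Hypothetical TCA readings (arbitrary units) from two plates.
readings = [
    ("plate1", "WT", 100), ("plate1", "WT", 120), ("plate1", "variant_A", 220),
    ("plate2", "WT", 50), ("plate2", "variant_A", 100), ("plate2", "variant_B", 75),
]
fowt = fold_over_wild_type(readings)
print(fowt)
```

Normalizing within a plate before averaging keeps batch-to-batch differences in absolute TCA signal from distorting the comparison between variants.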
Round 1 of Supplementary Fig. 3 took place in parallel to the development of a sensor application suitable for large library screening, and largely consisted of clonal assaying of single permutations. These efforts identified a set of neutral and beneficial mutations from early structural, PSSM, and coevolution designs. In Round 2, we combined these neutral and beneficial mutations into the ~300-member library described in Supplementary Fig. 2, which was also the origin of the improved templates for the library discussed in the main text, as well as building out some rational full-length designs (the origin of the strains in Fig. 1b), and strategically layering promising mutations. As shown in Supplementary Fig. 2, by this time we had demonstrated successful library enrichment and were ready to expand our search.
Round 3 consisted of the building and screening of the >1 million-member library described throughout this work, which resulted in top hits with approximately 2-fold improvement over wild-type in whole cell assays. In a final round of engineering (Round 4), we layered additional mutations onto the top 1%-top 1% sorted population described in this work (complexity ≥10,000 variants) in two separate build strategies. On average 2 random mutations per variant were layered onto this pool to generate an ~5 million-member error-prone library. In a separate library, 112 new designed mutations (same approaches as described in the text) as well as 6 optional reversions of the most commonly found mutations (introduced using the inverse PCR approach described in the Methods, but with wild-type sequence) were introduced in a ~14 million-member combinatorial library. Despite identifying many unique improved sequences in this combined ~19 million complexity effort (Supplementary Fig. 2, Round 4), we did not identify variants that surpassed the activity of the top variants identified in Round 3.

Characterization of top candidates in physiological context
Throughout the collaboration, top hits were assayed using a more thorough OD600 normalization protocol (described in "Activated biomass assay at various pH or after exposure to low pH" in the Methods, data assayed at neutral pH in Supplementary Table 1), and transferred to Synlogic as appropriate for testing in their in vitro simulation (IVS) stomach model (data in Table S1). In addition to those strains, wild-type StlA (EP2315) and variant H133M_I167K (EP2401, an early identified hit) were included as controls. For all the experiments in this section except the initial plate-based measurements in Supplementary [...]

Mutation (category, where given): hypothesis
S92G: Located at the C-terminal part of the helix; S92G may serve to increase flexibility by interfering with hydrogen bonding of the carbonyl oxygen. This position is also located near the active site and could influence the binding and release of substrate.
A93C (Packing): Might increase packing of the enzyme through a dipole-quadrupole interaction between the sulfur of A93C and the aromatic ring of F90.
I167K: I167K removes an exposed hydrophobic residue, which should improve the stability and biophysical behavior of the enzyme.
R185E: No current hypothesis; the R185E mutation on its own has a negative impact on enzyme activity.
T322W (Stability): T322 is located in the helix; the change to W might increase stability of the enzyme by packing with charged amino acids in its spatial vicinity. It is unclear why W is the preferred substitution.
L432I (Activity): L432I changes the phi/psi of the loop region by introduction of a beta-branched amino acid. The loop region is near the active site and this change could affect substrate binding and release.
A433S (Solvation): A433S increases solvation of the enzyme due to the exposed hydroxyl group (small perturbation in the change from A to S).
V470A: Alanine substitution at V470 increases helicity and stability of the enzyme.
T503E: T503E increases solvation energy by introduction of a carboxylic acid.
H133F or H133M (Stability, packing): H133F (or H133M) increases packing and helix stability by removing histidine's potential perturbation. H133 could potentially distort a helix by interfering with the carbonyl oxygen from A129.

Recovery of activity after exposure to reduced pH - additional data for Fig. 4c
Supplementary Fig. 5. Activity after recovery from acidic environment. a, Raw TCA values for data shown in Fig. 4c and expanded to show additional timepoints (0.5 h, 1 h, 2 h, and 4 h) of activity assay incubation at neutral pH after 1 h incubation at pH 5 or no pre-treatment (controls, gray, account for wash steps, described in the Methods). b, Data from panel a represented as fold improvement over wild-type StlA EP2315 (FOWT), and expanded to show additional timepoints compared to Fig. 4c. Average of 3 biological replicates with s.d. error bars and individual data points shown.
Axis labels (strains): EP2315, EP2401, EP2516, EP2525, EP2495, EP2502, EP2528, EP2526.

Gating strategy example
Supplementary Fig. 6. Gating strategy example. A hierarchical gating strategy was applied for sorting. First, an elliptical SSC-Area vs. FSC-Area gate was applied to select cells of a similar size and omit debris (a), followed by gating on FSC-Height vs. FSC-Area to exclude doublets (b, events already gated based on cell size), and finally based on GFP signal (c, events already gated based on cell size and doublet exclusion). In the above example, the orange gate in c corresponds to the top ~1%; the green gate gathers the 1-10% brightest cells, which was not described in this paper. For mock sorts described in the main text, no selection based on GFP was applied.
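The hierarchical gating described for Supplementary Fig. 6 (size gate, then doublet exclusion, then an optional GFP gate) can be sketched in software as below. This is an illustration only and not the sorter configuration used in this work: the event field names, the ellipse parameters, and the use of an FSC-Height/FSC-Area ratio band as the doublet gate are all assumptions, since the text gives the gate types but not their coordinates or shapes.

```python
def in_ellipse(x, y, cx, cy, rx, ry):
    """True if (x, y) lies inside an axis-aligned ellipse."""
    return ((x - cx) / rx) ** 2 + ((y - cy) / ry) ** 2 <= 1.0


def gate_events(events, size_gate, singlet_ratio, top_fraction=0.01):
    """Apply size, singlet, and (optionally) GFP gates in sequence.

    events: list of dicts with keys fsc_a, ssc_a, fsc_h, gfp (hypothetical names).
    top_fraction=None mimics a mock sort with no GFP-based selection.
    """
    # Gate 1: elliptical SSC-Area vs. FSC-Area gate to omit debris.
    sized = [e for e in events
             if in_ellipse(e["fsc_a"], e["ssc_a"], **size_gate)]
    # Gate 2: doublet exclusion. Assumption: singlets fall within a band of
    # FSC-Height / FSC-Area ratios; doublets have a lower ratio.
    low, high = singlet_ratio
    singlets = [e for e in sized
                if e["fsc_a"] > 0 and low <= e["fsc_h"] / e["fsc_a"] <= high]
    if top_fraction is None:
        return singlets
    # Gate 3: keep the brightest fraction of the remaining events by GFP.
    n_keep = max(1, round(len(singlets) * top_fraction))
    return sorted(singlets, key=lambda e: e["gfp"], reverse=True)[:n_keep]


# Hypothetical events: 200 singlets, one debris particle, one bright doublet.
events = [{"fsc_a": 100, "ssc_a": 50, "fsc_h": 90, "gfp": i} for i in range(200)]
events.append({"fsc_a": 5, "ssc_a": 2, "fsc_h": 5, "gfp": 5000})       # debris
events.append({"fsc_a": 120, "ssc_a": 60, "fsc_h": 60, "gfp": 10000})  # doublet

size_gate = {"cx": 100, "cy": 50, "rx": 40, "ry": 30}  # hypothetical ellipse
top_sorted = gate_events(events, size_gate, singlet_ratio=(0.8, 1.0))
mock_sorted = gate_events(events, size_gate, singlet_ratio=(0.8, 1.0),
                          top_fraction=None)
print([e["gfp"] for e in top_sorted], len(mock_sorted))
```

Because the GFP gate is applied last, the bright doublet and the debris particle in this made-up data never reach the top 1% selection, which is the point of ordering the gates hierarchically.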

Kinetic parameter determination of top candidates
Supplementary Fig. 7. Michaelis-Menten models from lysate kinetic data, example fit. Michaelis-Menten graphs with rate V (μM TCA/min) as a function of Phe concentration [Phe] (mM) for wild-type StlA (a) and the two top variants EP2516 (b) and EP2525 (c). The data points on each graph are rate (V in μM TCA/min) calculated from the first hour of activity for each Phe concentration tested, where activity remained linear. These plots show an example fit from a single biological replicate out of the three used in generating the values of Table 1.
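For readers who want to see how kinetic parameters can be extracted from rate versus [Phe] data of this kind, the sketch below fits the Michaelis-Menten model V = Vmax[S]/(Km + [S]) using the Hanes-Woolf linearization ([S]/V plotted against [S], where the slope is 1/Vmax and the intercept is Km/Vmax). This is a swapped-in, dependency-free technique chosen for illustration; the fitting procedure actually used for Table 1 is not stated in this section, and nonlinear regression is generally preferred for noisy data. The parameter values and concentrations are hypothetical.

```python
def michaelis_menten(s, vmax, km):
    """Rate predicted by the Michaelis-Menten model at substrate level s."""
    return vmax * s / (km + s)


def fit_hanes_woolf(substrate, rates):
    """Estimate (Vmax, Km) by least squares on the line [S]/V vs. [S]."""
    xs = list(substrate)
    ys = [s / v for s, v in zip(substrate, rates)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # equals 1 / Vmax
    intercept = mean_y - slope * mean_x  # equals Km / Vmax
    vmax = 1.0 / slope
    km = intercept * vmax
    return vmax, km


# Hypothetical, noise-free data: Vmax = 10 uM TCA/min, Km = 2 mM Phe.
phe_mM = [0.5, 1.0, 2.0, 5.0, 10.0, 20.0]
rate = [michaelis_menten(s, vmax=10.0, km=2.0) for s in phe_mM]
vmax_fit, km_fit = fit_hanes_woolf(phe_mM, rate)
print(vmax_fit, km_fit)
```

With noise-free synthetic input the linearization recovers the generating parameters; with real lysate data, each rate would first be estimated from the linear portion of the time course, as the caption describes.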