In vitro flow cytometry-based screening platform for cellulase engineering

Ultrahigh throughput screening (uHTS) plays an essential role in directed evolution for tailoring biocatalysts for industrial applications. Flow cytometry-based uHTS provides efficient coverage of the generated protein sequence space by analysis of up to 10⁷ events per hour. Cell-free enzyme production overcomes the challenge of diversity loss during the transformation of mutant libraries into expression hosts, enables directed evolution of toxic enzymes, and holds the promise to efficiently design enzymes of human or animal origin. The developed uHTS cell-free compartmentalization platform (InVitroFlow) is the first report in which a flow cytometry-based screening system has been combined with compartmentalized cell-free expression for directed cellulase evolution. InVitroFlow was validated by screening a random cellulase mutant library with a novel screening system (based on the substrate fluorescein-di-β-D-cellobioside) and yielded significantly improved cellulase variants, e.g. CelA2-H288F-M1 (N273D/H288F/N468S) with a 13.3-fold increased specific activity (220.60 U/mg) compared to CelA2 wild type (16.57 U/mg).


Results
The results section is divided into three parts that develop and validate the InVitroFlow technology platform for directed cellulase evolution. The first part describes the principle of InVitroFlow. The second part comprises the optimization steps, using the CelA2-H288F cellulase variant as reference for cell-free production in (w/o/w) emulsions, flow cytometry-based analysis, and sorting. In the third part, InVitroFlow was validated in a directed evolution campaign by flow cytometry-based screening of a random mutagenesis library (epPCR library) with the CelA2-H288F cellulase variant as parent (encoded by a linear template). Sorted variants were expressed in 96-well MTP format and the most beneficial cellulase variant M1 (N273D/H288F/N468S) was partially purified and kinetically characterized.
Principle of high throughput flow cytometry-based in vitro compartmentalization screening platform. Figure 1 summarizes the seven steps of an InVitroFlow-based directed evolution campaign. A celA2-H288F library with a high mutational load (8 mutations per gene) containing > 10⁸ mutants was generated using epPCR (Fig. 1, Step 1). Subsequently, the random mutant library was encapsulated in (w/o/w) emulsion droplets together with the fluorogenic substrate (fluorescein-di-β-D-cellobioside) and an optimized in vitro extract mixture for cell-free enzyme production (Fig. 1, Steps 2-3). Optimization comprised selection of the optimal substrate and substrate concentration, amount of template DNA, BSA concentration, incubation time and incubation temperature. Compartmentalization in emulsion droplets enables genotype-phenotype linkage since it ensures that the mutated genes, the encoded enzyme variants, and the generated fluorescent product (fluorescein) remain entrapped in the emulsion compartment (Fig. 1, Step 3). As a result, (w/o/w) emulsions that contain active enzyme variants generate sufficient fluorescein for sorting by flow cytometry. InVitroFlow offers a throughput of 10⁷ events per hour and sorting of 5,000 events per second by qualitative differentiation between fluorescent and non-fluorescent events (Fig. 1, Step 4). The genes encoding active cellulase variants in the sorted sample were isolated and amplified by PCR in approximately 3 hours (Fig. 1, Step 5). Amplified PCR products can subsequently be used as template for further iterative rounds of directed evolution and/or cloned into a vector and transformed into an expression host (e.g. for MTP screening; Fig. 1, Step 6). MTP-based screening is usually used to identify the most beneficial variants (e.g. for improved activity, thermal and organic solvent resistance; Fig. 1, Step 7).
In summary, InVitroFlow offers screening of a high number of enzyme variants (> 10⁸) within a few hours.
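The throughput figures quoted above can be put side by side with a short calculation. The following Python sketch is only a back-of-the-envelope illustration: the function names are hypothetical, the library size is taken as the lower bound of 10⁸ mutants, and a constant event rate without instrument downtime is assumed.

```python
def per_second_to_per_hour(events_per_second: float) -> float:
    """Convert an event rate from events/s to events/h."""
    return events_per_second * 3600.0


def hours_to_screen(n_events: float, events_per_hour: float) -> float:
    """Instrument time (h) needed to analyze n_events at a constant rate."""
    if events_per_hour <= 0:
        raise ValueError("events_per_hour must be positive")
    return n_events / events_per_hour


# Values quoted in the text; the constant-rate model is an assumption.
library_size = 1e8                        # > 10^8 mutants (lower bound)
analysis_rate = 1e7                       # events per hour
sort_rate = per_second_to_per_hour(5000)  # 5,000 events per second

print(f"at 1e7 events/h:   {hours_to_screen(library_size, analysis_rate):.1f} h")
print(f"at 5,000 events/s: {hours_to_screen(library_size, sort_rate):.1f} h")
```

Under these assumptions a single pass over 10⁸ events takes several hours of instrument time, the exact value depending on which of the two quoted rates applies.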
Optimization of (w/o/w) emulsion compartments. Essential for high-level in vitro protein production in (w/o/w) emulsions are: (1) compatibility of the cell-free expression mixture and substrate/product with the emulsion-building components (i.e. oil, surfactants), (2) stability throughout the selection procedure (including sorting), (3) homogeneity in size and shape, and (4) one inner aqueous phase per (w/o/w) emulsion droplet. (w/o/w) emulsions were generated by three preparation methods based on stirring, homogenizing and membrane extrusion, and were characterized according to criteria (1)-(4). Stirring resulted in a comparably low number of (w/o/w) emulsions and a broad size distribution (1-30 μm; Fig. 2a […] aqueous compartment per droplet (Fig. 2c,d). Membrane extrusion was finally used for InVitroFlow since it yielded highly monodisperse (w/o/w) emulsions of homogeneous size (~10 μm) that harbor only one inner aqueous compartment per droplet (Fig. 2e,f). Fluorescence microscopy analysis over time showed that the generated (w/o/w) emulsions are stable for at least 24 hours under the selected conditions.
Substrate selection for the InVitroFlow screening platform. In order to maintain the genotype-phenotype linkage between gene and encoded cellulase, a suitable substrate-product pair was identified. 4-Methylumbelliferyl-β-D-cellobioside (4-MUC) and fluorescein-di-β-D-cellobioside (FDC) were finally used to visualize cellulase-catalyzed conversion and leakage of fluorescent product from (w/o/w) emulsions. The pair FDC/fluorescein is more sensitive than the commonly employed coumarin-based substrates 30 and showed lower product diffusion rates from the inner aqueous phase (Fig. 2f). The presence of the charged carboxyl group in FDC probably reduces product leakage into the hydrophobic oil phase and makes FDC a suitable substrate for flow cytometry-based screening in (w/o/w) emulsions.
In an additional optimization step, BSA (1-2%) was supplemented to the (w/o/w) emulsions to minimize interfacial interactions and leakage of FDC/fluorescein from (w/o/w) emulsions. The optimal BSA concentration was determined by flow cytometer analysis to be 1 mg/ml BSA (1% BSA, 26.45% fluorescent events; Supplementary Fig. S1b). The influence of BSA on fluorescein leakage from (w/o/w) emulsion compartments was quantified by comparison to samples without BSA (4.75% fluorescent events; Supplementary Fig. S1a) and samples containing 2 mg/ml BSA (2% BSA, 2.50% fluorescent events; Supplementary Fig. S1c).
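The comparison of the three BSA conditions can be summarized in a few lines of Python. This is a minimal sketch that uses only the percentages of fluorescent events quoted above; the dictionary keys and function names are illustrative, not part of the original protocol.

```python
# Fraction of fluorescent events (%) reported for each BSA condition.
fluorescent_pct = {
    "no BSA": 4.75,
    "1 mg/ml BSA": 26.45,
    "2 mg/ml BSA": 2.50,
}


def best_condition(results: dict) -> str:
    """Return the condition with the highest fraction of fluorescent events."""
    return max(results, key=results.get)


def fold_over_baseline(results: dict, condition: str, baseline: str = "no BSA") -> float:
    """Ratio of fluorescent events in `condition` relative to `baseline`."""
    return results[condition] / results[baseline]


best = best_condition(fluorescent_pct)
print(best, f"{fold_over_baseline(fluorescent_pct, best):.1f}-fold over no BSA")
```

With the reported values, 1 mg/ml BSA gives roughly a 5.6-fold higher fraction of fluorescent events than the sample without BSA.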
Cell-free cellulase production within (w/o/w) emulsion compartments. Cell-free cellulase production in (w/o/w) emulsion compartments generated by the membrane extrusion method was performed using an in vitro expression kit (FastLane E. coli Mini Kit; RiNA GmbH). Incubation time in the cell-free production mix in (w/o/w) emulsions was varied from 3 h to 24 h; after 4 h, sufficient cellulase activity for flow cytometry analysis (40.3% active fraction) was obtained (Supplementary Fig. S2b). Incubation at varied temperatures revealed an optimal level of cellulase activity at 25 °C, which was subsequently used as the temperature of choice (Supplementary Figs S2 and S3).
In order to define an efficient sorting window for libraries with a high mutational load, mutant libraries with a defined active-to-inactive ratio (e.g. a 3:7 ratio of active CelA2-H288F to inactive CelA2-H288F-E580Q) were used to mimic the conditions under which mutant libraries are often screened. A model library with an active-to-inactive cellulase ratio of 3:7 was finally used to optimize the amount of template DNA (0.164-0.656 μM) and the substrate concentration (0.13-0.75 mM FDC) for flow cytometer analysis. Samples containing < 0.656 μM of template DNA during in vitro cellulase production in (w/o/w) emulsions yielded upon flow cytometry analysis […]
Flow cytometry analysis and sorting of in vitro cellulase DNA model libraries with defined active-to-inactive ratio. Cell-free expressed DNA model libraries with active-to-inactive ratios of 5:95 and 10:90 (active CelA2-H288F versus inactive CelA2-H288F-E580Q) in (w/o/w) emulsions were analyzed to determine parameters for efficient flow cytometry sorting. Analysis and sorting were done at an event rate of 1,500 events s⁻¹. Figure 3 shows that an increase in the amount of active CelA2-H288F leads to an increase in the fluorescent signal above the line indicating gate P1. With 0% active CelA2-H288F, 1.2% of the population is located above the line P1 that was used for sorting active variants (Fig. 3a). Analysis of a population with 5% active CelA2-H288F results in approx. 11% positive events above the line P1 (Fig. 3b), whereas for 10% active CelA2-H288F, 26% of the population was above the line P1 (Fig. 3c). Analysis of a population with 100% active CelA2-H288F shows that 46% of the population is above the line P1 (Fig. 3d). A 46% fraction of active events in the positive control (100% active CelA2-H288F) suggests that not all (w/o/w) emulsions contain a gene template.
This results from the dilution applied during emulsification: gene templates are loaded randomly into (w/o/w) emulsions according to Poisson statistics 31. In order to ensure that the majority of occupied emulsions contain only one gene per droplet, the majority of emulsion compartments has to be without any gene. The library with an active-to-inactive ratio of 5:95 was sorted at an event rate of 1,500 events s⁻¹ with a sorting efficiency of 99.6%. Reanalysis of the sorted fraction resulted in a 5.3-fold enrichment of the active emulsion fraction (Fig. 3e). The obtained enrichment validated the InVitroFlow technology platform since it is possible to isolate a significantly increased number of fluorescent events (~57%).
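The Poisson loading argument can be made concrete with a small calculation. The sketch below is illustrative only and rests on a strong simplifying assumption that is not stated in the text: every droplet that contains at least one active gene is counted above P1, and every empty droplet is not. Under that assumption the observed occupied fraction fixes the mean number of genes per droplet (λ), and the share of occupied droplets that carry exactly one gene follows directly. Function names are hypothetical.

```python
import math


def lambda_from_occupied(fraction_occupied: float) -> float:
    """Mean genes per droplet for Poisson loading.

    Uses P(k >= 1) = 1 - exp(-lambda), solved for lambda.
    """
    if not 0.0 < fraction_occupied < 1.0:
        raise ValueError("fraction_occupied must be in (0, 1)")
    return -math.log(1.0 - fraction_occupied)


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k genes in a droplet."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def single_gene_share(fraction_occupied: float) -> float:
    """Share of occupied droplets that carry exactly one gene."""
    lam = lambda_from_occupied(fraction_occupied)
    return poisson_pmf(1, lam) / fraction_occupied


# 0.46: positive-control fraction above P1 (assumed equal to occupancy).
# 0.01: "every 100th droplet contains a gene", a much more dilute loading.
for occupied in (0.46, 0.01):
    lam = lambda_from_occupied(occupied)
    share = single_gene_share(occupied)
    print(f"occupied={occupied:.2f}  lambda={lam:.3f}  single-gene share={share:.3f}")
```

The comparison shows why dilute loading is needed: the lower the occupied fraction, the closer the share of single-gene droplets among the occupied ones gets to 100%.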

Flow cytometry screening and sorting of epPCR libraries. A random mutant library of celA2-H288F was generated by epPCR using 0.05 mM MnCl₂. The random mutant library of celA2-H288F was subsequently employed in the InVitroFlow screening system with the previously optimized conditions. An average mutation frequency of 8 mutations per gene was estimated by sequence analysis of 12 randomly selected clones. In order to achieve high enrichment in the active fraction (> 50%), a strict sorting strategy was applied to sort out the single (w/o/w) emulsion events with the highest fluorescence (8.5% fluorescent events above the line P1) (Fig. 4a,b). Upon flow cytometry analysis of the cell-free expressed celA2-H288F random mutant library in (w/o/w) emulsions, in total 1.4 × 10⁷ events were analyzed (2,000 events s⁻¹), and the 8.5% most fluorescent population (above the line P1) was sorted with 93.2% efficiency (Fig. 4a). Reanalysis of the sorted population revealed an 8.2-fold enrichment of the active fraction, determined by an increase of the population above the line P1 from 8.50% to 69.75% (Fig. 4b).
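The enrichment factor and the analysis time follow from the reported numbers by simple arithmetic. The Python sketch below reproduces them; the function names are illustrative, and a constant event rate is assumed for the time estimate.

```python
def enrichment_fold(pct_before: float, pct_after: float) -> float:
    """Fold enrichment of the population above gate P1 after sorting."""
    if pct_before <= 0:
        raise ValueError("pct_before must be positive")
    return pct_after / pct_before


def analysis_hours(n_events: float, events_per_second: float) -> float:
    """Time (h) to analyze n_events at a constant event rate."""
    return n_events / events_per_second / 3600.0


# Values reported for the epPCR library (Fig. 4a,b).
print(f"enrichment: {enrichment_fold(8.50, 69.75):.1f}-fold")
print(f"analysis time: {analysis_hours(1.4e7, 2000):.2f} h")
```

With 8.50% before and 69.75% after sorting, the ratio rounds to the reported 8.2-fold; analyzing 1.4 × 10⁷ events at 2,000 events s⁻¹ corresponds to just under two hours of acquisition under the constant-rate assumption.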
Recovery of sorted DNA library from (w/o/w) emulsions and transformation in expression host for MTP analysis. The sorted (w/o/w) emulsions showing the highest fluorescence were disrupted and an optimized DNA recovery method based on the NucleoSpin® Gel and PCR clean up kit (Macherey-Nagel) was used to isolate the DNA of the mutant library. After PCR amplification, the PCR products of celA2 were analyzed by agarose gel electrophoresis (Supplementary Fig. S6), cloned into the pET28a(+) vector backbone via PLICing 32, and transformed into E. coli BL21 Gold (DE3) for expression. In total, 528 cellulase variants were screened with the coumarin-based 4-MUC activity quantification system in 96-well MTPs 30. MTP analysis revealed 33 cellulase variants with significantly improved activity in comparison to the parent CelA2-H288F (up to 7.2-fold), even though 4-MUC and not the pair FDC/fluorescein was used in the rescreening. This indicates that a high-quality library was isolated in which activities for substrates other than the screening substrate can also be optimized. Analysis of variant CelA2-H288F-M1 showed a 17.5-fold improvement for the pair FDC/fluorescein in comparison to the parent CelA2-H288F (Supplementary Fig. S7). The three most promising variants were rescreened and finally CelA2-H288F-M1 (N273D/N468S) proved to be the most beneficial cellulase variant (Fig. 5).

Discussion
Protein engineering by directed evolution is limited by the complexity of the protein sequence space. Even a peptide of five amino acids can yield 3.2 million different sequences. In traditional directed evolution campaigns only a minor fraction of clones is screened, and often screening of a few thousand clones already yields improved variants 33. In a comprehensive study the groups of Jaeger and Schwaneberg showed that 70 to 80% of beneficial positions are discovered in traditional directed evolution campaigns 34,35. Therefore, a higher throughput with a balanced mutation frequency is of very high importance to efficiently capitalize on nature's diversity.
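The figure of 3.2 million for a five-residue peptide follows from 20 possible amino acids at each of 5 positions (20⁵). A minimal Python sketch of this count, with a hypothetical function name and the 20 canonical amino acids assumed as the alphabet, shows how quickly the space grows with length:

```python
def sequence_space(length: int, alphabet_size: int = 20) -> int:
    """Number of distinct sequences of a given length over the alphabet."""
    if length < 0 or alphabet_size < 1:
        raise ValueError("length must be >= 0 and alphabet_size >= 1")
    return alphabet_size ** length


# Growth of the sequence space with peptide length.
for n in (5, 10):
    print(f"length {n}: {sequence_space(n):.3e} sequences")
```

Doubling the length from 5 to 10 residues already moves the space from about 3.2 × 10⁶ to about 10¹³ sequences, far beyond what any screening format can sample exhaustively.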
The InVitroFlow screening platform combines cell-free compartmentalization with flow cytometer-based analysis and achieves a throughput of 6.5 × 10⁶ events per hour. In a single round of directed cellulase evolution over 1.4 × 10⁷ events were screened. Rescreening in MTP format revealed the highly improved cellulase variant M1 (N273D/H288F/N468S; specific activity of 220.6 U mg⁻¹) compared to CelA2 wild type (16.57 U mg⁻¹) and the parent CelA2-H288F (72.62 U mg⁻¹), even though a coumarin derivative (4-MUC) and not the flow cytometer prescreening substrate (pair FDC/fluorescein) was employed in the MTP rescreen. Cellulase InVitroFlow represents the first in vitro compartmentalization-based flow cytometry screening system for directed cellulase evolution and it is the first report in the last ten years. To date, only three directed enzyme evolution campaigns with InVitroFlow have been reported [20][21][22]. InVitroFlow technology makes it possible to sample up to 10¹⁰ events 21,36, which represents good coverage of the diversity of a random mutant library (> 10⁶) 37. InVitroFlow makes it possible to sample variant numbers that are orders of magnitude higher than standard whole cell screening systems (> 10³-fold) 36,38, even though InVitroFlow requires oversampling: only every 100th emulsion droplet contains a gene template, which ensures that the genes in sorted emulsions predominantly encode beneficial variants, so that diversity loss in the subsequent PCR amplification is minimized.
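The fold improvements implied by these specific activities can be checked directly. The sketch below uses only the three values quoted in this section; the dictionary and function names are illustrative.

```python
# Specific activities (U/mg) as quoted in the text.
specific_activity = {
    "CelA2 wild type": 16.57,
    "CelA2-H288F": 72.62,
    "CelA2-H288F-M1": 220.60,
}


def fold_improvement(variant: str, reference: str, data: dict = specific_activity) -> float:
    """Ratio of the variant's specific activity to that of the reference."""
    return data[variant] / data[reference]


print(f"M1 vs wild type: {fold_improvement('CelA2-H288F-M1', 'CelA2 wild type'):.1f}-fold")
print(f"M1 vs parent:    {fold_improvement('CelA2-H288F-M1', 'CelA2-H288F'):.1f}-fold")
```

The ratio to the wild type reproduces the 13.3-fold improvement stated in the abstract; relative to the parent CelA2-H288F, the same values correspond to roughly a 3-fold gain in this assay.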
The rare use of in vitro emulsion compartmentalization combined with flow cytometry is explained by two challenges: the limited efficiency of in vitro enzyme production within water-in-oil-in-water emulsion compartments and the confinement of the fluorescence signal in emulsions. The hydrophobic oil phase in emulsions, which mimics the cell membrane, causes interfacial inactivation of the produced enzyme 29 and of expression kit components. In the cellulase InVitroFlow screening system, cellulase production within (w/o/w) emulsions was optimized using BSA as a "sacrificial" substrate for "saturation" of the water/oil interface, which yielded a 5-fold increase in fluorescence signal. Additionally, the employed fluorogenic substrate was directly encapsulated in (w/o/w) emulsion compartments, and the presence of the negatively charged carboxylic group minimizes leakage and crosstalk.
The application potential and industrial opportunities of in vitro evolution of enzymes, especially enzymes of human or animal origin beyond standard bacterial and yeast proteins, are impressive, as summarized in several reviews 39,40. Expression of eukaryotic proteins is mainly performed in wheat germ extracts (production of up to mg/ml amounts of protein), while cell-free extracts of insect and E. coli cells are characterized by lower protein yields and production of nascent or insoluble proteins, respectively 41. Cell-free expression offers the opportunity to express toxic 25,42,43 and membrane proteins 44,45 for which in vivo expression remains challenging. In the last five years, successful expression of three toxic proteins was reported, i.e. a pierisin-like protein from the cabbage butterfly Pieris rapae 43, hemolysins of Vibrio parahaemolyticus 42, and expression with the toxic amino acid canavanine 25. The production strains most commonly used in industry, i.e. Aspergillus niger, Bacillus and Trichoderma, are often limited in transformation efficiency 28,46, and directed evolution in E. coli or yeast strains can yield different results due to differences in glycosylation or result in further codon usage optimization for high-level expression. Recent reports on the use of novel eukaryotic and vesicle-containing cell extracts, i.e. insect extracts 41 and human-based cell-free extracts 47, enabled the N-linked glycosylation of the target protein and the embedding of membrane proteins into microsomal membranes, which is of great interest for the scientific and industrial community 40,48.
In summary, Cellulase InVitroFlow is a rapid, cost-efficient and non-laborious screening platform for directed cellulase evolution which makes it possible to cover a significant fraction of the generated sequence space and to identify beneficial positions beyond the possibilities of traditional screening formats. We see the main application of the InVitroFlow technology as a prescreening system that enables researchers to isolate the most active variants from a vast pool of variants and to explore novel directed evolution strategies with high mutational loads in which only a small fraction of a population is active. Advancing InVitroFlow to cell-free expression systems based on wheat germ and other eukaryotic cell extracts will be challenging, and will open many exciting possibilities to evolve cell-toxic proteins and proteins of human or animal origin with an InVitroFlow prescreen.
The gene celA2 (GenBank: JF826524.1), isolated from a metagenome library, with 41% similarity to a Glycosyl Hydrolase Family 9 (GH9) cellobiosidase from Clostridium cellulovorans, codes for a cellulase protein with a molecular weight of ~69 kDa, consisting of 604 amino acids 49. Positions 288 and 580 are reported to be key residues for activity, so in this study variant CelA2-H288F, showing 8.7-fold higher specific activity than CelA2-WT, was used for development of the uHTS IVC platform 30. The amino acid substitution from glutamic acid to glutamine at the catalytic residue 580 is reported to lead to a complete loss of activity; therefore variant CelA2-H288F-E580Q, which behaves like the empty vector pET28a(+), was used as negative control in this study 50.
Gene cloning into expression vectors and sequencing. The construct pET28a(+)-CelA2-H288F was modified by introducing an additional NdeI restriction site in front of the His-tag sequence by PCR. Subsequently, the celA2-H288F gene was cut out by double restriction (NdeI and XhoI) and subcloned into the pIX3.0RMT7 vector backbone.
The NdeI-celA2-H288F insert was generated by PCR (98 °C for […]
The inactive variant CelA2-H288F-E580Q was generated by site-directed mutagenesis according to the published method 51 using pIX3.0RMT7-CelA2-H288F as template DNA and specific SDM primers (SDM_E580Q_for: 5′-GCTATGCCACCAATCAGATTTGCATTTATTGGAATAGTCCG-3′, SDM_E580Q_rev: 5′-CGGACTATTCCAATAAATGCAAATCTGATTGGTGGCATAGC-3′). The recombinant plasmid was named pIX3.0RMT7-CelA2-H288F-E580Q. Constructs were digested (1 h, 37 °C) by DpnI (20 U), purified using the NucleoSpin® Gel and PCR clean up kit (Macherey-Nagel) and transformed into chemically competent E. coli BL21 Gold (DE3)-lacIQ1 cells. DNA sequencing of the inserted gene was performed by Eurofins Genomics (Ebersberg, Germany) and Clone Manager 9 Professional Edition (Sci-Ed software, Cary, USA) was used for sequence analysis.
The amplified PCR products were digested (1 h, 37 °C) by DpnI (20 U), analyzed for correct size by agarose-TAE gel electrophoresis according to the standard protocol 53, purified from agarose gel using the NucleoSpin® Gel and PCR clean up kit (Macherey-Nagel) and eluted in 20 μl ddH₂O. Purified PCR products were used for PLICing and the resulting hybridization products were transformed into 100 μl chemically competent E. coli BL21-Gold (DE3) cells following the standard protocol 54. Transformants were grown on LB/Kan agar plates for further analysis in 96-well plates.

Cultivation, expression in flask and purification of CelA2 and its variants. For cultivation, Luria broth (LB) supplemented with appropriate antibiotics was used, and cultures were incubated in a shaking incubator (Multitron II; Infors GmbH, Einsbach, Germany; 20 h, 37 °C, 900 rpm). Ampicillin (100 μg/ml) and kanamycin (50 μg/ml) were used as antibiotics for growth selection.
For in vivo cellulase expression the main expression culture was inoculated with 1% of the preculture and incubated until an OD₆₀₀ of 0.6 was reached (37 °C, 250 rpm) in 100 ml Terrific Broth (TB) supplemented with appropriate antibiotics. Cellulase expression was induced by addition of 100 μl IPTG (0.1 M). Cells were harvested after incubation (4 h, 30 °C, 250 rpm) by centrifugation (4000 g, 10 min, 4 °C) and pellets were stored at −20 °C.
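The final inducer concentration is not stated explicitly but follows from the dilution (C₁V₁ = C₂V₂). The sketch below assumes that the culture volume at induction is still the nominal 100 ml and that volumes are additive; the function name is hypothetical.

```python
def final_concentration_mM(stock_M: float, added_ul: float, culture_ml: float) -> float:
    """Final concentration (mM) after adding a stock solution to a culture.

    Applies C1 * V1 = C2 * V2 with V2 = culture volume + added volume.
    """
    added_ml = added_ul / 1000.0
    stock_mM = stock_M * 1000.0
    return stock_mM * added_ml / (culture_ml + added_ml)


# 100 ul of 0.1 M IPTG into an assumed 100 ml culture.
iptg_mM = final_concentration_mM(stock_M=0.1, added_ul=100.0, culture_ml=100.0)
print(f"final IPTG concentration: {iptg_mM:.3f} mM")
```

Under these assumptions the induction corresponds to approximately 0.1 mM IPTG.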

Determination of kinetic parameters for CelA2 activity with 4-MUC activity assay.
For determination of the kinetic parameters of purified CelA2-H288F and its variants, the 4-MUC activity assay was performed according to the published protocol 30.