Ultrahigh-throughput discovery of promiscuous enzymes by picodroplet functional metagenomics

Unculturable bacterial communities provide a rich source of biocatalysts, but their experimental discovery by functional metagenomics is difficult, because the odds are stacked against the experimenter. Here we demonstrate functional screening of a million-membered metagenomic library in microfluidic picolitre droplet compartments. Using bait substrates, new hydrolases for sulfate monoesters and phosphotriesters were identified, mostly based on promiscuous activities presumed not to be under selection pressure. Spanning three protein superfamilies, these break new ground in sequence space: promiscuity now connects enzymes with only distantly related sequences. Most hits could not have been predicted by sequence analysis, because the desired activities have never been ascribed to similar sequences, showing how this approach complements bioinformatic harvesting of metagenomic sequencing data. Functional screening of a library of unprecedented size with excellent assay sensitivity has been instrumental in identifying rare genes constituting catalytically versatile hubs in sequence space as potential starting points for the acquisition of new functions.

Supplementary Figure 1: Increased sensitivity in smaller droplets. Fluorescent product was generated over time by hydrolysis of sulfate monoester 1d by the sulfatase P35, released upon droplet encapsulation and lysis of cells transformed by the plasmid PC35. In 2 pL droplets, the occupancy (cells/droplets) was set to 0.09, whereas the occupancy was 0.04 in 8 pL droplets. The average fluorescence of occupied droplets was calculated using Equation 1 and normalised by the fluorescence of the majority of droplets that contain no cells to give the fluorescence signal change (FSC) (Equation 2). The average fluorescence was calculated for positive droplets only, i.e. droplets with fluorescence 5-fold higher than that of the majority of droplets with low fluorescence (empty droplets).
To test the hypothesis that increased screening capacity leads to a larger number of hits, a model library (a combination of ENR-S, ENR-G, ENR-M and ENR-L; library ENR-MSGL, see Supplementary Table 1) was screened with substrate 1d at 2-, 10-, 20- and 25-fold oversampling of the theoretical library size. If the hypothesis holds, more hits are expected as oversampling increases. To measure the outcome of selections under the various coverage conditions, DNA from the selected droplets was recovered and directly transformed into E. coli. After 2 days of growth on agar plates, transformed colonies were lysed and then overlaid with a solution containing sulfate monoester 1b (Supplementary Fig. 12). Colonies producing blue dye were identified as expressing a sulfatase. The number of positive colonies was determined and plotted against the oversampling (library coverage) (a). Plasmids from positive colonies were isolated and digested using BamHI and NotI. Digested plasmids were run on agarose gels and the occurrences of unique digestion patterns were determined to assess the number of unique variants among the hits (b). The agarose gel shows the digestion patterns for the hits PC40, PC35 and BK1. Identical digestion patterns corresponding to the three hits were found from the sorted droplets when covering the library at least 10-fold. The observed increase in hits with larger diversity supports our hypothesis that more extensive screening yields more hits. This experiment establishes that the throughput of the screening and selection technology directly affects the success of a metagenomic campaign. Ladder: Hyperladder I (Bioline).
Using PCR amplifications (on ten random metagenomic variants from libraries 1-5 and 8-10, Supplementary Table 1) as insert size values, the total plasmid sizes were calculated and compared to the values of copy numbers measured after DNA extraction (Miniprep, QIAGEN). *The plasmid copy numbers should not be considered as absolute values; according to the manufacturer, the pZERO-2 copy number is ~700.
The sorting gates were set at a multiple of the mean background peak, ranging between 2-fold (a and Fig. 2b) and 5-fold (d).
Figure 9: Validation of the activity of hits towards sulfate monoester (a) and phosphotriester (b).
(a) ~1,200 transformants grew on plates after transformation of plasmids (2.5 out of 7 μL) recovered from selected droplets after the 2nd round of microfluidic screening. 104 (~10%) displayed sulfatase activity (colonies turning blue after lysis, overlay with sulfate monoester 1b and overnight incubation at 22 °C, as described in Metagenomic screening on plates). About half of the colonies that turned blue were further confirmed as positive hits in microtiter plates (using substrate 1d), and the positive variants were sequenced. (b) ~1,200 transformants grew on plates and 368 clones were randomly selected to be re-grown in liquid culture (1 mL). Following overnight growth at 37 °C and incubation for 24 hours at 22 °C, cultures were pelleted and lysed in order to assay phosphotriesterase activity towards triester 2d. 42 variants (~10%) displayed at least a 2-fold increase in fluorescence signal (shown in green) compared to a negative control in a 40-hour endpoint measurement. Metagenomic variants without phosphotriesterase activity were used as negative controls (here in wells 2C and 11G of each plate). A variant expressing a mutant of Pseudomonas diminuta phosphotriesterase was used as a positive control (here in wells 2B and 11F of each plate). Plasmids from positive variants were isolated and sequenced.
Supplementary Figure 11: Recovering weak activities from metagenomes using microdroplets.

a. Number of total plasmid copies and plasmid PC86 per DNA sample
(a) We measured fluorescent signals of lysed colonies of a metagenomic variant (PC36, without measurable triesterase activity; randomly picked) and three phosphotriesterase hits (PC83, PC84 and PC86, containing the genes p83, p84 and p86, respectively, which code for triesterases with different catalytic efficiencies (Table 1)). This colony-screening assay for activity (using 2d as substrate) generated a very small fluorescence signal on colonies expressing P84 (kcat/KM ~57 M⁻¹ s⁻¹) even when spread as a lawn (and distinction of individual PC36 and PC84 colonies was impossible), whereas higher fluorescence was observed for the hits expressing stronger phosphotriesterases (P83 and P86; kcat/KM ~9 × 10⁵ M⁻¹ s⁻¹ and ~10³ M⁻¹ s⁻¹, respectively). (b) Using the same substrate 2d and growth/expression conditions (growth on 14 cm plates at 37 °C for 15 h and incubation at 22 °C for 24 h), we used microfluidic droplets (as described throughout this paper) to test whether a quantitative distinction between bacteria transformed with PC84 and PC36 was possible. The fluorescence distributions for PC36 and PC84 can be distinguished in the readout of droplet screening, although the signals of the weak phosphotriesterase of PC84 and the negative variant PC36 still overlap (after 24 h). When a selection threshold of 2- to 5-fold (increase over the average fluorescence of 'negatives', i.e. droplets not containing a hit) was chosen, this overlap explains why negative variants are still present among the selected clones. Such a low selection threshold is necessary to ensure that weak catalysts such as PC84 are not missed.
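The trade-off described in this caption can be illustrated with a minimal sketch. The fluorescence readings below are hypothetical values chosen for illustration, not data from this study, and the function names are assumptions. The gate is placed at a multiple of the mean fluorescence of the negatives, and the fraction of each population passing the gate is reported.

```python
from statistics import mean

def sorting_gate(background, fold):
    """Gate placed at `fold` times the mean background fluorescence."""
    return fold * mean(background)

def fraction_selected(signals, gate):
    """Fraction of droplets whose fluorescence exceeds the gate."""
    return sum(s > gate for s in signals) / len(signals)

# Hypothetical fluorescence readings (arbitrary units), NOT data from the study.
negatives = [1.0, 0.8, 1.2, 0.9, 1.1, 2.4, 0.7, 0.9]  # droplets with an inactive variant
weak_hits = [1.5, 2.6, 3.1, 1.9, 4.0, 2.2, 5.5, 2.8]  # droplets with a weak catalyst

for fold in (2, 5):
    gate = sorting_gate(negatives, fold)
    print(
        f"{fold}-fold gate: "
        f"negatives selected = {fraction_selected(negatives, gate):.3f}, "
        f"weak hits selected = {fraction_selected(weak_hits, gate):.3f}"
    )
```

With these illustrative numbers, the 2-fold gate retains most weak hits at the cost of admitting some negatives, whereas the 5-fold gate excludes all negatives but also loses every weak hit.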
Supplementary Figure 12: Substrates used in this study.
Kinetic parameters for each substrate are presented in Supplementary Table 3. We note that no activity was found for any of the metagenomic hits towards lactones 8b, 8c, 8d¹⁻², amide 9 and β-lactam 10 under the conditions described in Supplementary Fig. 14.
LG: leaving group.
345 sequences from the α/β hydrolase superfamily are represented in this network. Only edges corresponding to similarity scores below E-values of 1e-19 are shown; the worst edges shown represent a median 30% identity over an alignment length of 232 residues. Only three families from the α/β hydrolase superfamily (with experimentally characterised proteins (red nodes)) are represented here: acetylcholine esterases (PF000135) (sequences annotated as such in the Uniprot database are highlighted in yellow), esterases (PF07859) (sequences annotated as such in the Uniprot database are highlighted in blue) and dienelactone hydrolases (PF01738) (highlighted in pink; the red node corresponds to the Uniprot sequence P0A114). Some acetylcholine esterases (yellow nodes) were shown to confer insecticide resistance in insects⁴˒⁵ and an esterase (OpdB) from Lactobacillus brevis WCP902 was shown to degrade organophosphate pesticide⁶ (blue node).
Supplementary Figure 16: Comparison of sequence identity between reported phosphotriesterases (annotated as such in Uniprot) and our metagenomic hits in the amidohydrolase superfamily (a) and MBL superfamily (b). The colour scale reflects the lowest to highest identity percentage (from low values in red to high values in blue). The alignments were performed with ClustalW2 using default settings and displayed using the percentage identity matrix.
Electron density has been traced as a 2Fobs − Fcalc map at .5 σ (blue grid); selected amino acids are shown as sticks (grey).

Supplementary Tables
Supplementary Table 1: Library sizes were evaluated after transformation of the ligation products between either the sheared metagenomic DNA (libraries 1 to 9) or the digested metagenomic DNA (library 10) and the plasmid pZero-2. The starred libraries (*) were pooled to give the combined library ENR-MSGL that was used for the preliminary experiments presented in Supplementary Fig. 3 and 4. These libraries were obtained after enrichment of microorganisms able to grow with specific nitrogen sources⁷.
The DNA source could be unambiguously ascribed to either cow rumen or soils/vanilla pods, since the environmental DNA was cloned into pZero-2 using two different restriction sites. Note that PC32, PC35, PC40 and BK1 were recovered when a subset of the library originating from soil was screened (Supplementary Fig. 4).
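As a rough guide to what the oversampling folds tested on the ENR-MSGL model library (2-, 10-, 20- and 25-fold) mean for coverage, the sketch below uses a simple Poisson sampling model. This model is an assumption made here for illustration: every library member is taken to be equally represented and every encapsulated active clone is taken to be detected, which a real metagenomic library with weak hits will not satisfy. The function names are hypothetical.

```python
import math

def fraction_covered(oversampling):
    """Expected fraction of library members sampled at least once.

    Assumes uniform random sampling with replacement, so the number of
    times a given member is screened is approximately Poisson-distributed
    with mean equal to the oversampling fold.
    """
    return 1.0 - math.exp(-oversampling)

def expected_missed(library_size, oversampling):
    """Expected number of library members never screened."""
    return library_size * math.exp(-oversampling)

for fold in (2, 10, 20, 25):
    print(f"{fold:>2}-fold oversampling: {fraction_covered(fold):.6f} of the library covered")
```

Under these idealised assumptions, 2-fold oversampling leaves roughly one member in seven unscreened, while 10-fold oversampling covers essentially the whole library. Uneven clone representation and signals close to the sorting gate would shift these numbers.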

Hit    Closest genome a (match %; % identity)    DNA source b    % GC    Insert size (kbp)
PC32   Pseudomonas putida (79; 94)               Soil            64      6.7

Michaelis-Menten parameters (kcat, KM, kcat/KM) of the ten hits identified by metagenomic screening (i.e. nine phosphotriesterases and one sulfatase (P35)) were obtained for a range of substrates (Supplementary Fig. 12). Purified enzymes ( 0 nM < E < μM) were incubated with different substrate concentrations for up to three hours. The observed initial rates v₀ were plotted against substrate concentration. The Michaelis-Menten equation (or its modified version accounting for substrate inhibition) or linear regressions (if solubility limits precluded measurement of an entire saturation profile) were used to fit the data using Kaleidagraph. All the plots are shown in Supplementary Fig. 14.
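The three fitting models named above can be written out as a minimal sketch. The exact form of the modified equation used in the original fits is not stated here, so the substrate-inhibition expression below is the common textbook form and should be read as an assumption. The enzyme concentration and kinetic constants are hypothetical illustration values, and the fits in the study were performed in Kaleidagraph, not with this code.

```python
def michaelis_menten(s, e, kcat, km):
    """Initial rate v0 = kcat * [E] * [S] / (KM + [S])."""
    return kcat * e * s / (km + s)

def substrate_inhibition(s, e, kcat, km, ki):
    """Assumed textbook form: v0 = kcat * [E] * [S] / (KM + [S] + [S]^2 / Ki)."""
    return kcat * e * s / (km + s + s * s / ki)

def kcat_over_km_from_linear_regime(substrates, rates, e):
    """Estimate kcat/KM when [S] << KM, where v0 ~ (kcat/KM) * [E] * [S].

    Uses the least-squares slope of v0 against [S] through the origin,
    divided by the enzyme concentration.
    """
    slope = sum(s * v for s, v in zip(substrates, rates)) / sum(s * s for s in substrates)
    return slope / e

# Hypothetical parameters (molar, per second), NOT values from the study.
E = 1e-6      # enzyme concentration
KCAT = 1.0    # turnover number
KM = 100e-6   # Michaelis constant

substrates = [1e-6, 2e-6, 3e-6, 4e-6, 5e-6]  # all well below KM
rates = [michaelis_menten(s, E, KCAT, KM) for s in substrates]
estimate = kcat_over_km_from_linear_regime(substrates, rates, E)
print(f"true kcat/KM = {KCAT / KM:.3g} M^-1 s^-1, linear estimate = {estimate:.3g} M^-1 s^-1")
```

The linear estimate slightly underestimates kcat/KM because the rate already begins to curve as [S] approaches KM, which is why the linear model is only appropriate when the saturation profile cannot be measured.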

PC35
'nd': not determined. '-': no product was detected with μM of enzyme and mM of substrate over 3 hours (over a background control in the absence of enzyme). This suggests an upper limit of such activities of approximately 10-fold below the lowest activity observed here (5 × 10⁻³ M⁻¹ s⁻¹ for P85) for substrates with p-nitrophenol leaving groups.
The metagenomic hits found in this study are highlighted in green. Lines highlighted in blue (OPH) and purple (Promiscuous) are proteins used for comparison with our triesterase hits (Fig. 4b). Details of catalytic efficiencies for highlighted enzymes are listed in Supplementary Table 7.