Living systems depend on chemical reactions that, if uncatalysed, occur too slowly to support life. Therefore, evolution has selected biological catalysts—enzymes—that speed up reactions to rates sufficient to sustain survival and growth. The enzymes found in present-day proteomes are typically well-ordered, finely tuned proteins, which provide rate enhancements far superior to enzyme catalysts created by design (with few exceptions so far1). Moreover, most enzymes in the current biosphere are relatively large, with bacterial and eukaryotic proteins having average lengths of 320 and 472 amino acids, respectively2.

It is relatively straightforward to envision how these large and highly active enzymes evolved by iterative selection for improvements of progenitor sequences that were already active and folded. However, it is far more challenging to understand how new functions were brought about in the first place. One may hypothesize that the first minimally active sequences arose de novo from inactive random polypeptides. Understanding how functional enzymes emerged from random sequences is particularly challenging in light of a large body of work showing that fully random sequences (1) rarely fold into well-ordered soluble structures3,4 and (2) rarely bind biologically relevant small molecules. For example, a seminal study by Keefe and Szostak5 showed that random libraries of 80-residue polypeptides include sequences that bind ATP at a frequency of approximately one in 1011. Catalysis is an even greater challenge than binding. The promiscuity of catalysts has been invoked to facilitate the emergence of new reactions after gene duplication and adaptive evolution of a side activity6,7. Metagenomic libraries have been found to contain catalysts for ‘unseen’, non-natural reactions with a xenobiotic substrate8 at a frequency of one in 105. Eliciting function from non-catalytic sequences instead of re-functionalizing a (promiscuously) active enzyme is a much more difficult proposition, with very few examples on record and requiring the screening of libraries with more than 1012 members9,10. Computational design has been successful in principle11, but is far from routine, and creating high-efficiency catalysts immediately rivalling the efficiency of evolved enzymes has so far been difficult and is limited to a single case1, even though deep-learning-based algorithms are reinvigorating design approaches. Thus, in experimental and computational work, the step between inactive and active sequences seems a near-unsurmountable hurdle.

These considerations illustrate the tension between the abundant success of life on Earth and the extreme rarity of sequences capable of sustaining such life in the ‘vastness of sequence space’. Dayhoff had hypothesized12,13 that the first functional proteins emerged from short peptides (and their combinations by duplication, diversification and gene-fusion events during the course of evolution). The function of such primordial peptide building blocks then provides the historical link between prebiotic chemistry and the contemporary proteins of much larger sizes that eventually reached high efficiency and specificity. To probe the early emergence of function as a missing link to contemporary proteins, we set out to explore whether biologically relevant catalysts can be isolated from collections of unevolved de novo designed sequences. Because we anticipated that active catalysts would be rare and fully random sequences rarely fold into stable soluble structures3,4, several features were incorporated into the screen to enhance the likelihood of success. First, as a starting scaffold, we chose a collapsed globular structure that folds into a stable 4-helix bundle14. This scaffold, S-824, is a 102-residue protein (previously isolated from a semi-random library15) without natural function. Second, ultrahigh-throughput screening in microfluidic droplets (rate of ~0.8 kHz) was used to search libraries generated from S-824 for rare hits amidst a background of inactive sequences in a 1.7-million-membered library. Third, the screened substrates contained both a phosphodiester and a phosphotriester, thereby broadening the types of catalyst that might be discovered. Finally, a mixture of typical divalent metals (often found as cofactors in hydrolases) was added to the screen, thereby allowing the isolation of metalloenzymes derived from de novo design.

Our screen yielded a truncated and structurally dynamic 59-residue-long enzyme that accelerates phosphodiester hydrolysis in the presence of manganese ions by up to ~109-fold over the uncatalysed background reaction. Promiscuous turnover was observed for phosphodiesters and phosphonates, including the unreactive substrate second messenger cyclic AMP (cAMP). The combination of an unevolved protein and a metal to process a nucleotide is prebiotically interesting—metals are thought to be highly important in prebiotic catalysis16,17,18, and phosphodiester bonds of nucleotides form the basis for information storage (DNA, RNA) and signalling (cAMP, cGMP). The rapid identification of a short protein that self-assembles into a dimeric structure with considerable activity provides support for the Dayhoff hypothesis12,13 and illustrates a scenario of rapid conversion of a peptide with no measurable activity into a proficient catalyst that is crucial for evolution.


Microfluidic screening yields catalytically active proteins

The starting scaffold of S-824 is a stable 102-residue-long 4-helix bundle protein that had previously been isolated from a library randomized with binary patterning of polar and nonpolar amino acids15,19. S-824 is a rationally designed sequence of known three-dimensional structure (PDB 1P68)14 unrelated to any natural proteins. Such a 4-helix bundle scaffold tolerates enormous sequence diversity, with a hydrophobic core as the only minimal constraint19, minimizing the chances of sacrificing the folded structure and thus allowing us to interrogate the functional potential of diverse randomized sequences. S-824 was thus randomized in its apical loops and helix termini to create an active-site cavity surrounded by catalytic groups, producing a library of ~1.7 million variants (Fig. 1a)20.

Fig. 1: Droplet screening of a library of de novo designed 4-helix bundles enriches truncated sequences with phosphoesterase activity.
figure 1

a, De novo designed 4-helix bundle library. A library20 containing ~1.7 million variants based on the stably folded de novo designed 4-helix bundle protein S-824 was screened for phosphoesterase activity. The diversified residues of S-824 are shown in red (degenerate codons used: NDT/VRC/RRC). b, Ultrahigh-throughput microdroplet screening. The library was subjected to FADS on a microfluidic chip, and the 0.1–0.2% most fluorescent droplets (of a total of 4.4 million screened) were selected. c, Truncated peptide with phosphoesterase activity. The selection yielded catalytically active, truncated peptides consisting of a helix-turn-helix motif of ~60 amino acids (mutated sites are shown in red), illustrated here with a structural model of mini-cAMPase generated with AlphaFold2/ColabFold32,34. Panel b adapted with permission from ref. 59 under a Creative Commons licence CC BY 4.0.

The parental S-824 sequence showed no detectable phosphoesterase activity above background in our screening assay. To maximize the chances of capturing active enzymes, we incorporated diversity into all components of the screen. Not only were the sequences diverse, but we also screened for different catalytic activities simultaneously by using a bait mixture of fluorogenic phospho-di- and -triester substrates (Supplementary Fig. 1). Moreover, because many natural phosphoesterases utilize divalent metal cofactors, we increased the chance of finding a hit by supplementing the reaction with a mixture of MnCl2, ZnCl2 and CaCl2. We sampled all these variables combinatorially in an ultrahigh-throughput manner using microfluidic droplets, which allowed screening of the entire library in a single experiment. In this microfluidic assay, individual bacterial cells expressing a single library sequence were co-compartmentalized with the mixture of fluorogenic substrates and metals, and then lysed in the droplet (Fig. 1b and Supplementary Fig. 2)21,22. After incubation, the droplets were dielectrophoretically sorted at ultrahigh throughput by fluorescence-activated droplet sorting (FADS) on a microfluidic chip23. Sorted hits were subsequently identified by recovering and sequencing their plasmid DNA.

For the initial sort, we chose permissive conditions to maximize the number of screened clones and enhance the likelihood of finding hits. In total, 10.3 million droplets (corresponding to ~4.4 million clones) were screened in 4 h, and the 0.2–0.5% most fluorescent droplets were selected. The collected hits were clonally expanded and re-sorted under more stringent conditions to further enrich catalytically active clones. After this cumulative enrichment by droplet screening, ~250 clones were arbitrarily picked for secondary screening in microtitre plates. Average activity levels with the substrate mix increased with each round of sorting (Supplementary Fig. 3), indicating that screening cumulatively enriched sequences with phosphate hydrolase activity.

Enrichment of truncations along with gain of function

Sequence analysis of these single hits from the secondary screen revealed that some of the most active sequences (12/14) had frameshift mutations that produced truncated protein sequences (Fig. 1c). Indeed, next-generation sequencing (NGS) analysis of the entire library before and after screening (~1.4 million unique variants) suggested that droplet screening had enriched single-nucleotide deletions along with the gain of function. These deletions led to frameshift mutations and consequently an enrichment in premature stop codons (from 17% to 27%; Fig. 2a and Extended Data Fig. 1). This observed enrichment stands in stark contrast to the fact that intra-gene stop codons are typically strongly selected against, as they usually lead to a loss of structure and function24,25,26. Thus, the paradox of an observed enrichment of truncated variants—as a general trend across the library—indicated that truncations might contribute to the selected phosphoesterase activity. Further analysis of their positional distribution revealed a high abundance of truncations at positions 40 and 60, with 18% of all sequences being truncated at one of these positions after the second sorting round. Truncations at both positions are enriched compared to the input library (1.2- and 2.8-fold; Fig. 2b). These truncated sequences lack the two C-terminal helices of the 4-helix bundle (Fig. 1c) and form the template for a shorter helix-loop-helix motif. Cysteine residues (which are absent in the S-824 ancestor) were also enriched, but their enrichment can be explained as a byproduct of frameshifts, co-introducing them into the coding sequence before the stop codons (Supplementary Fig. 4).

Fig. 2: NGS analysis reveals enrichment of truncated proteins across the library.
figure 2

a, Relative frequency (percentage of truncated sequences among the total number of sequences) of truncated reads in the input library (grey; 17%), after sorting 1 (light blue; 15%) and after sorting 2 (dark blue; 27%) reveals 1.6-fold enrichment of premature stop codons after sort 2, albeit close to 0% would be expected. b, The frequency of truncations at every sequenced position in the input library (grey, broad bars) and after sorting 2 (blue, narrow bars) shows 1.2- and 2.8-fold enrichments of truncations at position 40 and 60, respectively, after sorting 2.

Source data

A manganese-dependent de novo phosphodiesterase

We focused on analysing one clone (named mini-cAMPase hereon) with high catalytic activity, strong expression and good solubility, and disentangled the combinatorial elements of the screen (multiple substrates; multiple metal cofactors). Manganese was required for activity (Fig. 3a) while not altering the thermal stability (Extended Data Fig. 2), consistent with a catalytic rather than structural role. However, manganese binding could not be measured by isothermal titration calorimetry (ITC; Supplementary Fig. 5), which prevented determination of affinity and stoichiometry.

Fig. 3: Metal requirement and substrate scope.
figure 3

a, Metal dependence of the enzymatic reaction for the mixture of fluorogenic bait substrates: 100 µM mini-cAMPase was incubated with ~200 µM substrate mixture and 200 µM Ca2+, Mg2+, Mn2+, Fe2+ or Zn2+. The histogram shows that mini-cAMPase requires Mn2+ for its catalytic activity. Data are presented as mean values of biological duplicates (n = 2), with error bars representing ±1 s.d. b, Activity was tested against substrates sampling different ground-state charges (from 0 to −2) and transition-state (TS) geometries (tetrahedral and trigonal–bipyramidal). The mini-cAMPase hydrolyses p-nitrophenyl phosphodiesters and phosphonates. The highest activity was observed for cAMP hydrolysis. The scissile bond is highlighted in blue. Cross symbols indicate no detectable activity. R indicates the p-nitrophenol leaving group.

Source data

Next, we probed the substrate specificity by challenging the novel enzyme with a range of model substrates containing p-nitrophenyl leaving groups and encompassing a range of ground-state charges and transition-state geometries. As shown in Fig. 3b, these experiments revealed that the novel enzyme catalyses the hydrolysis of phosphodiester and phosphonate substrates carrying a single negative charge. Furthermore, activity was limited to substrates hydrolysed through a trigonal–bipyramidal transition state.

To assess activity beyond model compounds with nitrophenyl leaving groups, we extended the scope of the substrates to biologically important phosphodiesters. In contrast to the expectation that promiscuous activities identified in screens for thermodynamically undemanding substrates cannot easily be extended to more difficult reactions, we found that the novel enzyme catalyses the hydrolysis of less reactive biological phosphodiester substrates, cAMP and cGMP. It also shows some activity toward deoxyadenosine dinucleotide (dA-P-dA), which can be considered a simple model for DNAse activity (Supplementary Fig. 12). Because our novel protein is most active towards cAMP, we named it mini-cAMPase.

The observation of catalytic activity for a de novo protein expressed in Escherichia coli inevitably raises concerns about the possibility of contaminating activity from endogenous proteins27,28. We ruled out this possibility by performing several control experiments, showing that the observed activity cannot be attributed to endogenous E. coli cAMPase (CpdA)29,30 or other contaminating proteins. First, mini-cAMPase requires a different metal cofactor and has a much lower Michaelis constant than endogenous E. coli cAMPase. Second, the enzymatic activity co-purifies with the mini-cAMPase fraction, independent of the purification method. Third, sequence changes in mini-cAMPase correlate with changes in enzymatic activity. These control experiments thus provide evidence that mini-cAMPase is a de novo phosphodiesterase (for details, see note 1.5 in the Supplementary Information; Fig. 4a–c and Extended Data Figs. 3 and 4).

Fig. 4: Kinetic characterization of enzymatic activity.
figure 4

a, Metal dependence of the mini-cAMPase activity. Spectral scans are shown that report on the appearance of AMP on reverse-phase (RP) HPLC (100 µM mini-cAMPase incubated with 100 µM respective divalent metal and 250 µM cAMP). Each successive trace is 1.5 h apart, with a total time of 18 h. b, Michaelis–Menten kinetics for cAMPase activity in biological duplicates with 50 µM protein and 200 µM MnCl2, measured at 25 °C in PBS (pH 7.4). The purification control (grey) shows the background activity for the same preparation steps on the protein without the His6-tag. Rates are the mean of three independent repeat datasets. Each colour shows a biological replicate. The purification control is shown in grey. c, Michaelis–Menten plot for the phosphodiesterase activity of 50 µM mini-cAMPase and 200 µM MnCl2 with the model substrate bis(p-nitrophenyl) phosphate in biological duplicates. Each colour shows a biological replicate. As before, the purification control is shown in grey. d, Competition between bis-pNPP and cAMP hydrolysis (left) shown in a Lineweaver–Burke plot (right). Lines connect kinetics for a single cAMP concentration, with higher cAMP concentrations shown in darkening hues of green. The shared y intercept indicates competitive inhibition with Ki = 70 ± 8 µM cAMP. Michaelis–Menten kinetics were assayed for bis(p-nitrophenyl) phosphate hydrolysis by 100 µM protein with 200 µM MnCl2 in increasing cAMP concentrations.

Source data

Kinetic characterization and evidence for a common active site

Kinetic characterization of mini-cAMPase with both model and biological substrates revealed the largest catalytic efficiency (kcat/KM) of 2.2 M−1 s−1 for cAMP, which corresponds to a rate acceleration of ~7 × 109 over the uncatalysed background reaction (in the absence of manganese) and a catalytic proficiency of ~7 × 1014 M−1 (Table 1 and Fig. 4b,c). Because of this high activity and biological relevance, cAMP hydrolysis became the focus of our further characterization. As with the fluorogenic substrate mixture, we found that mini-cAMPase best hydrolyses cAMP in the presence of manganese and, to a lesser extent, iron (Fig. 4a). The turnover is slow, but kinetic measurements yield a well-defined Michaelis–Menten profile for both the diesters cAMP and bis-p-nitrophenyl phosphate (bis-pNPP) (allowing calculation of the maximal velocity Vmax and suggesting active-site saturation). Notably, the parental sequence S-824 lacks detectable cAMPase activity (Extended Data Figs. 5f and 7b) but shows low activity towards bis-pNPP (kcat/KM ≈ 4 × 10−3 M−1 s−1, so ~80-fold lower than mini-cAMPase; Extended Data Table 1 and Extended Data Fig. 7c).

Table 1 Michaelis–Menten constants for mini-CAMPase with phosphonate and phosphodiester substrates

Although both the model and biological substrates (Fig. 3b) are phosphodiesters, they have very different structures. Therefore, we were curious about whether mini-cAMPase hydrolysed these substrates using the same active site. To address this question, we probed whether hydrolysis of the model chromogenic substrate bis-pNPP could be inhibited by cAMP. As shown in Fig. 4d (and elaborated in Supplementary Fig. 6), hydrolysis of bis-pNPP in the presence of increasing cAMP concentrations showed competitive inhibition with a Ki of 70 ± 8 µM for cAMP. These results indicate that, despite their structural and electronic differences, both substrates rely on the same active site.

Mutational and structural analysis of mini-cAMPase

Mini-cAMPase carries both a truncation and substitutions when compared to S-824. To address the relative contributions of both factors to the observed catalytic activity, we tested two constructs (Extended Data Fig. 6). First, probing the importance of the truncation, we maintained the substitutions observed in mini-cAMPase but removed the frameshift mutation introducing the truncation. The resulting construct ‘Substituted-824’ preserves the full length of S-824, but contains mini-cAMPase’s mutations ahead of the frameshift mutation (Extended Data Fig. 6a). Substituted-824 expressed well and, in contrast to the parental sequence S-824, showed activity towards both phosphodiester substrates, but with a substantially reduced kcat (~7-fold towards cAMP and ~17-fold towards bis-pNPP as compared to mini-cAMPase; Extended Data Fig. 6d). Second, to probe the role of the substitutions, we introduced the truncation, but not the substitutions, into the parental protein, producing ‘Short-824’ (Extended Data Fig. 6a). Short-824 expressed very poorly (Extended Data Fig. 6c), suggesting that the library substitutions contribute to stabilization of the truncated protein. These tests indicate that the library-designed substitutions and the unexpected truncation both contribute to identifying mini-cAMPase as a hit (Extended Data Fig. 6b).

To identify the residues directly involved in catalysis or binding of the Mn2+ cofactor, we mutated several residues individually to alanine and measured the Michaelis–Menten kinetics of the mutants (Extended Data Table 1). We targeted residues that might be involved in manganese binding, including histidine and cysteine, as well as the arginine consistently introduced by the truncations (Extended Data Fig. 7a). All introduced mutations reduced kcat, with C57A having the largest impact, lowering kcat by ~6.5-fold towards cAMP and ~3.7-fold towards bis-pNPP (Extended Data Fig. 7b,c). This indicates that C57 is involved in, but not essential to, catalysis.

Because only two of the original four helices are present, we surmised that mini-cAMPase might form a helical hairpin that dimerizes into a 4-helix bundle. Consistent with this expectation, size-exclusion chromatography (SEC) showed that mini-cAMPase elutes as a dimer (Fig. 5a,b). This dimer was formed under reducing conditions and confirmed by liquid chromatography mass spectroscopy (LC/MS) to be fully reduced at its only cysteine residue, C57 (Supplementary Fig. 11). Further experiments under oxidizing conditions showed that disulfide bridge formation also leads to dimerization but abolishes activity completely (Fig. 5c,d and Supplementary Fig. 11). Loss of activity implies that the disulfide-linked dimer structure is incompatible with efficient catalysis, although concomitant oxidation of the Mn2+ cofactor cannot be ruled out as a cause for inactivation. The pH dependence of enzymatic second-order rates (Supplementary Fig. 13) shows an apparent pKa of 7.8, similar to the measured pKa of a bona fide phosphodiesterase that also contains Mn2+ (ref. 31).

Fig. 5: Mini-cAMPase is a dynamic α-helical dimer.
figure 5

Across all panels, yellow represents the disulfide-bonded dimer, green is the reduced protein, and black is the parental protein (S-824). a,b, Preparation-scale (a) and analytical-scale (b) SEC shows that reduced mini-cAMPase elutes at double its monomeric molecular mass (that is, it is a noncovalent dimer). c, RP-HPLC traces show reduced (green), partly oxidized (yellow–green, 125 µM H2O2) and fully oxidized (yellow, 250 µM H2O2) protein. d, Activity under a gradient of increasingly oxidizing conditions to promote disulfide formation. Non-reducing SDS–PAGE reveals the relative abundance of disulfides (single measurement). Below each gel lane we show the HPLC traces indicating the corresponding activity of 50 µM mini-cAMPase incubated with 200 µM Mn2+ and 250 µM cAMP. The HPLC traces monitor the appearance of AMP (assayed by RP-HPLC) measured every 2 h for 24 h. The mutant C57A is included for comparison. e, 1H NMR showing the amide region for S-824 (black) and mini-cAMPase (reduced, green; oxidized, yellow). The broad and poorly resolved peaks indicate that the protein does not form a well-ordered structure and remains dynamic, even after oxidation. f, CD spectra showing that mini-cAMPase (green, yellow) is helical, but less so than the parent 4-helix bundle (black). There is no discernible shift in secondary structure caused by oxidation of C57. MRE, mean residue ellipticity. g, Melting curves measuring CD at 222 nm show that oxidized mini-cAMPase (yellow, TM > 90 °C) has a melting curve similar to S-824 (black, TM > 90 °C), with both proteins more thermostable than the reduced mini-cAMPase (green, TM ≈ 72 °C).

Source data

1H NMR and circular dichroism (CD) spectroscopy further indicate that mini-cAMPase is structurally less defined than its parental sequence S-824 (Fig. 5e–g), at least in the absence of manganese. This could indicate that it dynamically samples an ensemble of states, potentially including conformations that enable activity. Alternatively, in an induced fit model, metal-binding could potentially induce a more defined, functional structure in mini-cAMPase that is not sampled by the catalytically inactive ancestor S-824.

We performed molecular dynamics (MD) simulations to probe our hypothesis that changes in dynamics give rise to the functional changes observed in mini-cAMPase. On a timescale of 100 ns, no substantial difference in the per-residue fluctuations between the S-824 ancestor and the mini-cAMPase dimer were detected, apart from the mini-cAMPase dimer having a short, disordered C-terminal tail (Extended Data Fig. 8). Thus, the dynamic effects observed with 1H NMR and CD spectroscopy are likely to occur on a considerably slower timescale than the MD simulations. Interestingly, AlphaFold232,33,34 reproduced the NMR model of S-824, but indicated that the mini-cAMPase dimer adopts a different topological isomer in which the overall topology is mirrored (Extended Data Fig. 9). Curiously, EMSfold35 predicted that S-824 populates the other topoisomer, leaving two candidate structures to be considered. To dissect possible topological dynamism, we turned to MultiSFold36, a computational framework aimed at predicting conformational isomers. MultiSFold36 predicted a single isomer resembling the NMR structure for S-824 but a 40:60 ratio of the mini-cAMPase topoisomers. This conformational diversity is consistent with the observations from 1H NMR and CD spectroscopy and could indicate that mini-cAMPase exists in an equilibrium of distinct dimeric conformations that adopt different topologies and exchange slowly (on or greater than a microsecond timescale).


Mini-cAMPase has several notable features. (1) It was isolated from a library of variants that is relatively small (~1.7 × 106) compared to the vast sequence space available to natural evolution (20102 ≈ 10132 for a protein of the same length as S-824 or 1078 for a 59-residue mini-cAMPase). (2) Although the parental S-824 shows no phosphoesterase activity in our lysate assays, the selected mini-cAMPase recruits a metal cofactor to catalyse phosphodiester hydrolysis with high catalytic proficiency. (3) Although the inactive parental sequence was 102 residues long, screening for activity led to a truncation protein of 59 residues, approximately half the length of the parent. (4) Although the S-824 parent is a well-ordered monomeric 4-helix bundle, the selected mini-cAMPase enzyme forms a 2 × 2 helix dimer with increased flexibility after loss of a covalent constraint imposed by the original single-chain protein.

Given that mini-cAMPase has no evolutionary history, its catalytic efficiency for cAMP hydrolysis (kcat/KM ≈ 2.2 M−1 s−1; Table 1) is remarkable, being only ~1,000-fold lower than that of, for example, the native cAMPase (CpdA) from E. coli (Supplementary Table 3). Put in the broader context of the average natural enzyme that is acting on its preferred substrate (kcat/KM ≈ 105 M−1 s−1)37, mini-cAMPase is orders of magnitude less active. However, its kcat/KM is only ~14-fold lower than the median value of 31 M−1 s−1 reported for natural enzymes catalysing a promiscuous reaction38, and a ‘head start’ activity7 as the basis of further evolution is conceivable, enabling rounds of directed evolution. Although the first-order rate constant kcat of mini-cAMPase lags behind that of naturally occurring phosphodiesterases specific for cAMP (2.2 × 10−5 s−1 versus 10−1 to 103 s−1; Supplementary Table 3), mini-cAMPase displays a substantial rate enhancement (kcat/kuncat) of 7 × 109 and a catalytic proficiency ((kcat/KM)/kuncat) in the range of 7 × 1014 M−1 over the uncatalysed background reaction (Table 1). These parameters are comparable to those reached by some enzymes with their native substrates, although higher (and lower) values are on record39. Our work shows how catalysis of a difficult biologically relevant reaction with accelerations (kcat/kuncat) approaching those of large natural enzymes selected by billions of years of evolution can be brought about in an efficient screen of a million-membered library.

While representing only a small fraction of all possible diversity, the successful outcome of the screen emphasizes that Dayhoff was not wrong about the odds for finding catalysts, even in such scenarios of partial sampling of sequence space. mini-cAMPase is a rare example of rapid functionalization of a short, inactive peptide, validating a broadened version of the Dayhoff hypothesis13 and dismissing the sceptical (or creationist) view that the sequence space for peptides with no biological ancestry does not hold sufficient solutions for catalytic challenges. In particular, Dayhoff’s notion that assemblies of primordial peptides (for example, generated by duplication) can aquire functions is reflected in our findings, even though in our case a dimer is generated by truncation of a longer peptide, in contrast to Dayhoff’s original work focusing on duplication of a short peptide12. In this reverse mode of evolution (that is, from larger peptides to smaller ones), the effect of disassembly may be analogous to the effect of insertions and deletions (InDels) as motors of evolution by enabling disruptive but innovative changes that help the acquisition of function40,41,42. Specifically, the evolution of a well-packed, designed protein may need departure routes to structural, topological and conformational alternatives that hold catalytic solutions (as evidenced in our MD simulations; Extended Data Figs. 8 and 9).

Our work advances previous research studying the catalytic activity of semi-random 4-helix bundles as de novo designed proteins. For example, mini-cAMPase acts on a more thermodynamically challenging substrate than our previously reported 4-helix bundle ATPase, and does so with a higher kcat (ref. 43). Its small size is also noteworthy, being roughly half of our previously reported dimeric 4-helix bundle that hydrolyses enterobactin44,45. The lack of a designed metal-binding centre also stands in contrast to previous work looking at designed metalloenzymes46,47; unlike many designed metalloenzymes, mini-cAMPase makes use of Mn2+ largely by serendipity. The catalytic proficiency of mini-cAMPase (~1014 M−1; Table 1) goes beyond previous models: for example, a partly designed and evolved 4-helix bundle metalloprotein with a catalytic proficiency of 9.3 × 1010 M−1 for carboxyesters48 further designed and evolved into a Diels–alderase49 with a catalytic proficiency of 2.9 × 1011 M−1. Similarly, a previously reported50 de novo helix-turn-helix peptide that dimerizes into a 4-helix bundle hydrolyses the phosphodiester p-nitrophenyl phosphate (in the absence of metal) with a second-order rate constant of 1.58 × 10−4 M−1 s−1, resulting in a catalytic proficiency of ~2 × 1011 M−1, which is three orders of magnitude lower than for the mini-cAMPase. mini-cAMPase also surpasses the rate enhancement of a zinc-based biomimetic phosphodiesterase model by five orders of magnitude51. These distinguishing features make mini-cAMPase quite unlike previously characterized minimalist enzyme models.

Intriguingly, the recruitment of a divalent metal cofactor for phosphodiester hydrolysis by mini-cAMPase recapitulates the evolution of naturally occurring phosphodiesterases. Extant, naturally occurring cAMP-hydrolysing enzymes have emerged independently in three different folds, converging towards mechanisms that involve a divalent metal ion cofactor52. Most natural phosphodiesterases use Zn2+, and the E. coli cAMPase (CpdA) requires Fe2+ or Mg2+ (ref. 29). In contrast, mini-cAMPase requires Mn2+ for activity, suggesting an unprecedented variation to the theme of divalent metal-ion catalysis.

The strategy of starting a combinatorial screen by randomizing a stable fold followed by ultrahigh-throughput screening with multiple cofactors and substrates has resulted in an enzyme with catalytic proficiency surpassing that of most de novo designed and evolved proteins by several orders of magnitude. The isolation of mini-cAMPase from a relatively small library (compared to the theoretical size of the sequence space) was facilitated by two factors. First, the use of microfluidic droplet sorting allowed screening for multiple-turnover catalysis at very high throughput (4.4 million clones screened in just 4 h), ensuring that most clones of our million-membered library were sampled at least once (~2.6-fold oversampling). Microfluidic droplet screening has been used previously to evolve or enrich (mostly hydrolytic) enzymes with known activity from large libraries by fluorescence8,53,54,55,56,57,58,59,60 or other detection modes61,62, but has never been applied to a library of de novo designed proteins with unknown function. Our findings suggest that targeted randomization of previously inactive scaffolds, when analysed by ultrahigh-throughput technologies, provides unexpected catalytic solutions (for example, truncation) that complement the lessons about the acquisition of function in contemporary enzymes. Previous examples of generating function from inactive scaffolds also relied on ultrahigh-throughput screening directly for catalysis, like mRNA display, albeit with much larger >1012-membered libraries9,63. Second, targeted randomization of a de novo sequence designed by binary patterning to fold into a stable 4-helix bundle enhanced the number of sequences in the library displaying a stable fold20.

Unexpectedly, selection for functional catalysts enriched library members that ‘escape’ the pre-defined stable 4-helix bundle fold by truncation to specific lengths, forming a helix-turn-helix motif. Although such major truncations are usually expected to be detrimental to protein function, NGS confirmed this as a general trend across the entire library rather than a mutational accident (Fig. 2). As characterized in detail for the case of mini-cAMPase (Extended Data Figs. 8 and 9), the remaining 59 residues maintain the binary patterning of the original design. Observing a dimer formed of α-helices (Fig. 5a,b) suggests a return to a 4-helix bundle fold, albeit with additional degrees of freedom that may employ dynamic effects in binding and catalysis (Fig. 5e–g). It has been pointed out that the appearance of symmetrical structures is favoured as a low-complexity (high-symmetry) phenotype64 due to the fact that symmetrical structures are stochastically more likely to appear and are therefore predicted to prevail due to an arrival-of-the-frequent mechanism65. In analogy to our findings, ‘creative destruction’ has been proposed as a mechanism for the emergence of new protein folds from existing, simpler structures66.

The selection of a truncated library member invites speculation that the 4-helix bundle fold of the library ancestor S-824 is very stable but structurally limited, as its well-folded state was designed to occupy a distinct energetic minimum in the protein folding funnel. Compared to its parent S-824, the structure of mini-cAMPase is highly dynamic (Fig. 5e–g), possibly fluctuating between multiple conformational states or dimeric arrangements. This feature might represent a departure from S-824’s ‘frozen’ conformation in a deep thermodynamic well to a less well-defined folding landscape that can select alternative conformations conducive to catalytic function67,68,69. Sampling new conformational states or different topologies may also provide functional innovation that can be further enriched in subsequent steps of evolution70,71. The disorder seen in mini-cAMPase can thus be likened to a catalytically proficient molten globule enzyme72 or the melting of a zinc finger that conferred ligase activity to a non-catalytic scaffold10. Alternatively the dynamic behaviour may simply be a sign of damage incurred by mutation70,71, where structural disorder is responsible for ineffecient catalysis, even though the disruption of structure may be necessary to bring about a new function in an existing scaffold73,74.

Evolved natural enzymes typically fold into large well-ordered structures with pre-organized active sites. Presumably, these were selected to favour highly active and substrate-specific specialists that can be allosterically regulated. In the early stages of evolution, however, it may have been advantageous to express dynamic sequences that sampled an ensemble of states. Although these molten structures probably had low activities, they may have been able to catalyse several chemical reactions, acting on a range of substrates. Such multifunctional generalists would have enhanced the ‘catalytic versatility of an ancestral cell that functioned with limited enzyme resources’6,75. In addition, small, dynamic proteins may have advantages as starting points for further evolution and adaptation. Although large modern enzymes are restricted by an epistatic burden that causes mutations to interfere with structural and functional innovation76,77,78,79, small dynamic proteins could escape the limits imposed by cooperative effects and become functionally more versatile and more evolvable by reducing the cost of innovation. These patterns may reflect the role of smaller, structurally versatile peptides in Dayhoff’s model12,13. Here, functional proficiency was found by seemingly going back to the origins of life, paradoxically reaching improvements in primitive rather than sophisticated scaffolds that, as low-probability events, nevertheless become accessible via ultrahigh-throughput screening.


Library preparation

The library encoding the partly randomized de novo designed gene S-824 was cloned from the original pET11a vector20 into the high-copy-number vector pASK-IBA5+ (IBA Life Sciences) using the restriction sites XbaI and BamHI. This allowed highly efficient DNA recovery by transformation after droplet sorting as well as strain-independent tetracycline-inducible expression83.

Microfluidic library screening

Microfluidic library screening was carried out as previously described in detail in ref. 59. In brief, the library was electroporated into E. coli cells (E. cloni 10G Elite; Lucigen), yielding ~107 colonies after overnight incubation on agar plates, as determined by serial dilution. After induction and incubation for protein expression, the cells were washed in HEPES buffer (50 mM HEPES-NaOH, 150 mM NaCl, pH 8.0) and encapsulated together with the substrate mixture (~100 µM), metal mixture (MnCl2, ZnCl2 and CaCl2 at 10 µM, respectively) and lysis agent in monodisperse water-in-oil droplets on a flow-focusing chip (Supplementary Fig. 2a), generating droplets with a volume of 3 pl at rates of 0.5–3 kHz. The droplets were collected in a closed storage chamber, fabricated from an eppendorf tube as previously described84. The microfluidic devices were fabricated by soft lithography as previously described59. After incubation at room temperature, the droplets were reinjected from the collection tubing into the sorting device (Supplementary Fig. 2b). In contrast to the previously described sorting device59, this chip featured an additional flow of ‘bias oil’ from the side that forced the droplets further away from the hit channel at a flow rate of 30 µl h−1, reducing the number of false-positive droplets accidentally flowing into the hit channel. Droplets were sorted according to their fluorescence at a rate of ~0.8 kHz and collected into a tube pre-filled with water. After sorting, the hit droplets were de-emulsified by the addition of 1H,1H,2H,2H-perfluorooctanol (Alfa Aesar), the solution was purified and concentrated by column purification (DNA Clean & Concentrator-5 kit, Zymo Research), and the plasmid DNA was recovered by electroporation into highly electrocompetent E. coli cells, as previously described in detail59. In the initial sorting, permissive conditions were chosen: the cells were encapsulated at an average droplet occupancy (λ) of 0.43 and the droplets were incubated for three days. Out of 10.3 million screened droplets, 53,000 droplets were sorted (approximately the top 0.5%). In the second sort, screenings were conducted under more stringent conditions, with cell encapsulation at λ = 0.1, and the droplets were incubated for seven days. Of 4.4 million screened droplets, 7,500 droplets were sorted (approximately the top 0.2%).

Microtitre plate screening

To quantify the lysate activity of the library variants, individual colonies were picked and grown in 96-deep-well plates in 500 μl of Lysogeny broth (LB) medium (containing 100 μl ml−1 carbenicillin) at 37 °C/1,050 r.p.m. for 14 h. Subsequently, 20 μl of overnight cultures were used to inoculate 880 μl of Terrific broth (TB) medium (containing 100 μl ml−1 carbenicillin and 2 mM MgCl2) for expression cultures in 96-deep-well plates, which were grown at 37 °C/1,050 r.p.m. for 2–3 h until reaching an optical density at 600 nm of ~0.5. Expression was then induced with anhydrotetracycline (final concentration 200 ng ml−1; IBA Life Sciences) and carried out for 16 h at 20 °C and shaking at 1,050 r.p.m. Cells were pelleted by centrifugation at 3,200 × g for 60 min, then the supernatant was discarded. Subsequently, the cells were lysed by a freeze–thaw cycle, followed by resuspension of the dry pellets by vortexing (~1 min) and subsequent addition of 150 µl of lysis buffer (HEPES buffer containing 0.35X BugBuster Protein Extraction Reagent (Novagen) and 0.1% Lysonase Bioprocessing Reagent (Novagen)). The cells were incubated for 20 min at room temperature in lysis buffer on a tabletop shaker (1,000 r.p.m.) and subsequently subjected to 30 min of thermal denaturation at 75 °C. The lysate was cleared by centrifugation for 1 h at 3,200 × g, and 60 µl of the supernatant was used for the activity assay. For the reaction, 140 μl of the bait substrate mixture was added to 60-μl aliquots of the cleared lysate in microtitre plates. The assay was carried out in HEPES buffer with a final concentration of 10 µM of the bivalent cation mixture (ZnCl2, MnCl2 and CaCl2) and ~20 µM of the fluorescein phosphate bait substrate mixture. The formation of fluorescein was recorded in a plate reader (Infinite M200; Tecan) for 30–60 min at an excitation wavelength of 480 nm and an emission wavelength of 520 nm.

NGS and data analysis

After droplet sorting and DNA recovery by electroporation, plasmid DNA was extracted from all obtained E. coli colonies using the GeneJET Plasmid Miniprep Kit (ThermoFisher Scientific). The variable region of the library (positions 19–83) was amplified with polymerase chain reaction (PCR) using Q5 polymerase (NEBnext UltraII Q5 Master mix) with primers including adapters for NGS in the overhang and different indices for each sample for multiplexing (Supplementary Table 1). The cycle number was optimized using quantitative PCR with variable template concentrations to be below 15 cycles for the final PCR reaction to minimize amplification bias. AMPure Speed Beads (Beckman Coulter) were used to purify DNA after amplification. The samples were processed into Illumina TruSeq libraries by the University of Cambridge Department of Biochemistry sequencing facility according to the manufacturer’s instructions. Sequencing was performed with one Illumina MiSeq 2 × 300-bp run (20% PhiX spike-in), yielding 8.5 × 106 sequences for the input library, 4.8 × 105 sequences for sorting 1 and 6.5 × 105 sequences for sorting 2. Adapters were removed, reads were merged, and individual sequences were counted using the DiMSum pipeline using the following parameters: cutadaptMinLength 10, vsearchMinQual 20, cutadaptErrorRate 0.6, paired T, indels all, maxSubstitutions 100, mutagenesisType codon, mixedSubstitutions T85 giving information on the occurrence of 1.4 million unique variants in the input library versus the libraries at post-sorting. The processed read counts were analysed using custom Python scripts (available at to count the frequency of truncations and frameshift mutations among all library members and at individual positions.

Protein expression and purification

A single colony of E. coli BL21 carrying the plasmid to express the protein of interest was used to inoculate a starter culture in LB medium and grown at 37 °C for 12 h. This started culture was used to inoculate 1 l of LB medium (1:200 dilution), which was grown at 37 °C until it reached an optical density at 600 nm of 0.6, at which time anhydrotetracycline was added to a final concentration of 200 ng ml−1 to induce protein expression. After 12 h of expression at 18 °C, cells were collected via centrifugation (10 min, 4,000 ×g) and stored at −80 °C.

To purify protein, the cells were resuspended in phosphate buffer (50 mM Na2HPO4, 300 mM NaCl, pH 8.0) and lysed by sonication on ice at 45% amplitude for twenty 10-s bursts with 50 s between sonication events. Following centrifugation (20 min, 18,000 × g) and filtration, clarified lysate was loaded onto a nickel column (GE HisTrap HP, 5-ml volume) and purified by imidazole elution (phosphate buffer with 500 mM imidazole). His6-tagged protein was eluted with 375 mM imidazole, while untagged protein still weakly stuck to the column and could be eluted with 10 mM imidazole. This was followed by SEC (HiLoad 26/600 Superdex 75 pg) with Tris buffer. The elution time of the peak by SEC corresponds to ~14 kDa when compared to protein standards, which supports a dimeric structure. Mutations did not alter the SEC profile.

To keep the protein reduced throughout, 1 mM dithiothreitol (DTT) was added after elution from the nickel column, and 1 mM tris(2-carboxyethyl)phosphine (TCEP) was added after elution from the sizing column. Proteins were aliquoted, frozen in liquid nitrogen, and stored at −80 °C for subsequent assays. All protein was also purified metal-free (apo) by the addition of 5 mM EDTA (pH 8.0) after elution from the nickel column. The EDTA was removed by the subsequent SEC step. For protein oxidation, reducing agent was removed by a PD-10 desalting column (GE Healthcare) exchange to Tris buffer at a final protein concentration of ~120 µM. The protein was then split into samples to which increasing concentrations of hydrogen peroxide were added to promote disulfide formation87. The samples were then left in a cold room (4 °C) in the dark for 24 h. In the absence of hydrogen peroxide, the protein remained reduced due to the residual reducing agent, but increasing concentrations of hydrogen peroxide allowed the protein to form disulfide-bonded dimers (Fig. 5c,d). To monitor the presence/formation of disulfide-bonded dimers, proteins were analysed by HPLC as described in the following, or by 12% sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS–PAGE; Bio-Rad) at 20 µM concentration without added reducing agent in the loading buffer.

Size-exclusion chromatography

SEC was performed on an ÄKTA Pure FPLC system (General Electric). Preparation-scale SEC was carried out on a HiLoad 26/600 Superdex 75-pg column, and analytical SEC was run on a 5/150GL Superdex 75 increase column (Cytiva). Calibration was done against the external standards γ-globulin (bovine), ovalbumin (chicken), myoglobin (horse) and vitamin B-12 (Bio-Rad).

Confirmation of protein purity

To assay for contaminating endogenous E. coli proteins, both RP-HPLC and MS were used (Supplementary Fig. 7). To validate protein purity by RP-HPLC, 10 µl of purified protein was injected into a C-18 column (Agilent Zorbax 300SB-C18, 5 µM, 2.1 × 150 mm) without dilution or buffer exchange. Solvent A was water with 0.1% trifluoroacetic acid (TFA), and solvent B was acetonitrile with 0.1% TFA. The gradient was 97% solvent A/3% solvent B from 0 to 5 min, 97% solvent A/3% B to 100% solvent B from 5 to 25 min to elute proteins, 100% solvent B from 25 to 30 min to wash the column, and then 97% solvent A/3% solvent B from 30 to 35 min to re-equilibrate. The only peak detected was the de novo protein. The fractions were then run through electrospray ionization MS (ESI-MS; Agilent 6210 TOF LC/MS) to confirm the protein identity. From the HPLC runs, proteins were quantified by their area under the curve (AUC) using the calculated extinction coefficient of 2,980 M−1 cm−1 for His6-tagged proteins and 1,490 M−1 cm−1 for proteins without His6-tags:

$${n}={\frac{{\rm{AUC}}\times F}{d\times \varepsilon }}$$

where n is the amount of substance of the analyte (in mol), AUC is the area under the curve (in s), F is the flow rate (in l s−1), d is the optical pathlength (in cm) and ε is the extinction coefficient (in M−1 cm−1).

Steady-state kinetics

Substrate concentrations were chosen to span approximately tenfold below and above KM, unless limited by substrate solubility. The optimal starting enzyme concentration (E0) and substrate concentration ranges were determined for each variant and substrate combination by empirical sampling. Kinetics with mini-cAMPase (Table 1) were measured at an enzyme concentration of 50 µM and, because mini-cAMPase functions as a dimer, the kinetic parameters were calculated per protein dimer. The substrates were pre-dissolved in dimethyl sulfoxide (DMSO) at stocks of 200-fold the final concentration to ensure a constant DMSO concentration (0.5%) across all substrate concentrations. Upon measurement, aliquots of these substrate stocks were diluted 1:100 in HEPES buffer containing 5 mM DTT or 1 mM TCEP, of which 100 μl was subsequently mixed with 100 μl of twofold-concentrated enzyme solution in microtitre plate wells. For substrates with a p-nitrophenol leaving group, the progress of the reaction was monitored by absorbance at a wavelength of 405 nm in a spectrophotometric microplate reader (Tecan Infinite 200PRO; Tecan) at 25 °C. The initial rates were extracted by a linear fit of the first measurements (at <10% progress of the reaction) for each substrate concentration and normalized with an extinction coefficient determined from a calibration curve. To determine the Michaelis–Menten parameters kcat and KM, the data were fitted to the following equation using the nonlinear fitting function nls() in R88:

$${\frac{v}{[{E}_{0}]}}={\frac{{k}_{\rm{cat}}\times [S]}{({K}_{\rm{M}}+[S])}}$$

where v is the initial rate of the reaction (in mol s−1), [E0] is the initial enzyme concentration (in M), kcat is the turnover rate (in s−1), KM is the Michaelis constant (in M) and [S] is the substrate concentration (in M).

For cyclic nucleotide hydrolysis, unless otherwise indicated, the assays used fully reduced proteins in Tris-buffered saline (TBS) and 1 mM TCEP as described above. For each assay, protein aliquots were thawed from storage at −80 °C, diluted to the appropriate concentration, after which metals and substrate were added. Metals and substrate were added from 10X stocks. Nucleotides and metals were in TBS and bis-pNPP was in DMSO. All assays included side-by-side samples with substrate, metal and buffer but no protein to measure autohydrolysis, which was used to baseline the measurements.

Cyclic nucleotide hydrolysis was quantified by ultraviolet absorption at 254 nm for AMP and cAMP separated by HPLC, or 260 nm for GMP and cGMP. Assays were performed on an Agilent 1100 series HPLC system with a reverse-phase column (Agilent Zorbax 300SB-C18, 5 μm particle size, 2.1 × 150 mm). Solvent A was water containing 0.1% TFA and solvent B was acetonitrile containing 0.1% TFA. Elution was isocratic, with 3% column B for 5 min, followed by a 5-min flush with solvent B and a 5-min re-equilibration with 3% solvent B. An example separation is shown in Supplementary Fig. 10a. Time-resolved kinetics were followed by 10-µl sample injections using the autosampler module with a 15-min runtime and an isopropanol wash of the needle between injections.

Turnover was quantified by the AUC for AMP or GMP. An external standard curve (Supplementary Fig. 10b) matched the theoretical signal calculated by equation (1), with the NTP extinction coefficient taken from ref. 89. The bis-pNPP hydrolysis was measured in microtitre plates using the absorbance at 405 nm measured using a Thermo Scientific Varioskan system. Turnover was quantified by an external standard curve using para-nitrophenol. dA-P-dA hydrolysis was monitored like the cyclic nucleotides, with turnover calculated using the appearance of both products and the absorbance of the adenine nucleobase present in both products (dA-P and dA). Sample raw data are shown in Supplementary Fig. 12.

Circular dichroism

CD data were collected on a Chirascan CD spectrometer (Applied Photophysics) from 200 to 260 nm in triplicate and averaged. Far-ultraviolet CD spectra were collected using a 1-mm-pathlength cuvette and 40 μM mini-cAMPase (or alanine mutant) in Tris/HCl buffer or 20 μM S-824 in Tris/HCl buffer.


Protein was concentrated with centrifugal filter units (Amicon, 3-kDa molecular weight cutoff) to a final concentration of 1 mg ml−1 in PBS. Proton spectra were collected at 25 °C by using a Bruker Avance III 800-MHz spectrometer. The 1H chemical shift was referenced to the DOH line.


PCR mutagenesis was performed by whole-plasmid PCR with Q5 high-fidelity polymerase (New England Biolabs) followed by PCR purification (Qiagen) and treatment with KLD Enzyme Mix (New England Biolabs). The primers were made by Sigma-Aldrich and are listed in Supplementary Table 4. His6-tagged proteins included a tobacco etch virus protease cleavage recognition site.

MD simulations

The MD simulations were performed with Amber1890: 100-ns simulations were run for 1P68, the 1P68 structure predicted with ESMfold35, and the structure of mini-cAMPase dimer predicted by AlphaFold232,33,34. The system was parametrized using tleap90, and enzymes were solvated in a 12.0-Å octahedral box of TIP3P water91,92 with net charge neutralized by the addition of sodium ions. The ff19SB force field93 was used to describe the protein. All systems were minimized using 10,000 steps of steepest descent followed by 10,000 steps of conjugate gradient. Subsequently, the system was heated from 50 K to 300 K in 20 ps, and then simulated for 100 ns in the NPT ensemble, saving a frame every 100 ps. Langevin dynamics were used with a collision frequency of 0.2 and a 2-fs time step. A Berendsen barostat was used with isotropic position scaling. All bonds involving hydrogens were constrained using the SHAKE algorithm. Ten independent simulations were run per enzyme variant, resulting in a total simulation time of 1.0 µs per variant. All calculations were performed with the Amber18 program package (sander.MPI for minimization and pmemd.cuda for MD simulations)90. MD simulations were analysed using CPPTRAJ94. All analyses were based on Cα positions. The first 50 ns of each MD run were excluded to allow sufficient time for system equilibration. Root-mean-square deviation (r.m.s.d.) values were calculated compared to the minimized starting structures. Root-mean-square fluctuations (r.m.s.f.) were determined by first calculating an average structure for each replicate, aligning the trajectory against the average structure, and then calculating the r.m.s.f. for each protein residue. Errors indicate the standard error of ten independent replicates.

Structure prediction

AlphaFold232,33,34, EMSfold35 and MultiSFold36 were used to predict the structures of the mini-cAMPase dimer and S-824. AlphaFold2 structure predictions were run with google colab ( EMSfold structure predictions were run for S-824 with Conformer predictions for S-824 and mini-cAMPase were run with MultiSFold36 (

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.