An on-demand, drop-on-drop method for studying enzyme catalysis by serial crystallography

Serial femtosecond crystallography has opened up many new opportunities in structural biology. In recent years, several approaches employing light-inducible systems have emerged to enable time-resolved experiments that reveal protein dynamics at high atomic and temporal resolutions. However, very few enzymes are light-dependent, whereas macromolecules requiring ligand diffusion into an active site are ubiquitous. In this work we present a drop-on-drop sample delivery system that enables the study of enzyme-catalyzed reactions in microcrystal slurries. The system delivers ligand solutions in bursts of multiple picoliter-sized drops on top of a larger crystal-containing drop, inducing turbulent mixing, and transports the mixture to the X-ray interaction region with temporal resolution. We demonstrate mixing using fluorescent dyes, numerical simulations and time-resolved serial femtosecond crystallography, which show rapid ligand diffusion through microdroplets. The drop-on-drop method has the potential to be widely applicable to serial crystallography studies, particularly of enzyme reactions with small molecule substrates.

The authors present a nice time-resolved serial crystallography study using a drop-on-demand delivery system. They claim to be able to study chemical reactions in protein crystals using diffusive mixing in nanoliter droplets of a crystal slurry. This is a very interesting study which focuses on the kinetics of two proteins, the well-known lysozyme and CTX-M-15. Although several different methods based on drop-on-demand have been published, this is the first of its kind (to my knowledge) to demonstrate its ability to show chemical mixing and to study the kinetics of enzymes, which is of great interest to biologists. Current time-resolved methods are limited to pump-probe experiments, which limits the variety of proteins we can study. The authors claim this method is more feasible in terms of sample consumption because it uses microdroplets. This is a promising technique which drastically reduces the volume of sample needed to collect complete data sets.
Major comments: 1. The authors provide a nice representation of the time series data of the lysozyme enzymatic reaction with its ligand from the resting state up to 2 s. The structural analysis clearly shows the gradual docking of the ligand into the active site, with high-resolution structural information to support this. They also demonstrate that the method works for CTX-M-15; however, the complete reaction of ligand docking was not detected, as the data collection time did not exceed the 2 s mark for the XFEL experiment, and the authors seem to have missed the completed reaction time. Instead, they used a static time point collected on a fixed target at a different light source, which is OK, but it would have been nice to see the complete reaction at SACLA using the drop-on-demand source. Was there a particular reason for this? Is the drop-on-demand method limited to a certain duration of kinetics, or are we able to capture time frames longer than 2 s? This was not clear in the manuscript. It would also help to state the current limitations of this device, if any. What are the maximum droplet sizes possible before the diffusion reaction rates are affected? Despite this, their results do capture a different state of the ligand at 2 s. The question, however, still remains: does the reaction go to the completion seen in the fixed-target experiment, and are there more intermediate states?
2. What would make this article more informative is moving the specific kinetic values for the enzyme and ligands, presented in the supplementary information, into the main text (p6, line 132). What possible reasons could be associated with not seeing the completed reaction within the 2 s time frame of XFEL data collection, given that the enzymatic data show this should have been possible?
3. The mixing simulations presented in the paper show very different results from the experimental data. From my reading of the manuscript, two types of mixing have been simulated: diffusion and hydrodynamic. Fig 1b shows both simulations along with experimental fluorescence mixing studies. The hydrodynamic simulation looks to be too rapid and the diffusion looks to be too slow compared with the experimental fluorescence data. Given that the CTX-M-15 reaction was not completed within 2 s (even with excess ligand concentrations), what conclusion can be drawn from the simulations?
Minor comments: P3, line 54: '….using the appropriately small crystals,….' Please specify what you consider this range to be. I think this is an important number which needs to be considered for this type of experiment. Are we talking about nm or sub-micron? P6, line 132: 'ligand affinity (uM vs, mM)…' It would be helpful to have the actual numbers here to make an easier comparison.
P8, line 195: Were the substrate solutions filtered prior to mixing? P9, line 227: It is stated in the manuscript that ' volume of 60pl to yield equilibrium…' It is not clear in the manuscript why a 60 pL drop size was used for the CaCl2 solution and 120 pL for the HEWL experiment. Also, given that the same 100 µm orifice was used, how was the volume of the drop altered?
P10, line 23, '…..normalized spectra) scaled ….' Remove the bracket.
In the history of biochemistry, many researchers have studied enzymes, but it has been difficult to fully elucidate their chemical reactions because of the limits of the technology for visualizing the movement of molecules during catalysis. This situation is now changing with the development of XFEL technology over the last decade. Time-resolved serial femtosecond crystallography (tr-SFX) is the only technique that can visualize the movement of biological macromolecules during function at atomic spatial and femtosecond temporal resolution in a damage-free, room-temperature state. However, in tr-SFX using conventional injectors, optimizing the measurement conditions by repeated trial and error for each type of protein was troublesome. Furthermore, the consumption of huge amounts of sample was a problem.
In the present work, Butryn et al. have succeeded in overcoming these problems by demonstrating the application of the drop-on-drop method, developed from the drop-on-tape XFEL sample delivery system, to enzyme-catalyzed reactions in microcrystals. I appreciate the efficiency of this measurement system and its potential versatility for a wide variety of enzymes.
I request that the authors answer the following questions/comments.
1. The microcrystals of HEWL and CTX-M-15 were both obtained with salt-based precipitants, and their slurries seem to have low viscosity. I would like the authors to present, in the present or a future paper, reference material that evaluates diffusion efficiency for various highly viscous slurries (e.g. 20% PEG4000) in relation to drop size.
2. If possible, the authors should prepare a movie showing the efficient mixing of PEI drops on an ADE drop as supplementary data, for a detailed understanding of the measurement system. It does not have to be a movie of the actual measurement at SACLA; it can be a movie of fluorescent dye taken offline.
Figure 3: the contrast between blue and red in the 700 ms panel of (a) is clearly different from that in the other panels of (a). What is the cause of this?
4. I ask this question to gain a better understanding of the quality of the structural data in this study. In Supplementary ? I would like to see a table of CC1/2 values for each resolution shell.
6. In Supplementary Table 4, the kcat value seems to be quite low. What is the kcat in solution, not in microcrystals? What is the optimal pH of the enzyme?

NCOMMS-21-04895-T. An on-demand, drop-on-drop method for studying enzyme catalysis by serial crystallography.
We thank all the reviewers for their very positive comments. The comments are copied below (in black) and our responses are in blue. We have addressed all the comments fully, and we think the suggested changes and points for discussion have substantially improved the manuscript.

Reviewer #1
The object of any time-resolved crystallography experiment is generally the identification of kinetic mechanisms that evolve in a crystal and the characterization and evolution of the population of states that comprise those mechanisms. Several groups have begun to develop technologies that explore the boundaries of what is measurable at both XFEL and synchrotron sources; the development of these new methods can be both complicated and challenging. The major goal of this study is to create a flexible system for serial crystallography that ensures efficient data collection without the need for optical excitation, while allowing a wide gamut of timescales to be measured. The manuscript presents a modification to the "drop-on-tape" design with the addition of another piezoelectric droplet ejection head, which allows multiple picolitre droplets to be dispensed onto a larger nanoliter droplet containing a slurry of crystals, enabling a "drop-on-drop" mixing approach. Overall, the paper is clear and well written, the concept is sound, the data are convincing, and this is another novel piece of technology with the potential to become useful in the long term as the field of time-resolved crystallography grows. Therefore, assuming my comments below can be answered, I would recommend publication in Nature Communications.
There is an apparent appeal of XFEL sources for both serial and time-resolved approaches; given the ultrashort pulse lengths and peak brightness, it is understandable why beamtime at these sources is in high demand. Since the required time resolution should dictate the choice of radiation source, I would like the authors to comment on whether their system can be used at synchrotrons. Are there any drawbacks? Given the timescales accessible with their setup, this should be entirely possible. Are there limitations to this setup at synchrotrons? Given the wider availability and use of microfocus beamlines, their setup could become available to a larger array of users. I am missing this discussion.
We totally agree with the reviewer. In the first version of the manuscript, we focused entirely on tr-SFX and the discussion on applicability of the drop-on-drop method at synchrotron sources and the concept of tr-SSX was not included. In the revised version, we made changes throughout the manuscript, where we emphasize that the drop-on-drop, as well as time-resolved crystallography experiments in general, are as applicable to synchrotron sources as they are to XFELs (especially in the Introduction and Discussion sections, for example: P3, line 72; P3/4, lines 75-78; P4, line 85; P7, lines 167-173).
The main bottleneck in accommodating methods like drop-on-drop at synchrotron sources is the minimum exposure time of the crystals to X-rays that is required at MX beamlines in order to obtain enough signal, which automatically limits the time resolution of the method.
Longer exposure times also come with the additional problem of radiation-induced damage to the crystals at room temperature. With new upgrades at MX beamlines, these problems can be mitigated once sub-millisecond exposure times become routine (P7, lines 167-173). Moreover, at present it is difficult to accommodate the setup (designed for XFEL beamlines) on most MX endstations at synchrotrons because of its size. With this in mind, and also to make it easier to transport, a smaller version of the setup is being designed and will be available soon.
Given the large size of the nanoliter droplet containing the crystal slurry, minimizing this volume would ensure lower background scattering and speed up diffusion. Given that the crystals are not always homogeneously distributed throughout the droplet, what would the effect on the diffusion time be if the crystals were clumped together or isolated in the portion of the droplet furthest from where the picolitre droplets make contact?
These are all valid general concerns and will have to be dealt with on a case-by-case basis.
Diffusion time is proportional to the square of the diffusion distance. If particle movement is driven only by diffusional flux, a substrate gradient forms across the drop. For a droplet size in the few nL range, the time to reach equilibrium would then be significantly longer than the average time of the enzymatic reaction. As our simulations show, however, the impact of droplet collision plays a major role in speeding up the rate at which substrate is distributed through the drop. Consequently, we think that the distribution of crystals in the drop is less of a concern in that case, as the equilibration of the ADE droplet with the ligand is faster than the delay times we probed in our study.
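The quadratic scaling stated in the reply can be made concrete with an order-of-magnitude estimate. The sketch below assumes a spherical 3 nL drop and a diffusion coefficient of 1e-9 m²/s, a typical order of magnitude for a small solute in water; both are illustrative assumptions rather than values from the study, and order-one prefactors are ignored.

```python
import math

# Assumed order-of-magnitude diffusion coefficient for a small solute in water (m^2/s).
D_SMALL_MOLECULE = 1e-9


def sphere_radius_m(volume_nl: float) -> float:
    """Radius (m) of a sphere with the given volume in nanolitres (1 nL = 1e-12 m^3)."""
    volume_m3 = volume_nl * 1e-12
    return (3.0 * volume_m3 / (4.0 * math.pi)) ** (1.0 / 3.0)


def diffusion_time_s(distance_m: float, diff_coeff_m2_s: float) -> float:
    """Characteristic diffusion time t ~ L^2 / D, with order-one prefactors dropped."""
    return distance_m ** 2 / diff_coeff_m2_s


if __name__ == "__main__":
    r = sphere_radius_m(3.0)  # assumed spherical 3 nL drop
    t = diffusion_time_s(r, D_SMALL_MOLECULE)
    print(f"radius ~ {r * 1e6:.0f} um, pure-diffusion time ~ {t:.1f} s")
    # Doubling the distance quadruples the time (t scales as L^2).
    print(diffusion_time_s(2 * r, D_SMALL_MOLECULE) / t)
```

Under these assumptions the drop radius comes out at roughly 90 µm and the pure-diffusion time at several seconds, longer than the 0.6 s and 2 s delays probed, which is why the reply attributes the observed fast equilibration to collision-driven mixing rather than diffusion alone.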
We do not have a way to account for increased diffusion times caused by crystal clustering or the presence of crystals that are much bigger than the average, except to avoid these issues to begin with. We noticed that crystals tend to cluster with prolonged storage and therefore whenever possible we grow the crystals on site just before the measurement time. Also, crystal samples are filtered in order to remove any big crystals/larger crystal clusters and the whole syringe used for sample delivery is shaken constantly. We note that this is not specific to the drop-on-drop system and any sample delivery system for tr-SFX/SSX faces similar issues.
The nanoliter droplets contain a slurry of crystals. Multiple diffraction patterns on a single image therefore seem inevitable when imaging a single droplet. If so, how dense is the slurry, and how were these dealt with?
Crystal slurries that we used in this study had between 10⁷ and 10⁸ crystals/mL. Assuming uniform crystal distribution, this concentration would result in 30-300 crystals per 3 nL ADE drop. Multiple diffraction patterns are therefore, as pointed out by the reviewer, potentially unavoidable. We optimized the crystal concentration to achieve the maximum possible indexing rate, while ensuring that the number of multiple lattices on each detector frame is within the range that can be handled by the indexing software. This is well reflected in the ratio between integrated patterns and frames with indexable patterns. For all datasets analysed in this study, the average integration rate was above 100%. For example, the maximal integration rate was as high as 226% for one of the CTX-M-15 datasets from SACLA (i.e., on average 2.26 diffraction patterns were integrated from each frame that was indexable with DIALS).
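The crystal counts and the integration rate quoted above follow from simple arithmetic, reproduced in the sketch below. The lattice and frame counts in the usage lines are hypothetical, chosen only to match the quoted 226%, and the per-drop count assumes a uniformly mixed slurry.

```python
def crystals_per_drop(crystals_per_ml: float, drop_volume_nl: float) -> float:
    """Expected number of crystals in one drop, assuming a uniform slurry (1 nL = 1e-6 mL)."""
    return crystals_per_ml * drop_volume_nl * 1e-6


def integration_rate(n_integrated_lattices: int, n_indexable_frames: int) -> float:
    """Integrated lattices per indexable frame; a value above 1.0 (100%) implies multi-lattice frames."""
    if n_indexable_frames <= 0:
        raise ValueError("need at least one indexable frame")
    return n_integrated_lattices / n_indexable_frames


if __name__ == "__main__":
    for density in (1e7, 1e8):
        print(f"{density:.0e} crystals/mL -> {crystals_per_drop(density, 3.0):.0f} crystals per 3 nL drop")
    # Hypothetical counts chosen only to reproduce the 226% rate quoted in the reply.
    print(f"integration rate: {integration_rate(226, 100):.0%}")
```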
Not every software package will work well when multiple equally strong diffraction patterns are present on the same image (especially for samples with large unit cells, e.g. 200 Å and larger), but the DIALS framework is able to address this challenge well (Acta Crystallogr D Biol Crystallogr. 2014 Oct;70(Pt 10):2652-66). In brief, after the spot-finding routine, a crystal setting matrix compatible with the unit cell that indexes as many observed centroids as possible is first analysed and refined. Once refinement has converged, any remaining unindexed reflections may be analysed for further lattices. In subsequent iterations, joint refinement of the crystal lattices is performed. This process may be repeated until either an insignificant number of unindexed reflections remain, or no further lattices can be identified. If at any stage refinement does not converge, the most recently identified lattice is discarded and only those lattices which were refined successfully are reported.
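The iterative multi-lattice procedure described above can be summarized as a control-flow sketch. This is not DIALS code: the function names and the toy "lattice" model (integer spots indexed by a period) are hypothetical and serve only to show the loop of finding a lattice among unindexed spots, jointly refining, and discarding the newest lattice when refinement fails.

```python
from typing import Any, Callable, Iterable, List, Optional, Set, Tuple

Spot = int     # toy stand-in for an observed spot centroid
Lattice = Any  # opaque stand-in for a crystal setting matrix


def index_multiple_lattices(
    spots: Iterable[Spot],
    find_lattice: Callable[[Set[Spot]], Optional[Lattice]],
    indexes: Callable[[Lattice, Spot], bool],
    refine_converges: Callable[[List[Lattice]], bool],
    min_unindexed: int,
    max_lattices: int = 10,
) -> Tuple[List[Lattice], Set[Spot]]:
    """Find lattices one at a time among still-unindexed spots, jointly refining after each."""
    lattices: List[Lattice] = []
    unindexed: Set[Spot] = set(spots)
    while len(unindexed) >= min_unindexed and len(lattices) < max_lattices:
        candidate = find_lattice(unindexed)
        if candidate is None:
            break  # no further lattice can be identified
        trial = lattices + [candidate]
        if not refine_converges(trial):  # joint refinement of all lattices found so far
            break  # discard the newest lattice; keep only the successfully refined ones
        lattices = trial
        unindexed = {s for s in unindexed if not indexes(candidate, s)}
    return lattices, unindexed


# Toy model (hypothetical): a spot is an integer and a "lattice" is a period indexing its multiples.
TOY_SPOTS = {7 * k for k in range(1, 11)} | {11 * k for k in range(1, 6)} | {1, 2, 3}
TOY_PERIODS = (7, 11, 13)


def toy_find_lattice(remaining: Set[Spot]) -> Optional[int]:
    """Pick the candidate period that indexes the most remaining spots (at least three)."""
    hits = {p: sum(s % p == 0 for s in remaining) for p in TOY_PERIODS}
    best = max(hits, key=lambda p: hits[p])
    return best if hits[best] >= 3 else None


def toy_indexes(period: int, spot: Spot) -> bool:
    return spot % period == 0


if __name__ == "__main__":
    found, leftover = index_multiple_lattices(
        TOY_SPOTS, toy_find_lattice, toy_indexes, lambda ls: True, min_unindexed=4
    )
    print(found, sorted(leftover))
```

In the toy run, the period-7 "lattice" is found first, then period 11, and the three noise spots remain unindexed; making `refine_converges` reject the second lattice leaves only the first, mirroring the discard rule described in the reply.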
Examining the data statistics for the CTX-M-15 resting state, where roughly 4,500 lattices were merged, can the authors comment on why the R-factors are considerably higher than for the other structures, which were exposed to their ligand?
As pointed out by the reviewer, the CTX-M-15 resting state dataset from SACLA contains fewer than one third as many merged lattices as the other CTX-M-15 datasets included in this manuscript (4,502 vs 15,151-18,661 patterns). This results in relatively low overall multiplicity (26.45), low total CC1/2 (83.3%) and high total Rsplit (44.10%). Merging a similar, limited subset of patterns from the other, bigger datasets results in similarly poor merging statistics. We believe that the relatively high Rwork/Rfree is a result of the low number of indexed patterns in this dataset. We estimated that for this particular sample, as many as 10,000-15,000 lattices are required to obtain a dataset of similar quality to those obtained for the ligand-exposed conditions. Unfortunately, because of extreme time constraints during our XFEL experiment at SACLA, it was not possible to collect as many images on the unperturbed CTX-M-15 crystals as we would have liked. Although we did collect another SSX resting state structure at the Diamond Light Source, the incompatibility of the unit cell parameters did not allow us to use this dataset as the reference for isomorphous difference density maps. Also, we think that using a reference dataset collected on exactly the same batch of crystals, under the same experimental conditions, is good practice.
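To make the link between lattice count and merging statistics explicit, the sketch below uses the textbook definitions of multiplicity, CC1/2 (Pearson correlation between the mean intensities of two random half-datasets) and Rsplit. These are the standard formulas, not necessarily the exact implementation in cxi.merge, which may weight or bin observations differently. With fewer lattices, each half-dataset average rests on fewer observations, so the two halves agree less: CC1/2 falls and Rsplit rises.

```python
import math
from typing import Sequence


def cc_half(half1: Sequence[float], half2: Sequence[float]) -> float:
    """Pearson correlation between mean intensities of the same reflections in two half-datasets."""
    n = len(half1)
    if n != len(half2) or n < 2:
        raise ValueError("need two equal-length sequences with at least two reflections")
    m1, m2 = sum(half1) / n, sum(half2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(half1, half2))
    var1 = sum((a - m1) ** 2 for a in half1)
    var2 = sum((b - m2) ** 2 for b in half2)
    return cov / math.sqrt(var1 * var2)


def r_split(half1: Sequence[float], half2: Sequence[float]) -> float:
    """R_split = (1/sqrt(2)) * sum|I1 - I2| / (0.5 * sum(I1 + I2))."""
    num = sum(abs(a - b) for a, b in zip(half1, half2))
    den = 0.5 * sum(a + b for a, b in zip(half1, half2))
    return num / (math.sqrt(2.0) * den)


def multiplicity(n_observations: int, n_unique_reflections: int) -> float:
    """Average number of observations per unique reflection."""
    return n_observations / n_unique_reflections
```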
What are the authors' cutoff criteria for their high-resolution CC1/2 values? Some structures have very low CC1/2 values in the highest resolution shell, a few below 5% and one below 1%. Did paired refinement at these cutoffs show an improvement in the refinement behaviour?
The data were merged using the program cxi.merge from the cctbx.xfel package. The resolution cut-offs for the final datasets merged with cxi.merge are determined in a standard procedure based on a combination of several criteria, including where the data fall below ten-fold multiplicity, where CC1/2 no longer decreases monotonically and where the values of I/σ(I) no longer decrease uniformly (Nat Methods. 2017 Apr;14(4):443-449). To enable easier inspection of merging results, we included output merging statistics from cxi.merge for each resolution shell (Tables 1-9 below). If the reviewer and/or editor consider it appropriate, we can incorporate the nine tables into the supplemental information. The low CC1/2 in the highest resolution shells is a consequence of how image data are integrated in the processing pipeline (per-image I/σ(I) cutoff) and therefore cannot be directly compared with numbers obtained using other processing software, e.g. CrystFEL.
A detailed explanation of the differences between the approach by the two software packages can be found in the SI material of Ibrahim et al, Proc Natl Acad Sci U S A. 2020 Jun 9;117(23):12624-12635.
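One possible reading of the three cutoff criteria described above, applied shell by shell from low to high resolution, is sketched below. How cxi.merge actually weighs and combines the criteria is not spelled out here, so the rule of stopping at the first shell that fails any test, the `Shell` record and the example numbers are assumptions for illustration only.

```python
from typing import NamedTuple, Sequence


class Shell(NamedTuple):
    d_min: float         # high-resolution limit of the shell (Å)
    multiplicity: float  # mean observations per unique reflection
    cc_half: float       # CC1/2 as a fraction (0-1)
    i_over_sigma: float  # mean I/σ(I)


def resolution_cutoff(shells: Sequence[Shell], min_multiplicity: float = 10.0) -> float:
    """Return d_min of the last shell passing all checks; shells must be ordered low -> high resolution."""
    if not shells:
        raise ValueError("no shells given")
    cutoff = shells[0].d_min
    for prev, cur in zip(shells, shells[1:]):
        too_few_obs = cur.multiplicity < min_multiplicity
        cc_not_decreasing = cur.cc_half > prev.cc_half
        isig_not_decreasing = cur.i_over_sigma > prev.i_over_sigma
        if too_few_obs or cc_not_decreasing or isig_not_decreasing:
            break  # first shell that fails any criterion ends the usable range
        cutoff = cur.d_min
    return cutoff


if __name__ == "__main__":
    # Illustrative, made-up shell statistics.
    shells = [
        Shell(3.0, 50.0, 0.99, 20.0),
        Shell(2.5, 40.0, 0.95, 10.0),
        Shell(2.0, 20.0, 0.60, 3.0),
        Shell(1.8, 12.0, 0.10, 1.0),
        Shell(1.7, 11.0, 0.15, 0.9),  # CC1/2 rises again here
    ]
    print(resolution_cutoff(shells))
```

Note that this rule accepts a shell with a very low CC1/2 (0.10 at 1.8 Å in the example) as long as the trends stay monotonic and multiplicity stays high, which is consistent with the reply's point that a low CC1/2 in the last shell does not by itself disqualify it.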
Validity of cutoff selection criteria, as the reviewer suggested, can be confirmed by 'paired refinement'. Similar to our other studies (Proc Natl Acad Sci U S A. 2020 Jun 9;117(23):12624-12635 or Proc Natl Acad Sci U S A. 2020 Jan 7; 117(1): 300-307), we cross-checked our applied resolution cutoffs by applying a 'paired refinement' procedure to see whether including data from the highest resolution shells has a positive effect on the refinement compared with more conservative cutoffs. This test was done using the PDB_REDO platform, which includes the implementation of the original 'paired refinement' algorithm from Karplus and Diederichs (Science. 2012 May 25;336(6084):1030-3, IUCrJ. 2014 May 30;1(Pt 4):213-20). The 'paired refinement' test suggested the same resolution cutoffs for four out of nine datasets included in this publication. For the remaining five, the suggested resolution cutoffs were 0.04-0.1 Å lower. Taking into account that there is always some variability because of the way binning is done, and that PDB_REDO uses Refmac for refinement while our original refinements were performed with Phenix, we conclude that the results of 'paired refinement' support our selection of resolution cutoffs.
Interestingly, for the SACLA HEWL 0.6 s dataset (which has the lowest CC1/2 value in the highest resolution shell, 0.3%), 'paired refinement' gives the same resolution cutoff as the current one. Conversely, for the CTX-M-15 resting state structure from I24 (CC1/2 of 56.9% in the highest resolution shell), 'paired refinement' suggests a slightly lower resolution cutoff (1.71 Å instead of 1.65 Å). This is not intuitive, and it shows that the assumptions that low CC1/2 in the high resolution shells equates to low-quality data, and that high CC1/2 automatically means higher data quality, are not always valid. The benefits of the merging protocols applied in cctbx can be demonstrated by electron density map analysis (Proc Natl Acad Sci U S A. 2020 Jun 9;117(23):12624-12635), where it becomes clear that including "weak" data has no perceptible negative influence on map quality and can in fact help pick out important subtle features in the maps.
We updated the "Data processing" section in "Online methods" by adding the additional information about the applied resolution cutoff criteria and 'paired refinement' (P15, lines 367-372; P16, lines 392-393).
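The decision logic of paired refinement (Karplus and Diederichs), as described in the reply, can be sketched as follows: a model refined with an extra high-resolution shell is kept only if its R_free, evaluated against the common lower-resolution data, is not worse than that of the model refined without the shell. This is a simplified sketch, not the PDB_REDO implementation (which also inspects other statistics); the refinements themselves are assumed to have been run elsewhere, and the R values in the usage example are made up.

```python
from typing import Sequence, Tuple


def paired_refinement_cutoff(
    cutoffs: Sequence[float],
    paired_rfree: Sequence[Tuple[float, float]],
    tolerance: float = 0.0,
) -> float:
    """Choose a high-resolution cutoff by stepwise paired comparisons.

    cutoffs: candidate limits in Å, most conservative (largest) first.
    paired_rfree[i]: (R_free of the model refined to cutoffs[i],
                      R_free of the model refined to cutoffs[i + 1]),
                     both computed against reflections up to cutoffs[i] only.
    """
    if len(paired_rfree) != len(cutoffs) - 1:
        raise ValueError("need one paired comparison per extension step")
    chosen = cutoffs[0]
    for i, (r_conservative, r_extended) in enumerate(paired_rfree):
        if r_extended > r_conservative + tolerance:
            break  # the extra shell made the model worse at the common resolution
        chosen = cutoffs[i + 1]
    return chosen


if __name__ == "__main__":
    # Made-up R_free pairs for illustration only.
    cutoffs = [2.0, 1.9, 1.8, 1.7]
    pairs = [(0.250, 0.248), (0.247, 0.246), (0.245, 0.247)]
    print(paired_refinement_cutoff(cutoffs, pairs))
```

With these invented numbers, extending to 1.9 Å and then 1.8 Å improves R_free at the common resolution, but adding the 1.7 Å shell does not, so 1.8 Å is kept.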
What was the crystal size distribution, and how was it controlled? A large discrepancy in crystal size (1-3× volume changes), or multiple crystals overlaid on top of one another in a droplet, would result in a larger total crystal volume with correspondingly increased diffusion times. Do the authors account for this?
Our batch crystallization protocols for lysozyme produce slurries characterised by a very uniform crystal size distribution along all three edges. This is not the case for CTX-M-15, or for any other 'real life' sample that we have worked with. Most of the microcrystals that we grow are elongated. This is also the case for CTX-M-15, which forms rods/needles with an average edge size of 15 µm. We try to ensure as uniform a crystal size distribution as possible by using seeding protocols, which seem to work well for our samples. Such a crystal shape is not ideal, but at least it allows fast substrate diffusion from two directions. As mentioned earlier, we minimize crystal clustering effects by using freshly prepared crystal slurries that are filtered before loading into syringes. It should also be noted that the spread observed in crystal size is expected to have a small effect on diffusion times compared to the delay times examined in this study.
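The reviewer's 1-3× volume range can be turned into a diffusion-time factor with the same L² scaling stated earlier in the replies: for crystals of the same shape, the edge length grows as V^(1/3), so the diffusion time grows as V^(2/3). The sketch assumes geometrically similar crystals and a fixed diffusion coefficient inside the lattice; it says nothing about lattice-specific effects.

```python
def diffusion_time_ratio(volume_ratio: float) -> float:
    """Relative diffusion time for geometrically similar crystals: t ∝ L^2 and L ∝ V^(1/3)."""
    if volume_ratio <= 0:
        raise ValueError("volume ratio must be positive")
    return volume_ratio ** (2.0 / 3.0)


if __name__ == "__main__":
    for v in (1.0, 2.0, 3.0):
        print(f"{v:.0f}x volume -> {diffusion_time_ratio(v):.2f}x diffusion time")
```

Under these assumptions, a threefold volume increase lengthens the diffusion time by only about 2.1-fold, which gives a rough sense of scale for the authors' statement that size spread matters less than the chosen delay times.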

Reviewer #2
The authors present a nice time-resolved serial crystallography study using a drop-on-demand delivery system. They claim to be able to study chemical reactions in protein crystals using diffusive mixing in nanoliter droplets of a crystal slurry. This is a very interesting study which focuses on the kinetics of two proteins, the well-known lysozyme and CTX-M-15. Although several different methods based on drop-on-demand have been published, this is the first of its kind (to my knowledge) to demonstrate its ability to show chemical mixing and to study the kinetics of enzymes, which is of great interest to biologists. Current time-resolved methods are limited to pump-probe experiments, which limits the variety of proteins we can study. The authors claim this method is more feasible in terms of sample consumption because it uses microdroplets. This is a promising technique which drastically reduces the volume of sample needed to collect complete data sets.
Major comments: 1. The authors provide a nice representation of the time series data of the lysozyme enzymatic reaction with its ligand from the resting state up to 2 s. The structural analysis clearly shows the gradual docking of the ligand into the active site, with high-resolution structural information to support this. They also demonstrate that the method works for CTX-M-15; however, the complete reaction of ligand docking was not detected, as the data collection time did not exceed the 2 s mark for the XFEL experiment, and the authors seem to have missed the completed reaction time. Instead, they used a static time point collected on a fixed target at a different light source, which is OK, but it would have been nice to see the complete reaction at SACLA using the drop-on-demand source. Was there a particular reason for this? Is the drop-on-demand method limited to a certain duration of kinetics, or are we able to capture time frames longer than 2 s? This was not clear in the manuscript. It would also help to state the current limitations of this device, if any. What are the maximum droplet sizes possible before the diffusion reaction rates are affected? Despite this, their results do capture a different state of the ligand at 2 s. The question, however, still remains: does the reaction go to the completion seen in the fixed-target experiment, and are there more intermediate states?
2. What would make this article more informative is moving the specific kinetic values for the enzyme and ligands, presented in the supplementary information, into the main text (p6, line 132). What possible reasons could be associated with not seeing the completed reaction within the 2 s time frame of XFEL data collection, given that the enzymatic data show this should have been possible?
Answer to points 1 and 2: The longest mixing time that we could have achieved during the SACLA XFEL experiment was in principle 6 s (stated on P4, line 95; P9, line 229). This is determined by the slowest tape speed and the position of the PEI head. Several things need to be considered before attempting to probe longer timepoints. Please note that our setup, in the form we used at SACLA, was not enclosed and the humidity was not controlled. Moreover, the temperature in the experimental hutch was typically between 36 and 38 ℃ (this needs to be better controlled by the facilities). This not only caused rapid droplet evaporation (which inevitably led to crystal damage) but also increased salt crystal formation from the mother liquor containing 2 M ammonium sulphate. To mitigate the problems resulting from rapid evaporation, crystal slurries can be supplemented with glycerol, which slows down evaporation. Adding glycerol, however, comes at the price of increased viscosity and is therefore counterproductive in diffusion-driven experiments. We found that 2 s was the longest timepoint we could reliably probe under these particular experimental conditions. The next generation of the drop-on-drop setup will be equipped with humidity control and, given that the position of the PEI head can be moved upstream, will allow us to achieve mixing times of up to ~10 s. We included a statement in the conclusion part of the manuscript to describe the possible range of crystal sizes and delay times that can be targeted with the system (P7, lines 159-166).
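The dependence of the longest accessible delay on tape speed and PEI-head position amounts to delay = distance / speed. The distances and speeds below are hypothetical (the reply does not give them), and the sketch assumes the drop moves at constant tape speed from the PEI head to the X-ray spot, ignoring acceleration and dispensing offsets.

```python
def mixing_delay_s(pei_to_beam_mm: float, tape_speed_mm_s: float) -> float:
    """Delay between PEI dispensing and X-ray exposure for a drop riding the tape at constant speed."""
    if tape_speed_mm_s <= 0:
        raise ValueError("tape speed must be positive")
    return pei_to_beam_mm / tape_speed_mm_s


def pei_distance_for_delay_mm(delay_s: float, tape_speed_mm_s: float) -> float:
    """Distance upstream of the X-ray spot at which the PEI head must sit to reach a given delay."""
    return delay_s * tape_speed_mm_s


if __name__ == "__main__":
    SLOWEST_SPEED_MM_S = 5.0  # hypothetical slowest tape speed
    print(mixing_delay_s(30.0, SLOWEST_SPEED_MM_S))             # hypothetical 30 mm head position
    print(pei_distance_for_delay_mm(10.0, SLOWEST_SPEED_MM_S))  # distance needed for a ~10 s delay
```

With these made-up numbers, a head 30 mm upstream at 5 mm/s gives a 6 s delay, and reaching ~10 s at the same speed would require moving the head to 50 mm, which illustrates why moving the PEI head upstream extends the accessible time range.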
In the CTX-M-15 data we observed an empty active site after 0.6 s and an already formed acyl-enzyme complex with partial (74%) occupancy after 2 s. This is despite a significant excess of substrate over protein in the solution, which suggests that the first step of the reaction, i.e. acyl-enzyme formation, happens on a very fast time scale that is beyond the time resolution supported by the version of the drop-on-drop system presented in this manuscript. Moreover, the 2 s acyl-enzyme structure is essentially the same as the 'steady state' soaked structure from the fixed target. Data collected on 'steady state' soaked samples with longer incubation times (between 15 min and 24 h, not shown here) show that the acyl-enzyme structure remains largely unchanged. The only difference is that, at higher resolution, more than one tautomer can be distinguished, but the presence of a particular tautomer is not related to incubation time. Finally, the product never actually leaves the active site. The only consistent difference that we notice between the 2 s, 10 min and longer incubation time structures is increasing ligand occupancy. Therefore, we would expect a > 2 s structure to be largely similar to the 2 s dataset, with slightly increased ligand occupancy.
CTX-M-15 behaviour is in contrast to the lysozyme data, which show > 60% ligand occupancy after 0.6 s despite very low ligand affinity. This suggests that there is something particular about the CTX-M-15-ertapenem system that prevents the ligand from entering and leaving the active site. We think that this particular characteristic must be a combination of protein, crystal lattice and crystallization buffer properties and is not related to the physical process of diffusion in solution itself. In other words, although the CTX-M-15 data helped us to demonstrate the capabilities of the drop-on-drop method itself, we admit that this crystal system might not be ideal for study with drop-on-drop methods. It is impossible to tell what the main cause of this behaviour is, especially as reliable methods for characterizing reaction rates in slurries of microcrystals are essentially non-existent.
As requested, we included binding constants directly in the main text of the revised manuscript. Now the text reads: "(...) 11.6 µM for CTX-M-15-ertapenem [Supplementary Table 8] (...)".
In this study, we used numerical simulations as a way to help us understand and interpret our experimental fluorescence data, which showed that the equilibration process after drop collision is faster than what would be predicted from pure diffusion, i.e. much faster than we expected. It was very important for us to find additional means to confirm and help describe this effect, as it was essentially the prerequisite for this method to serve its purpose. We agree with the reviewer that our attempts at describing what happens in the system after droplets collide do not provide a complete model for this process. We express this opinion in the Supplementary Discussion, where we state that future work is clearly required to provide a better description of this process. Our conclusion is that the mixing in drops can be described as a combination of diffusion and hydrodynamic mixing (internal jets), and this is a good start for any future studies on droplet-based methods. We have, for example, started working on improvements to our fluorescence-based detection system to allow us to probe smaller droplet volumes on faster mixing time scales, which will better match the timescales typically explored in simulations.
Please note that the fluorescence measurements and simulations were performed on fluorescent dye and calcium chloride solutions. Therefore, the results obtained on this simplified experimental system cannot be directly transferred to other systems. We do not think that the timescales obtained represent well what is happening in a drop composed of microcrystal slurry, and we do not claim that in the manuscript. Since any simulation uses experimentally determined diffusion coefficients, and these experimental values are not available for ertapenem in 2 M ammonium sulphate solution, we are not able to provide an approximation for this system. Our assumption would be that mixing is slower for ertapenem than for calcium chloride. Even if we were able to provide an approximation, it would likely be far from the experimental observations. Simulations do not account for diffusion through the crystal lattice which, as we explained in our replies to questions 1 and 2, seems to play a major role in defining overall diffusion rates and will be highly protein-substrate specific.
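As a reference point for the "pure diffusion" end of the comparison discussed above, the sketch below integrates Fick's second law in one dimension with an explicit finite-difference scheme. It is not the authors' simulation: the slab geometry, the initial ligand layer, the no-flux boundaries, the 10% equilibrium criterion and the diffusion coefficient are all assumptions. It does show why any such estimate hinges on an experimentally determined diffusion coefficient: the returned time scales as L²/D, and hydrodynamic mixing after drop impact is deliberately left out.

```python
def pure_diffusion_equilibration_time(
    length_m: float,
    diff_coeff_m2_s: float,
    n_cells: int = 60,
    dosed_fraction: float = 0.1,
    tolerance: float = 0.1,
    t_max_s: float = 1e4,
) -> float:
    """Time for a 1D slab to equilibrate by diffusion alone (explicit finite differences).

    The ligand starts uniformly in the top `dosed_fraction` of the slab; both ends are
    no-flux boundaries. Equilibrium is declared when max - min <= tolerance * mean.
    """
    dx = length_m / n_cells
    ratio = 0.4  # D*dt/dx^2; must stay <= 0.5 for the explicit scheme to be stable
    dt = ratio * dx * dx / diff_coeff_m2_s
    n_dosed = max(1, round(dosed_fraction * n_cells))
    c = [1.0] * n_dosed + [0.0] * (n_cells - n_dosed)
    mean = sum(c) / n_cells  # conserved by the no-flux scheme
    t = 0.0
    while t <= t_max_s:
        if max(c) - min(c) <= tolerance * mean:
            return t
        padded = [c[0]] + c + [c[-1]]  # mirror ghost cells give zero flux at both ends
        c = [c[i] + ratio * (padded[i + 2] - 2.0 * c[i] + padded[i]) for i in range(n_cells)]
        t += dt
    raise RuntimeError("slab did not equilibrate within t_max_s")


if __name__ == "__main__":
    D = 1e-9  # assumed small-solute diffusion coefficient (m^2/s)
    for thickness_um in (90.0, 45.0):
        t = pure_diffusion_equilibration_time(thickness_um * 1e-6, D)
        print(f"{thickness_um:.0f} um slab: ~{t:.2f} s by diffusion alone")
```

For a 90 µm slab and D = 1e-9 m²/s the estimate comes out at a few seconds, and halving the slab thickness shortens it fourfold; a smaller assumed D lengthens it proportionally.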
Minor comments: P3, line 54: '….using the appropriately small crystals,….' Please specify what you consider this range to be. I think this is an important number which needs to be considered for this type of experiment. Are we talking about nm or sub-micron?
P9, line 227: It is stated in the manuscript that ' volume of 60pl to yield equilibrium…' It is not clear in the manuscript why a 60 pL drop size was used for the CaCl2 solution and 120 pL for the HEWL experiment. Also, given that the same 100 µm orifice was used, how was the volume of the drop altered?
The size of the droplets produced by any cartridge varies with factors such as orifice size, dispensing frequency, temperature, surface tension, viscosity and concentration of the substance. For any given cartridge this range is very broad. In our experience, dispensing from a 100 μm orifice cartridge can produce droplets in the range of 30 to 200 pL, and droplet size also varies slightly from cartridge to cartridge. The volume of the droplets is calculated directly by the software controlling the piezoelectric ejector, based on image analysis of the generated droplets. The 0.1 M calcium chloride solution, which was used as the substrate in the fluorescence experiments, showed water-like behaviour and dispensed as ~60 pL droplets. To allow direct comparison, we used the same 60 pL droplet size in the numerical simulations. The dispensing behaviour of GlcNAc and ertapenem observed during this particular experiment at SACLA was very different from that of calcium chloride. These substrates were dissolved close to their solubility limits and were much more viscous than the calcium chloride solution. We also think that the very high temperature in the experimental hutch (36-38 ℃) had a significant impact on the dispensing behaviour. As the image processing software could not calculate droplet volumes directly during the experiment because of very poor lighting conditions, we estimated the volume of the GlcNAc and ertapenem droplets to be around 120 pL (based on consumption data and the number of generated droplets). We used this value for all calculations listed in the revised Supplementary Table 4.
P10, line 23, '…..normalized spectra) scaled ….' Remove the bracket.
The bracket was removed. We modified the labelling on Figure 1 to allow easier identification of the components.
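The 120 pL estimate described in the reply to the P9, line 227 comment amounts to dividing the dispensed volume by the number of drops. The sketch below illustrates that arithmetic, the equivalent-sphere diameter, and the resulting dilution when a burst of ligand drops merges with a crystal drop. The consumption figures and drop counts are hypothetical, the ten-drop burst onto a 3 nL drop only mirrors Supplementary Video 1, and none of these are the values used for Supplementary Table 4.

```python
import math


def volume_per_drop_pl(consumed_ul: float, n_drops: int) -> float:
    """Mean drop volume (pL) from the dispensed volume (uL) and the number of drops."""
    return consumed_ul * 1e6 / n_drops  # 1 uL = 1e6 pL


def equivalent_diameter_um(volume_pl: float) -> float:
    """Diameter (um) of a sphere with the given volume (pL)."""
    volume_um3 = volume_pl * 1e3  # 1 pL = 1e3 um^3
    return (6.0 * volume_um3 / math.pi) ** (1.0 / 3.0)


def merged_concentration(c_stock: float, pei_drop_pl: float, n_pei: int,
                         ade_drop_nl: float) -> float:
    """Ligand concentration after n PEI drops fully mix into a ligand-free ADE drop.

    Assumes ideal volume additivity, complete mixing and no evaporation;
    the result has the same units as c_stock.
    """
    added_pl = pei_drop_pl * n_pei
    total_pl = added_pl + ade_drop_nl * 1e3  # 1 nL = 1e3 pL
    return c_stock * added_pl / total_pl


# Hypothetical example: 1.2 uL consumed over 10,000 drops.
v = volume_per_drop_pl(1.2, 10_000)
print(f"{v:.0f} pL per drop, ~{equivalent_diameter_um(v):.0f} um diameter")
print(f"fraction of stock after 10 drops into 3 nL: {merged_concentration(1.0, v, 10, 3.0):.2f}")
```

Because the final ligand concentration scales with the delivered volume, an error in the per-drop estimate carries directly into the calculated concentrations.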

Reviewer #3
In the history of biochemistry, many researchers have studied enzymes, but it has been difficult to fully elucidate their chemical reactions because of the limitations of technologies for visualizing the movement of molecules during catalysis. This situation has been changing with the development of XFEL technology over the last decade. Time-resolved serial femtosecond crystallography (tr-SFX) is the only technique that can visualize the movement of biological macromolecules during function with atomic and femtosecond resolution under damage-free, room-temperature conditions. However, in tr-SFX using conventional injectors, optimizing the measurement conditions required repeated trial and error for each type of protein. Furthermore, the consumption of large amounts of sample was a problem.
In the present work, Butryn et al. have succeeded in overcoming these problems by applying the drop-on-drop method, developed from the drop-on-tape XFEL sample delivery system, to enzyme-catalyzed reactions in microcrystals. I appreciate the efficiency of this measurement system and its potential versatility for a wide variety of enzymes.
I request that the authors answer the following questions/comments.
1. The microcrystals of HEWL and CTX-M-15 were both obtained with salt-based precipitants, and their slurries appear to have low viscosity. I would like the authors to present, in the present or a future paper, reference data evaluating the diffusion efficiency when using various highly viscous slurries (e.g. 20% PEG4000) in relation to the drop size.
We strongly agree with the reviewer that it will be extremely beneficial to evaluate the diffusion efficiency as a function of various factors, including ADE droplet size, mother liquor composition, and PEI droplet number, size and velocity. This will allow us to select optimal experimental parameters as well as the most promising crystal sample systems. The fluorescence-based method demonstrated in this study has great potential for enabling an extensive characterisation of all of these parameters in the context of the drop-on-drop technique. In addition, we will run numerical simulations to shed light on how different parameters affect diffusion and mixing in colliding drops. This will require empirically determined diffusion coefficients in various crystallization media as well as fast imaging of the drop collision event. Unfortunately, our efforts to characterise the drop-on-drop method in more depth have been severely affected by the SARS-CoV-2 pandemic, as our access to the labs has been largely restricted or entirely blocked. We believe that such an in-depth characterisation of the drop-on-drop method could easily form the topic of a separate manuscript.
2. If possible, the authors should prepare a movie showing the efficient mixing of PEI drops on an ADE drop as supplementary data, to give a detailed understanding of the measurement system. It does not have to be a movie of the actual measurement at SACLA; it can be a movie of fluorescent dye taken offline.
As suggested by the reviewer, we have added two high-speed videos to the supplementary material showing a typical example of ADE and PEI droplets merging in a drop-on-drop experiment (Supplementary Videos 1 and 2). Supplementary Video 1 shows a 3 nL ADE droplet merging with a burst of ten PEI droplets dispensed at 1 kHz. Supplementary Video 2 records a similar experiment, but with the camera synchronised with droplet ejection to visualise the positional and temporal precision of PEI and ADE drop dispensing.
3. In Supplementary Figure 3, it is clear that the contrast between blue and red in the 700 ms figure of (a) is different from that in the other figures in (a). What is the cause of this?
We thank the reviewer for pointing out this inconsistency. We carefully re-analysed the data used to generate Supplementary Figure 3 and found that an error had been introduced when calculating normalized B-factors for the 600 ms (previously incorrectly labelled as 700 ms) timepoint lysozyme structure. This error caused all B-factors in that structure to be underestimated. This has been corrected in the revised version of Supplementary Figure 3 (now Supplementary Figure 8), where all panels with displayed structures were replaced. We also updated the scale bar and the averaged normalized B-factors of selected residues (now in Supplementary Table 7, P19 in the Supplementary Information).
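The reply does not specify the exact normalisation used for Supplementary Figure 8. A common choice is the per-structure Z-score, sketched below as an assumption, with hypothetical B-factor values. Because each structure is normalised against its own mean and standard deviation, an error in those statistics for a single timepoint shifts all of its values together. This is the kind of systematic offset described above.

```python
from statistics import mean, pstdev


def normalize_b_factors(b_factors: list[float]) -> list[float]:
    """Z-score normalisation: (B - <B>) / sigma(B), computed within one structure."""
    mu = mean(b_factors)
    sigma = pstdev(b_factors)
    return [(b - mu) / sigma for b in b_factors]


def mean_normalized_b(normalized: list[float], selected: list[int]) -> float:
    """Average normalised B-factor over selected (e.g. active-site) residue indices."""
    return mean(normalized[i] for i in selected)


# Hypothetical per-residue B-factors (A^2) for one structure.
b = [18.0, 22.0, 30.0, 45.0, 25.0]
z = normalize_b_factors(b)
print(round(mean_normalized_b(z, [2, 3]), 2))
```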
4. I ask a question to gain a better understanding of the quality of the structural data in this study. In Supplementary Table 2, the refinement of the four HEWL drop-on-drop datasets is done at about 1.45 Å resolution, and the CC1/2 values of the outer shell are only 5.4-0.3%. What is the resolution when the CC1/2 value of the outer shell is 50%? Alternatively, the authors should show a table of CC1/2 for each resolution shell.
5. Similar question as 4, for Supplementary Table 3.
Answer to points 4 and 5: The data were merged using the program cxi.merge from the cctbx.xfel package. The resolution cut-offs for the final datasets merged with cxi.merge are determined in a standard procedure based on a combination of several criteria, including where the data fall below ten-fold multiplicity, where CC1/2 no longer decreases monotonically and where the values of I/σ(I) no longer decrease uniformly (Nat Methods. 2017 Apr; 14(4): 443-449; Proc Natl Acad Sci U S A. 2020 Jun 9;117(23):12624-12635 or Proc Natl Acad Sci U S A. 2020 Jan 7; 117(1): 300-307). To enable easier inspection of the merging results, we have included the output merging statistics from cxi.merge for each resolution shell (Tables 1-9 below). The low CC1/2 in the highest resolution shells is a consequence of how image data are integrated in the processing pipeline (per-image I/σ(I) cutoff). If the reviewer and/or editor consider it appropriate, we can incorporate the nine tables into the Supplementary Information.
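A simplified sketch of how the three cut-off criteria listed above could be applied shell by shell is given below. It is not the cxi.merge implementation. The Shell record, the select_cutoff function, the hypothetical statistics, and the rule that the first failing criterion ends the scan are all illustrative assumptions; in practice the criteria are weighed in combination.

```python
from dataclasses import dataclass


@dataclass
class Shell:
    d_min: float         # high-resolution limit of the shell (A)
    multiplicity: float
    cc_half: float       # CC1/2 as a fraction
    i_over_sigma: float  # mean I/sigma(I)


def select_cutoff(shells: list[Shell], min_multiplicity: float = 10.0) -> float:
    """Scan shells from low to high resolution; stop at the first shell that fails a criterion."""
    cutoff = shells[0].d_min
    for prev, cur in zip(shells, shells[1:]):
        if cur.multiplicity < min_multiplicity:
            break  # data fall below ten-fold multiplicity
        if cur.cc_half > prev.cc_half:
            break  # CC1/2 no longer decreases monotonically
        if cur.i_over_sigma > prev.i_over_sigma:
            break  # I/sigma(I) no longer decreases uniformly
        cutoff = cur.d_min
    return cutoff


# Hypothetical merging statistics, ordered from low to high resolution.
shells = [
    Shell(3.00, 200.0, 0.98, 20.0),
    Shell(2.00, 150.0, 0.95, 10.0),
    Shell(1.70, 60.0, 0.60, 3.0),
    Shell(1.50, 25.0, 0.10, 1.0),
    Shell(1.45, 12.0, 0.12, 0.9),
]
print(select_cutoff(shells))
```

Note that, under these rules, a shell with a CC1/2 of only 10% can still be retained. This is in line with the point above that low CC1/2 in the outermost shell is not by itself a cut-off criterion.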
The validity of the cutoff selection criteria can be confirmed by 'paired refinement'. As in our other studies (Proc Natl Acad Sci U S A. 2020 Jun 9;117(23):12624-12635 or Proc Natl Acad Sci U S A. 2020 Jan 7; 117(1): 300-307), we tested our resolution cutoffs using a 'paired refinement' procedure to see whether including data from the highest resolution shells has a positive effect on the refinement compared with more conservative cutoffs. This test was done using the PDB_REDO platform, which includes an implementation of the original 'paired refinement' algorithm from Karplus and Diederichs (Science. 2012 May 25;336(6084):1030-3, IUCrJ. 2014 May 30;1(Pt 4):213-20). The 'paired refinement' test suggested the same resolution cutoffs for four of the nine datasets included in this publication. For the remaining five, the suggested resolution cutoffs were 0.04-0.1 Å lower. Taking into account that there is always some variability due to the way binning is done, we conclude that the results of 'paired refinement' support our choice of resolution cutoffs. Interestingly, for the SACLA HEWL 0.6 s dataset (which has the lowest CC1/2 value in the highest resolution shell, 0.3%), 'paired refinement' gives the same resolution cutoff as the current one. Conversely, for the CTX-M-15 resting state structure from I24 (with a CC1/2 value of 56.9% in the highest resolution shell), 'paired refinement' suggests a slightly lower resolution cutoff (1.71 Å instead of 1.65 Å). This is counterintuitive. It shows that two common assumptions are not valid: that low CC1/2 in the high-resolution shells equates to low-quality data, and that high CC1/2 automatically means higher data quality. The benefits of the merging protocols applied in cctbx can be demonstrated by electron density map analysis (Proc Natl Acad Sci U S A. 2020 Jun 9;117(23):12624-12635). This analysis shows that including "weak" data has no perceptible negative influence on map quality and can in fact help pick out important subtle features in the maps.
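The decision logic behind paired refinement can be sketched as follows. Models refined to a conservative and to a more aggressive cutoff are both evaluated against the same reflections, i.e. only to the conservative cutoff. The extra shell is kept if it lowers R_free there. This is a simplified illustration that compares R_free only. It is not the PDB_REDO code, which may use additional or different criteria. The rfree_at_common callback stands in for the actual refinements and is hypothetical, as are the R_free values.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical callback: given a conservative cutoff d_low and a more aggressive
# cutoff d_high, return (R_free of model refined to d_low, R_free of model refined
# to d_high), both evaluated against the same reflections, i.e. only to d_low.
RfreePair = Callable[[float, float], Tuple[float, float]]


def paired_refinement_cutoff(candidates: Sequence[float], rfree_at_common: RfreePair) -> float:
    """Extend the cutoff shell by shell while the extra data lower R_free at the common resolution."""
    accepted = candidates[0]
    for nxt in candidates[1:]:
        r_low_model, r_high_model = rfree_at_common(accepted, nxt)
        if r_high_model >= r_low_model:
            break  # adding the shell did not improve the model
        accepted = nxt
    return accepted


# Hypothetical R_free values (not from this study), keyed by (d_low, d_high).
table = {
    (2.0, 1.9): (0.215, 0.211),
    (1.9, 1.8): (0.209, 0.210),
}
print(paired_refinement_cutoff([2.0, 1.9, 1.8], lambda lo, hi: table[(lo, hi)]))
```

Because the comparison is made against identical reflections, the test asks whether the extra shell improves the model, not whether its merging statistics look good.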
We updated the "Data processing" section in "Online methods" by adding information about the applied resolution cutoff criteria and 'paired refinement' (P15, lines 367-372 and P16, lines 392-393 in the main manuscript file). In the revised version, Supplementary Tables 2 and 3 have been renumbered as Supplementary Tables 5 and 6.
6. In Supplementary Table 4, the kcat value seems to be quite low. What is the kcat in the solution state, not in microcrystals? What is the optimal pH of the enzyme?
CTX-M-15 is known to hydrolyse carbapenem antibiotics such as ertapenem only poorly. This is a well-known property of CTX-M-15 and similar β-lactamase enzymes. It arises because, although ertapenem (and other similar carbapenem antibiotics) can bind and form a covalent attachment to the active site serine, this adduct only very slowly resolves to the active enzyme (i.e. the steps in panel b of Supplementary Figure 12 are very slow). This is reflected in our kinetic data, which show a very fast onset of acylation (k2/K, i.e. the covalent attachment of ertapenem to the active site serine) but a µM inhibition constant (Kiapp = 1.8 µM). We detail this in the Supplementary Notes. We have now also added a note at the bottom of Supplementary Table 5 (Supplementary Table 8 in the revised version, P20 in the revised Supplementary Information) to direct readers to the appropriate discussion. Revised Supplementary Table 8 shows the solution-state kinetics (not in crystallo kinetics) of CTX-M-15 at pH 7.5, and we have altered its heading to reflect this more clearly. This pH approximately represents the optimal pH of the enzyme for kinetics. As we note, with nitrocefin (a β-lactam reporter substrate) the kcat is >300 s⁻¹ (Tooke et al., ref. 29 in the online methods references). We have now edited the methods to state this more explicitly, rather than only referring to previous papers.
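The distinction drawn above between rapid acylation (k2/K) and slow turnover can be illustrated with the standard two-step scheme for covalent acylation, E + I ⇌ E·I → E-I. For this scheme the observed acylation rate is k_obs = k2[I]/(K + [I]), which reduces to (k2/K)[I] when [I] ≪ K. The sketch below assumes this scheme and neglects deacylation. The parameter values are hypothetical, not those reported in Supplementary Table 8.

```python
import math


def k_obs(k2: float, K: float, conc: float) -> float:
    """Observed first-order acylation rate (s^-1) for E + I <=> E.I -> E-I.

    k2 in s^-1; K and conc in the same concentration units (e.g. uM).
    Deacylation (regeneration of free enzyme) is neglected.
    """
    return k2 * conc / (K + conc)


def half_time_s(rate_s: float) -> float:
    """Half-time (s) of a first-order process with the given rate constant."""
    return math.log(2) / rate_s


# Hypothetical parameters, not those in Supplementary Table 8.
K2_S, K_UM = 5.0, 200.0  # s^-1, uM
for conc in (10.0, 1000.0, 100000.0):
    k = k_obs(K2_S, K_UM, conc)
    low_conc_limit = (K2_S / K_UM) * conc  # (k2/K)[I], valid only for [I] << K
    print(f"[I]={conc:g} uM: k_obs={k:.3f} s^-1, "
          f"(k2/K)[I]={low_conc_limit:.3f} s^-1, t1/2={half_time_s(k):.3f} s")
```

At high ligand concentration k_obs saturates at k2, so the acylation half-time sets a lower bound on the delay times at which an acyl-enzyme intermediate can be expected in a time-resolved experiment.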
Tables 1-9 are to be included with the responses to Referees #1 and #3.