Orthogonal fingerprinting for accurate and fast single-molecule mechanical profiling of proteins

Force-spectroscopy by Atomic Force Microscopy (AFM) is the gold-standard method for nanomechanical characterization of proteins. However, AFM suffers from unavoidable interexperimental force calibration errors that make it challenging and time-consuming to study modulation of protein nanomechanics. Here, we develop orthogonal fingerprinting to track mechanical unfolding of two different proteins in the same AFM experiment, under the same calibration parameters. We show that the accuracy of orthogonal fingerprinting is independent of force calibration errors, reaching up to a 6-fold improvement with respect to traditional AFM. Importantly, this improvement in accuracy is preserved even when unfolding force data are obtained from multiple, independent orthogonal fingerprinting experiments. We also demonstrate that orthogonal fingerprinting can speed up data acquisition more than 30 times. Benefiting from the increased accuracy of orthogonal fingerprinting, we determine that the mechanical stability of a protein is independent of its neighboring domains.


Madrid, Spain
at increasing calibration uncertainties for the traditional (blue) and the simultaneous (black) strategies. The remaining simulation parameters are the same as in panel B. In panels B-F, the number of events per experiment and protein in the simultaneous approach was half of the number of events in the traditional strategy so that RSDs were compared between conditions with equal total number of events. author/funder. All rights reserved. No reuse allowed without permission.
The copyright holder for this preprint (which was not peer-reviewed) is the . https://doi.org/10.1101/293506 doi: bioRxiv preprint

Interexperimental variation in unfolding forces in traditional AFM

To quantify variation in mechanical parameters obtained by AFM, we first examined the same protein in different, independently calibrated AFM experiments. We produced a polyprotein containing eight repetitions of the C3 domain of cardiac myosin-binding protein C (Supplementary Figure 2, Supplementary Text 1) and subjected individual (C3)8 polyproteins to a linear increase in force of 40 pN/s using a force-clamp atomic force microscope. Results from two such independent experiments are shown in Figure 1A. Mechanical force triggers the unfolding of individual C3 domains. These unfolding events are detected as step increases in the length of the polyprotein of 24 nm (Figure 1A, middle; Supplementary Figure 3A). We determined the force at which the unfolding events occur and calculated distributions of unfolding forces. Despite the fact that both distributions are well defined (n > 115 events), the difference in their mean unfolding force (∆ ) is 19% (Figure 1A, right certain number of independent experiments, each one affected by a different error in force.

Simulations return the distribution of ∆ values obtained in the 1,000 cycles. The spread of the distribution is quantified by its RSD.

Using our Monte Carlo procedure, we have simulated mechanical protein unfolding under a 3.6% force calibration uncertainty, which is a reasonable estimate of the lowest uncertainty that the calibration by thermal fluctuations can achieve (Supplementary Text 2 and Supplementary Figure 4). Figure 1B

It has been argued that determination of ∆ using a single cantilever, in the same experiment, could minimize the error associated with force calibration 15,22,23. However, to the best of our knowledge, the resulting improvement in accuracy has not been quantified. We have used our Monte Carlo simulations to estimate the accuracy achieved by simultaneous measurement of mechanical unfolding of two proteins. Considering equal total number of events and experiments, we find that determination of ∆ in simultaneous experiments results in a decrease in RSD from 5.0% to 3.2% at a 3.6% calibration uncertainty (Figure 1B). The RSD of the ∆ distribution obtained in simultaneous experiments is further reduced at higher number of unfolding events, as expected from better definition of the distribution of unfolding forces (Figure 1C, Supplementary Figure 5B). Unexpectedly, averaging multiple experiments in which both proteins are probed simultaneously leads to further reductions in RSD, despite the fact that each individual experiment is performed under different calibration parameters (Figure 1D). Increasing the number of events or experiments also results in better accuracy when proteins are probed in traditional, separate experiments (Figure 1C,D). We find that the relative improvement in accuracy achieved by simultaneous measurement over traditional AFM increases with the number of events per experiment, and remains fairly constant with the number of experiments (Figure 1E). Hence, we conclude that averaging independent AFM experiments in which two proteins are probed simultaneously retains statistical power, even if those experiments are affected by different calibration errors.

All our simulations above consider a 3.6% uncertainty in force calibration, which is a much smaller value than usually reported 11,13-15. Hence, we estimated the RSD of the distribution of ∆ at increasing calibration uncertainties. As expected, higher calibration uncertainties lead to much increased RSD in traditional AFM, whereas the RSD of simultaneous measurements remains insensitive to the calibration uncertainty, even when data from several independent experiments are averaged (Figure 1F, Supplementary Figure 6A).
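The comparison between traditional and simultaneous measurements can be illustrated with a much simplified Monte Carlo sketch. It is not the simulation code provided with the manuscript: here unfolding forces are drawn from a Gaussian rather than from a kinetic model, calibration errors are assumed to be multiplicative and Gaussian, the RSD is taken as the SD of the difference in mean force divided by the true mean force, and the function name, the 90 pN mean and the 20 pN spread are hypothetical choices.

```python
import random
import statistics


def rsd_of_delta(simultaneous, total_experiments=2, events_per_experiment=100,
                 calib_sd=0.036, mean_force=90.0, force_sd=20.0,
                 cycles=1000, seed=1):
    """Return the RSD (%) of the difference in mean unfolding force.

    Both proteins have the same true mean force, so any difference comes
    from sampling noise and calibration error. `total_experiments` must be
    even so the traditional strategy gives each protein the same number of
    experiments. All default values are illustrative assumptions.
    """
    rng = random.Random(seed)

    def run_experiment(n_events, scale):
        # Forces as read by a cantilever whose calibration is off by `scale`.
        return [scale * rng.gauss(mean_force, force_sd) for _ in range(n_events)]

    deltas = []
    for _ in range(cycles):
        forces_a, forces_b = [], []
        for j in range(total_experiments):
            # One calibration error per experiment.
            scale = 1.0 + rng.gauss(0.0, calib_sd)
            if simultaneous:
                # Both proteins share the experiment; half the events each,
                # so the total number of events matches the traditional case.
                half = events_per_experiment // 2
                forces_a += run_experiment(half, scale)
                forces_b += run_experiment(half, scale)
            elif j % 2 == 0:
                forces_a += run_experiment(events_per_experiment, scale)
            else:
                forces_b += run_experiment(events_per_experiment, scale)
        deltas.append(statistics.fmean(forces_a) - statistics.fmean(forces_b))
    return 100.0 * statistics.stdev(deltas) / mean_force


if __name__ == "__main__":
    for sd in (0.036, 0.18):
        print(sd, rsd_of_delta(False, calib_sd=sd), rsd_of_delta(True, calib_sd=sd))
```

In this toy model the shared calibration factor multiplies both pooled means within an experiment, so it largely cancels in their difference, which is why the simultaneous RSD is expected to depend only weakly on the calibration uncertainty.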

Orthogonal fingerprinting enables simultaneous characterization of proteins by AFM

Results in Figure 1D show that under a modest 3.6% uncertainty in force, simultaneous measurement can reach the same level of accuracy with 2-4 times fewer experiments than the traditional approach. Furthermore, at high values of calibration uncertainty, the accuracy of simultaneous measurements can be 6 times higher than in the traditional approach (Figure 1F). These remarkable improvements in throughput and accuracy prompted us to design a general strategy that enables simultaneous measurement of mechanical properties of proteins.

A fundamental requirement of force-spectroscopy AFM is to have reliable methods to identify single-molecule events. In the case of mechanical characterization of proteins, this requirement is fulfilled by the use of polyproteins, which provide molecular fingerprints that easily discriminate single-molecule events from spurious, non-specific interactions 24,25 (Supplementary Figure 3). As exemplified in Figure 1A, mechanical unfolding of polyproteins produces repetitive events whose length fingerprints the domain of interest. If two polyproteins are to be measured in the same experiment, it is imperative that they have different fingerprinting unfolding lengths. Here, we propose a widely applicable manner of achieving such orthogonal fingerprinting (OFP) through the use of heteropolyproteins, in which marker proteins are fused to the proteins of interest 26. Since OFP identifies molecules through the unfolding length of the marker domains, proteins of interest to be compared in simultaneous AFM measurements can have the same unfolding length (e.g. mutant proteins).

To test whether heteropolyproteins can be employed to achieve OFP during simultaneous measurement of proteins by AFM, we first followed a single-marker strategy using the heteropolyprotein (C3-L)4, in which we used protein L as a marker since its unfolding length is different from that of C3 21. Indeed, mechanical unfolding of (C3-L)4 under a 40 pN/s ramp results in the appearance of 16 and 24 nm steps, which correspond, respectively, to the unfolding of L and C3 domains (Figure 2B, left, and Supplementary Figure 3B). We selected unfolding traces of (C3)8 and (C3-L)4, obtained in independent, traditional AFM experiments, and classified them according to the number of 16 and 24 nm events they contain. Our results show that a gating criterion of n(16nm) = 0 and n(24nm) > 2 unambiguously identifies unfolding events coming from (C3)8, whereas events resulting from (C3-L)4 can be safely assigned when n(16nm) > 1 and 0 < n(24nm) < 5 (Figure 2B, right). We analyzed 17 such traditional fingerprinting (TFP) experiments and obtained distributions of unfolding forces for C3 in the context of both polyproteins, which we found to be very similar (mean unfolding force = 90.7 and 88.4 pN for the homo and the heteropolyproteins, respectively, Figure 2C). We used our Monte Carlo simulations to estimate the RSD in ∆ that is expected from the actual number of experiments and events obtained (RSD = 3.0% and 8.2% at 3.6% and 18% calibration uncertainties, respectively) (Supplementary Figure 7A, Supplementary Table 1).

Following validation of the polyprotein gating criterion (Figure 2B), we measured (C3)8 and (C3-L)4 simultaneously in OFP experiments.
Single-molecule traces were classified according to the number of 16 and 24 nm steps they contain, and sorted as coming from (C3)8 or (C3-L)4 before analysis of unfolding data (Figure 2D). Also in OFP, the distributions of unfolding probability of C3 in the context of (C3)8 and (C3-L)4 are very similar (mean unfolding force = 96.3 and 93.4 pN for the homo and the heteropolyproteins, respectively, Figure 2E). Notably, only 5 OFP experiments were required to reach a lower RSD than in TFP, which is a 3 times higher speed of data acquisition (Figure 2C,E, Supplementary Figure 7A).

Dual-marker orthogonal fingerprinting overcomes confounding protein dimerization

In the AFM experiments reported in Figures 1 and 2, polyproteins are picked up by the cantilever through non-specific physisorption. Hence, experimental traces can contain different numbers of unfolding events (Figures 1A, 2B). Non-specific protein pickup also leads to the occasional appearance of traces containing more unfolding events than the number of engineered domains in the polyprotein, an effect that results from polyprotein dimerization 27. For instance, in Figure hamper proper identification of events, since a fraction of C3 unfolding events coming from (C3)8 could be mistakenly assigned to (C3-L)4 due to the non-zero probability that some dimers are included in the gating region.
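The single-marker gating criterion stated above (n(16nm) = 0 and n(24nm) > 2 for (C3)8; n(16nm) > 1 and 0 < n(24nm) < 5 for (C3-L)4) can be sketched as a small classifier. The function name and the input format (a list of step sizes in nm per trace) are hypothetical; the ± 1 nm window follows the fingerprinting lengths given in the Methods.

```python
def classify_trace(step_sizes_nm, tolerance_nm=1.0):
    """Assign a trace to (C3)8 or (C3-L)4 from its unfolding step sizes.

    Returns the polyprotein name, or None when the trace falls outside
    both gates and should be discarded.
    """
    # Count steps that match the protein L (16 nm) and C3 (24 nm) fingerprints.
    n16 = sum(1 for s in step_sizes_nm if abs(s - 16.0) <= tolerance_nm)
    n24 = sum(1 for s in step_sizes_nm if abs(s - 24.0) <= tolerance_nm)

    if n16 == 0 and n24 > 2:
        return "(C3)8"
    if n16 > 1 and 0 < n24 < 5:
        return "(C3-L)4"
    return None
```

A trace from a (C3)8/(C3-L)4 heterodimer with two or more 16 nm steps and fewer than five 24 nm steps would still pass the second gate, which is the ambiguity that a count-based single-marker gate cannot resolve.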

In general, the degree of protein dimerization in AFM is dependent on the particular experimental conditions. Hence, heterodimerization poses a challenge to OFP, whose extent may vary depending on the system to study. However, we hypothesized that difficulties coming from protein dimerization could be overcome by using a second protein marker, since traces originating from dimers would be fingerprinted by the presence of both marker proteins. We chose the protein SUMO1 as a second marker because its unfolding length is different from those of C3 and protein L 28. We engineered the heteropolyprotein (C3-SUMO1)4 and pulled it in the AFM (Supplementary Figure 2). Two populations of unfolding steps, at 20 nm and 24 nm, are detected, corresponding to the unfolding of SUMO1 and C3, respectively (Supplementary Figure 3C).

Having two marker proteins enables gating criteria that are based exclusively on the presence of the marker domains, in a manner that protein dimers can be identified and excluded from the analysis (Figure 3A

Symmetrization of orthogonal fingerprinting datasets improves accuracy

To understand why the improvement in accuracy is preserved when multiple OFP experiments are averaged (Figure 1D)

, are the total number of unfolding events for each protein, and ̅ is the average value of the error in force in experiment j, which, as a consequence of OFP, is considered to be equivalent for both proteins.

at higher calibration uncertainties (Figure 4A). Indeed, under these asymmetry conditions, the performance of OFP drastically diminishes and the obtained RSD approaches the one obtained by TFP (Figure 4A).

Since asymmetric data result in poorer performance of OFP, we examined whether symmetrization of OFP datasets results in improved RSD. To simulate symmetrization, we did Monte Carlo simulations of 2 OFP experiments in which resulting RSD of ∆ after symmetrization becomes independent of the calibration uncertainty and is lower than the RSD of the more populated, asymmetric dataset at calibration uncertainties higher than 7% (Figure 4A).
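Symmetrization by trimming can be sketched as follows. This is an illustrative reading, not the procedure used for the published datasets: it assumes the symmetry condition is met when both proteins contribute the same number of events within each experiment, and it assumes multiplicative calibration errors, under which the error of experiment j enters the difference of pooled means with a weight n_a,j/N_a - n_b,j/N_b. Function names and the data layout are hypothetical.

```python
import random


def symmetrize(experiments, seed=0):
    """Trim each experiment so both proteins keep the same number of events.

    `experiments` is a list of (forces_a, forces_b) pairs, one pair per OFP
    experiment. Events are dropped at random from the larger set.
    """
    rng = random.Random(seed)
    trimmed = []
    for forces_a, forces_b in experiments:
        n = min(len(forces_a), len(forces_b))
        trimmed.append((rng.sample(forces_a, n), rng.sample(forces_b, n)))
    return trimmed


def asymmetry_weights(experiments):
    """Weight with which each experiment's calibration error enters the
    difference of pooled means (assumed form, see lead-in). Both proteins
    must have at least one event overall."""
    total_a = sum(len(a) for a, _ in experiments)
    total_b = sum(len(b) for _, b in experiments)
    return [len(a) / total_a - len(b) / total_b for a, b in experiments]


if __name__ == "__main__":
    data = [([90, 95, 88, 92, 101, 97], [89, 93]),
            ([91, 94], [96, 90, 87, 99, 92, 95])]
    print(asymmetry_weights(data))
    print(asymmetry_weights(symmetrize(data)))
```

Trimming discards events, so it trades sampling precision for cancellation of calibration errors; whether the trade pays off depends on the calibration uncertainty and on the dataset.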
We have tested the effect of symmetrization in our real AFM datasets by removing unfolding events so that every OFP experiment verifies the symmetry condition. Feeding Monte Carlo simulations with these trimmed datasets, we estimate that the RSD of the distribution of ∆ of the symmetrized datasets becomes lower than the original RSD also at calibration uncertainties higher than 7% (Supplementary Figure 7), although in both examples the differences between the asymmetric and the symmetrized conditions are less prominent than in Figure 4A. Indeed, we find that the extent of improvement in RSD by symmetrization depends on the number of experiments performed (Supplementary Figure 9) together with the degree of asymmetry as predicted by Equation 1. Hence, we recommend that improvement in RSD by symmetrization is evaluated on a case-by-case basis using Monte Carlo simulations as we have done here.

neither of these approaches addresses more fundamental assumptions of the calibration procedures that can lead to higher calibration uncertainties 32. The impact of OFP with respect to traditional AFM that is summarized in Figure 5 considers a realistic calibration uncertainty of 10.8%. However, it is important to note that due to its insensitivity to calibration errors, OFP avoids the effects of force miscalibrations that originate from difficult-to-detect defects in specific cantilevers. Furthermore, the impact of OFP may be more relevant in the light of the availability of next generation cantilevers, which are pushing the AFM limits into ranges of forces, stability and time resolutions that are not accessible to conventional cantilevers 33-36.
These high-performance cantilevers are more challenging to calibrate 13, so we envision that combination of OFP strategies and these new cantilevers is set to expand the reach of single-molecule AFM.

(ii) OFP shows much improved accuracy (Figures 1E,F, 4A,C). This increase in accuracy captured by Monte Carlo simulations is also observed in our experimental dataset, since the spread of ∆ in pairs of OFP experiments is lower than in TFP experiments (SD = 8.0 vs 11.2 pN, respectively) (Figure 4B). Keeping the speed of data acquisition constant at high calibration uncertainties, the RSD achieved by OFP can be 6 times lower than in TFP (Figure 5, Supplementary Figure 6B).

(iii) The throughput of OFP is much increased. We estimate that OFP can obtain the same accuracy more than 30 times faster than TFP at high calibration uncertainties (Supplementary Figure 6C). In addition, proteins to be probed in OFP can be purified simultaneously (Supplementary Figures 2, 10), which results in extra savings in working time and reagents while ensuring equal experimental conditions for both proteins (Figure 5).

The increases in throughput and accuracy of OFP come at the expense of each other. Hence, depending on the goals of an OFP study, the experimenter can choose to favor one or the other, or to find a balance between both. In this regard, our Monte Carlo simulations can help experimental design (code is provided as Supplementary Material). For instance, in Figure 4C, we show that different gains in accuracy and throughput can be achieved depending on the number of OFP experiments chosen to compare two proteins, considering 10 TFP experiments as a reference.

A direct application of OFP is to examine how neighboring domains affect protein nanomechanics. Indeed, the use of heteropolyproteins relies on the assumption that the effect of neighboring domains in the mechanics of a protein domain is negligible 26,37-39. Our highly accurate OFP experiments show that the mechanical stabilities of the C3 domain in the context of a (C3)8 homopolyprotein, or within (C3-L)4 or (C3-SUMO1)4 heteropolyproteins, are very similar (Figures 2, 3). Hence, our data lend strong support to the use of heteropolyproteins in force-spectroscopy AFM. In particular, since the mechanical properties of the C3 domain are independent of the flanking domains, the mechanical effects of mutations in C3 that cause heart disease can be directly tested using OFP strategies 40.

Figure 5. Overview of orthogonal fingerprinting force-spectroscopy AFM. In the traditional approach, comparison of the mechanical stability of Protein a and Protein b involves independent purification and several AFM experiments to compensate for inaccurate force calibration. OFP is based on the production of heteropolyproteins composed of the proteins of interest fused to marker domains. Since the markers provide unequivocal fingerprints in single-molecule pulling experiments, OFP enables simultaneous purification and measurement in the AFM, circumventing errors in force calibration. Hence, OFP can achieve the same accuracy as conventional single-molecule AFM with much better throughput (left). Alternatively, by keeping the speed of data acquisition constant, OFP considerably improves the accuracy of single-molecule AFM (right). The improvements in throughput and accuracy are estimated from simulations at 10.8% calibration uncertainty (100 events per experiment and protein).

Simultaneous mechanical characterization of proteins has been achieved before combining microfluidics, on-chip protein expression and AFM measurements in a combined atomic force/total internal reflection fluorescence microscope 15,22. An advantage of OFP is that it can be readily implemented in any force-spectroscopy AFM setup. In addition, different fingerprinting lengths provide additional reassurance of the identity of the probed molecules. In this regard, OFP is very well suited to compare mechanical properties of proteins with similar unfolding lengths, such as mutants of the same protein 17,19. In those cases where the proteins to compare have different unfolding lengths, simultaneous measurement is of immediate application and can lead to the increase in accuracy and throughput described here. Examples include examination of the effect of disulfide bonds, protein misfolding, multimerization, and pulling geometry in the mechanical stability of proteins and their complexes 41-48, and determination of rates of force-activated chemical reactions 49. In these examples, our Monte Carlo simulations and theoretical developments can be fully applied to guide experimental design and interpretation.

Orthogonal fingerprinting can be further improved in two aspects. Since the relative performance of OFP is better at high number of events (Figure 1E, Supplementary Text 4), we propose that even better accuracy will be achieved by

to small variations in and ∆ and therefore one single set of parameters is enough to calculate RSD of distributions of ∆ even if the mechanical parameters of the proteins to be compared are slightly different (Supplementary Table 2).

The kinetic Monte Carlo routine to obtain distributions of unfolding forces compares a random number with the instant probability of unfolding at a given force. If the unfolding probability is higher than the random number, unfolding is considered to happen at that force. Instant probabilities of unfolding are calculated following a linear approximation according to reference

Protein production and purification

The cDNAs coding for the C3-L and C3-SUMO1 constructs were produced by gene synthesis (NZY-Tech and Gene Art, respectively). The cDNA coding for the C3 domain was obtained by PCR. cDNAs coding for polyproteins were produced following an iterative strategy of cloning using BamHI, BglII and KpnI, as described before 25

10 L) of the purified protein is deposited on the surface of a gold coated cover slip (Luigs & Neumann), or directly into the Hepes buffer contained in the fluid chamber of the AFS. The cantilever is brought in contact to the surface for 1-2 s at 500-2000 pN to favor formation of single-molecule tethers. Then, the surface is retracted to achieve the set point force. If a single-molecule tether is formed, the force is increased linearly at 40 pN/s for 5 s while the length of the polyprotein is measured. This protocol ensures full unfolding of C3, L and SUMO1 domains (Supplementary Figure 3). Unfolding events are detected as increases in the length of the protein.
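The kinetic Monte Carlo routine described in the Methods, applied to the 40 pN/s, 5 s ramp of this protocol, might be sketched as below. The source does not state here which kinetic model or parameters were used, so the Bell-type force dependence of the unfolding rate, the zero-force rate k0, the distance to the transition state dx_nm and the time step are all assumptions chosen for illustration; only the comparison of a random number with a linearized unfolding probability and the ramp settings come from the text.

```python
import math
import random

KT_PN_NM = 4.11  # thermal energy near room temperature, in pN*nm


def unfolding_force(rng, k0=0.01, dx_nm=0.2, rate_pn_s=40.0,
                    duration_s=5.0, dt_s=1e-3):
    """Return the force (pN) at which one domain unfolds during a linear
    force ramp, or None if it survives the whole ramp.

    k0 (1/s) and dx_nm are hypothetical values, not fitted to C3.
    """
    steps = round(duration_s / dt_s)
    for i in range(1, steps + 1):
        force = rate_pn_s * i * dt_s
        # Assumed Bell-type rate; the probability of unfolding in one time
        # step is linearized as k(F) * dt, valid while k(F) * dt << 1.
        p_unfold = k0 * math.exp(force * dx_nm / KT_PN_NM) * dt_s
        # Unfolding happens when the probability exceeds the random number.
        if p_unfold > rng.random():
            return force
    return None


def unfolding_force_distribution(n_events, seed=0, **kwargs):
    """Collect unfolding forces from independent simulated domains."""
    rng = random.Random(seed)
    forces = (unfolding_force(rng, **kwargs) for _ in range(n_events))
    return [f for f in forces if f is not None]
```

The time step should be kept small enough that the linearized probability stays well below 1 at the highest force of the ramp; otherwise the linear approximation no longer represents the underlying rate.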
In the initial characterization of polyproteins, we analyze all traces that contain at least two events of the same size, which allows us to set a fingerprinting length for the domains (24 ± 1 nm for C3, 16 ± 1 nm for protein L, and 20 ± 1 nm for SUMO1, see Supplementary Figure 3). For the rest of the analyses, we only considered traces that contain fingerprinting unfolding lengths. Unfolding forces were recorded and plotted as cumulative distributions. Mean unfolding force values were obtained from Gaussian fits to histograms of unfolding forces. Force inaccuracy due to laser interference was lower than 40 pN in all experiments (peak-to-peak height in baseline force-extension recordings) 11.

Author contributions

J.A.C. designed the research. D.V.C. engineered polyprotein constructs and produced proteins.

Competing financial interests

The authors declare no competing financial interest.

Resources

The code used for the Monte Carlo simulations is available as online Supplementary Material.