The SensorOverlord predicts the accuracy of measurements with ratiometric biosensors

Two-state ratiometric biosensors change conformation and spectral properties in response to specific biochemical inputs. Much effort over the past two decades has been devoted to engineering biosensors specific for ions, nucleotides, amino acids, and biochemical potentials. The utility of these biosensors is diminished by empirical errors in fluorescence-ratio signal measurement, which reduce the range of input values biosensors can measure accurately. Here, we present a formal framework and a web-based tool, the SensorOverlord, that predicts the input range of two-state ratiometric biosensors given the experimental error in measuring their signal. We demonstrate the utility of this tool by predicting the range of values that can be measured accurately by biosensors that detect pH, NAD+, NADH, NADPH, histidine, and glutathione redox potential. The SensorOverlord enables users to compare the predicted accuracy of biochemical measurements made with different biosensors, and subsequently select biosensors that are best suited for their experimental needs.

Figure 1. Determining the range of glutathione redox potential E GSH values we can measure accurately with the roGFP1-R12 biosensor. (a) Glutathione redox potential (E GSH ) directs the oxidation of cysteines in hundreds of proteins in the same direction, resulting in their concerted regulation. (b) The reduced and oxidized states of the roGFP1-R12 biosensor have different fluorescence spectra 8 , enabling E GSH measurement via R (fluorescence ratio) microscopy. (c) The conversion map from R to E GSH is highly nonlinear. R reduced state and R oxidized state refer to the ratiometric emission of ensembles of reduced and oxidized biosensors, respectively. E 0′ is the standard midpoint potential of the biosensor. (d) The top panel shows how measurement errors in R cause observed E GSH values (E Obs ) to differ from the true E GSH values (E True ) that would be observed if R were measured with no error (R True ). The bottom panel shows how the size of an E GSH error (E Obs − E True ) depends not only on the size of the error in R but also on the value of R. Each dotted curve corresponds to a different fold-change error in R. The shaded region corresponds to the interval encompassing 95% of the predicted E Obs values for each R True value, given our empirical error in R. (e) Transforming the map from R True to E True in the top and bottom panels shown in (d) produces plots showing how errors in R influence the map from E True to E Obs (top panel) and how the size of an E GSH error depends not only on the size of the error in R but also on the value of E True (bottom panel). Each dotted curve corresponds to a different fold-change error in R. The shaded region shows the interval encompassing 95% of the predicted E Obs values for each E True value, given our empirical error in R. (f) Cumulative distribution of the empirical fold error in R in live C. elegans expressing the roGFP1-R12 biosensor in the cytosol of the anterior (pm3) muscles of the pharynx, the feeding organ. This error distribution was obtained by aggregating with equal weight the empirical fold error in R of five separate experiments (see Supplementary Note 3). 95% of the errors in R fall within the interval (− 2.8%, + 2.8%), shown shaded in gray. This interval quantifies the precision of our fluorescence-ratio measurements. (g) E GSH measurement inaccuracy (the maximum absolute difference between E True and E Obs ) decreases with increased precision of R measurement. Each dotted curve corresponds to a different precision of R measurement. The shaded region shows the interval encompassing 95% of the predicted E GSH measurement inaccuracies for each E True value, given our empirical error in R.

To help the community identify biosensors that are well-suited for their experimental needs, we developed a web-based tool, the SensorOverlord (https://www.sensoroverlord.org), that implements all of these analyses with a user-friendly interface.

Results
Predicting the accuracy of a glutathione redox potential biosensor. In our previous work, we used roGFP1-R12 to measure E GSH in live C. elegans 12 . To map R (fluorescence ratio) measurements into E GSH values, we determined three conversion factors that quantify the properties of our imaging microscope and the spectral differences between the reduced and oxidized states of the biosensor (Supplementary Note S1). Measuring E GSH instead of R enabled us to make predictions about how the oxidation state of the network of cysteines trading electrons with glutathione is influenced by genetic determinants and environmental factors 12 . However, those predictions require that E GSH be measured accurately. Therefore, we set out to determine how the precision of our fluorescence-ratio microscopy influenced the range of E GSH values we could measure accurately.
We first modeled how errors in fluorescence-ratio measurement influenced E GSH errors. The conversion map from R to E GSH is highly nonlinear (Fig. 1c). As a result, the size of an E GSH error depends not only on the size of the error in R but also on the value of R (Fig. 1d): as R approaches its lower and upper bounds E GSH errors increase rapidly (Supplementary Note S2). Thus, even a small difference between observed and true R values (R Obs and R True , respectively) can lead to a large difference between observed and true E GSH values (E Obs and E True , respectively) (Fig. 1d).
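The way a fixed relative error in R turns into a position-dependent E GSH error can be illustrated with a short Python sketch. It assumes the standard two-state Nernst form of the conversion map, E = E 0′ − (RT/2F) ln[δ (R oxidized − R) / (R − R reduced)], where δ is the oxidized-to-reduced brightness ratio at the second wavelength. This form is a common textbook expression and is not quoted from this article, whose exact map is given in Supplementary Note S1. All numeric values and function names below are hypothetical placeholders.

```python
import math

# Hypothetical calibration values, chosen only for illustration.
R_RED = 0.667    # ratio of a fully reduced biosensor ensemble
R_OX = 5.207     # ratio of a fully oxidized biosensor ensemble
DELTA = 0.171    # oxidized/reduced brightness at the second wavelength
E0_MV = -265.0   # assumed midpoint potential, in mV
TEMP_K = 295.15  # assumed temperature, in kelvin

# RT/2F in mV (two electrons are exchanged per disulfide).
NERNST_MV = 1000.0 * 8.314462618 * TEMP_K / (2.0 * 96485.33212)


def e_gsh_from_ratio(r):
    """Map a fluorescence ratio R to a redox potential in mV."""
    if not R_RED < r < R_OX:
        raise ValueError("R must lie strictly between R_RED and R_OX")
    return E0_MV - NERNST_MV * math.log(DELTA * (R_OX - r) / (r - R_RED))


def e_error(r_true, fold_error):
    """E Obs minus E True when R Obs = R True * (1 + fold_error)."""
    r_obs = r_true * (1.0 + fold_error)
    return e_gsh_from_ratio(r_obs) - e_gsh_from_ratio(r_true)


if __name__ == "__main__":
    for r_true in (0.70, 1.00, 2.00, 4.00, 5.00):
        print(f"R = {r_true:.2f}: a +2.8% error in R shifts E by "
              f"{e_error(r_true, 0.028):+.2f} mV")
```

Under these assumed values, the same +2.8% error in R produces a much larger shift in E near either bound of R than in the middle of the range, which is the behavior described for Fig. 1d.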
We then determined the size of our fluorescence-ratio measurement errors. We quantified the precision of our fluorescence-ratio measurements in live C. elegans expressing the roGFP1-R12 biosensor in the cytosol of the muscles of the pharynx, the feeding organ. This retrospective analysis of 10,572 images showed that our errors in R were proportional to R; that is, R Obs = R True × (1 + error) (Supplementary Note S3). Within a given experiment, the size of the relative error in R was invariant over the range of all possible R values (Supplementary Note S3). The size of the relative error in R, however, varied up to three-fold between experiments (Supplementary Note S3). Differences in the proportion of animals moving during imaging accounted for most of the variation in the relative error in R across experiments (S.B.J., J.A.S., and J.A., manuscript in preparation). Our analysis indicated that, in a typical experiment, the median relative error in R was zero and 95% of the relative errors in R were in the interval (− 2.8%, + 2.8%) (Fig. 1f). These 95% confidence bounds quantified the precision of our fluorescence-ratio measurements.
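The proportional error model and the 95% interval can be sketched as follows. The example simulates paired true and observed ratios because the article's images are not available here. The Gaussian shape of the relative error, its standard deviation, the range of R True values, and all function names are assumptions made for illustration, not the procedure of Supplementary Note S3.

```python
import math
import random


def relative_errors(r_obs, r_true):
    """Relative (fold) error of each observation: R Obs / R True - 1."""
    return [obs / true - 1.0 for obs, true in zip(r_obs, r_true)]


def central_interval(values, coverage=0.95):
    """Bounds enclosing the central `coverage` fraction of `values`."""
    ordered = sorted(values)
    tail = (1.0 - coverage) / 2.0

    def quantile(q):
        # Linear interpolation between neighbouring order statistics.
        pos = q * (len(ordered) - 1)
        low, high = math.floor(pos), math.ceil(pos)
        return ordered[low] + (ordered[high] - ordered[low]) * (pos - low)

    return quantile(tail), quantile(1.0 - tail)


def simulate(n=10572, sd=0.028 / 1.96, seed=0):
    """Simulate paired (R True, R Obs) values under the proportional model.

    Assumption: relative errors are Gaussian with zero mean, so that about
    95% of them fall within +/- 2.8%.
    """
    rng = random.Random(seed)
    r_true = [rng.uniform(0.8, 4.0) for _ in range(n)]
    r_obs = [r * (1.0 + rng.gauss(0.0, sd)) for r in r_true]
    return r_true, r_obs


if __name__ == "__main__":
    r_true, r_obs = simulate()
    lo, hi = central_interval(relative_errors(r_obs, r_true))
    print(f"95% of relative errors fall in ({lo:+.1%}, {hi:+.1%})")
```

Because the error is defined relative to R True, the interval does not depend on which R True values were sampled, which mirrors the invariance reported within a given experiment.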
Last, we determined how the empirical precision of our fluorescence-ratio measurements influenced the accuracy of individual E GSH observations. Knowing the precision of our R measurements enabled us to determine the 95% confidence bounds of E Obs as a function of R True (Fig. 1d). Converting R True into E True produced a map of how the 95% confidence bounds of E Obs varied as a function of E True (Fig. 1e). The maximum absolute difference between E True and either the upper or lower 95% confidence bound of E Obs represents the inaccuracy of our E GSH measurements (Fig. 1g). Our mathematical modeling indicated that the precision of R measurements, the biochemical and biophysical properties of the biosensor, and the choice of excitation wavelengths used in our experiments all influenced the E GSH values that we could measure most accurately (Supplementary Note S4). E GSH inaccuracy rapidly increased as E True moved farther away from those values.
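The definition of inaccuracy used here (the larger of the two distances between E True and the 95% bounds of E Obs) can be written as a small function. This is a minimal sketch under the same assumed two-state Nernst map, E = E 0′ − (RT/2F) ln[δ (R oxidized − R) / (R − R reduced)]. The parameter values and function names are hypothetical and do not come from the article or from the SensorOverlord package.

```python
import math

# Hypothetical calibration values, for illustration only.
R_RED, R_OX, DELTA = 0.667, 5.207, 0.171
E0_MV, TEMP_K = -265.0, 295.15
NERNST_MV = 1000.0 * 8.314462618 * TEMP_K / (2.0 * 96485.33212)  # RT/2F in mV


def e_from_ratio(r):
    """Potential (mV) for a ratio R inside (R_RED, R_OX)."""
    return E0_MV - NERNST_MV * math.log(DELTA * (R_OX - r) / (r - R_RED))


def ratio_from_e(e_mv):
    """Inverse map: the ratio R True expected at potential E True."""
    x = math.exp((e_mv - E0_MV) / NERNST_MV)  # oxidized/reduced ratio
    return (R_RED + DELTA * x * R_OX) / (1.0 + DELTA * x)


def inaccuracy(e_true, rel_error):
    """Largest |E Obs - E True| when R is off by up to +/- rel_error."""
    r_true = ratio_from_e(e_true)
    worst = 0.0
    for r_obs in (r_true * (1.0 - rel_error), r_true * (1.0 + rel_error)):
        if not R_RED < r_obs < R_OX:
            return math.inf  # R Obs leaves the range where E is defined
        worst = max(worst, abs(e_from_ratio(r_obs) - e_true))
    return worst


if __name__ == "__main__":
    for e_true in (-300.0, -280.0, -260.0, -240.0, -220.0):
        print(f"E True = {e_true:.0f} mV: inaccuracy = "
              f"{inaccuracy(e_true, 0.028):.2f} mV")
```

Returning infinity when an error bound pushes R outside its physical limits is a design choice of this sketch; it marks potentials at which no finite inaccuracy can be guaranteed.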
This analysis enabled us to extract the range of E GSH values that our biosensor was well-suited to measure at a given level of E GSH inaccuracy (Fig. 1g). For example, the range of E Obs values we could measure with an inaccuracy of 2 mV was between − 284 and − 234 mV. This range encompassed all E GSH values we observed in wild-type nematodes under normal conditions (− 278 to − 262 mV) and under oxidative stress (− 278 to − 250 mV) 12 , indicating that our experimental setup was well-suited to measure the E GSH values that C. elegans feeding muscles exhibited in vivo: 95% of the individual E GSH observations deviated from their true value by less than 2 mV.
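Extracting a suitable range at a chosen inaccuracy level amounts to scanning the inaccuracy curve and keeping the potentials at or below the threshold. The sketch below does this on a grid, again assuming a two-state Nernst map with hypothetical parameter values. Because those values are placeholders, the printed range is not expected to match the range reported above.

```python
import math

# Hypothetical calibration values, for illustration only.
R_RED, R_OX, DELTA = 0.667, 5.207, 0.171
E0_MV, TEMP_K = -265.0, 295.15
NERNST_MV = 1000.0 * 8.314462618 * TEMP_K / (2.0 * 96485.33212)  # RT/2F in mV


def inaccuracy(e_true, rel_error):
    """Largest |E Obs - E True| when R is off by up to +/- rel_error."""
    x = math.exp((e_true - E0_MV) / NERNST_MV)
    r_true = (R_RED + DELTA * x * R_OX) / (1.0 + DELTA * x)
    worst = 0.0
    for r_obs in (r_true * (1.0 - rel_error), r_true * (1.0 + rel_error)):
        if not R_RED < r_obs < R_OX:
            return math.inf
        e_obs = E0_MV - NERNST_MV * math.log(DELTA * (R_OX - r_obs) / (r_obs - R_RED))
        worst = max(worst, abs(e_obs - e_true))
    return worst


def accurate_range(max_inaccuracy_mv, rel_error,
                   e_min=-400.0, e_max=-150.0, step=0.1):
    """Lowest and highest grid potentials measurable within the threshold.

    Returns None when no grid point meets the threshold. Assumes the
    inaccuracy curve is U-shaped, so the qualifying points are contiguous.
    """
    n_steps = int(round((e_max - e_min) / step))
    grid = (e_min + i * step for i in range(n_steps + 1))
    ok = [e for e in grid if inaccuracy(e, rel_error) <= max_inaccuracy_mv]
    return (min(ok), max(ok)) if ok else None


if __name__ == "__main__":
    for threshold in (1.0, 2.0, 5.0):
        print(f"<= {threshold} mV:", accurate_range(threshold, 0.028))
```

Passing a smaller `rel_error` to `accurate_range` widens the returned range, which is the sense in which more precise R measurements expand the range of potentials that can be measured accurately.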

Balancing the need for accurate measurements with the constraints of microscopy. Our analytical framework provides a criterion for determining if it is possible to measure E GSH accurately. Scientific needs demand accurate observations, but experimental approaches constrain the extent to which observations can be made accurately. The trade-off between these scientific and experimental constraints can be visualized in a phase diagram (Fig. 2). The precision of R measurements determines the range of E GSH values that is possible to measure at a specific inaccuracy level (Fig. 2). For values outside that range, it is impossible to guarantee that an observation will be accurate. Scientific needs impose a maximum tolerable inaccuracy beyond which observations are too inaccurate and, therefore, not useful. Together, these constraints determine whether it is possible to measure E GSH accurately (Fig. 2).
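The two constraints can be expressed as a pair of comparisons. The sketch below is one reading of the phase diagram, in which each point is an E GSH value paired with an inaccuracy level. The function name, argument names, and region labels are hypothetical and are not part of the SensorOverlord package.

```python
def classify(predicted_inaccuracy_mv, inaccuracy_level_mv, max_tolerable_mv):
    """Place one (E GSH value, inaccuracy level) point in the phase diagram.

    predicted_inaccuracy_mv: inaccuracy predicted at this E GSH value from
        the precision of the R measurements (the experimental constraint).
    inaccuracy_level_mv: the inaccuracy level being asked about.
    max_tolerable_mv: largest inaccuracy the scientific question tolerates.
    """
    # Experimental constraint: the requested level must be attainable.
    achievable = predicted_inaccuracy_mv <= inaccuracy_level_mv
    # Scientific constraint: the requested level must be small enough.
    useful = inaccuracy_level_mv <= max_tolerable_mv
    if achievable and useful:
        return "measurable accurately"
    if achievable:
        return "achievable, but too inaccurate to be useful"
    if useful:
        return "useful, but not achievable"
    return "neither achievable nor useful"


if __name__ == "__main__":
    # A biosensor predicted to be off by up to 0.8 mV at some E GSH value,
    # judged against a 5 mV tolerance.
    for level in (0.5, 2.0, 8.0):
        print(f"{level} mV level:", classify(0.8, level, 5.0))
```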
Retrospectively increasing measurement accuracy with improved image analysis. To increase the range of E GSH values that we could measure accurately, we set out to improve our image-analysis methods. Movement of live C. elegans during image acquisition lowers the precision of fluorescence-ratio measurements in individual pharyngeal muscles. In a typical experiment, 21% of animals moved during imaging. We developed a new image-feature registration algorithm that corrects for displacement and deformation of the muscles along the anterior-posterior axis of the pharynx (S.B.J., J.A.S., and J.A., manuscript in preparation). This new image-analysis algorithm reduced the relative error in R along most positions in the pharynx, especially in the boundaries between adjacent muscles and in the muscles of the anterior and posterior bulbs. For example, in the pm7 muscles of the posterior bulb, the new algorithm reduced the interval with 95% of the relative errors in R from ± 4.3 to ± 2.6% in moving animals and from ± 2.0 to ± 1.9% in stationary animals. As a result, the new algorithm increased the accuracy with which we could measure E GSH and thereby expanded the range of E GSH values that we could measure accurately in past experiments (Fig. 3a).
Comparing glutathione redox potential biosensors. We determined the ranges of E GSH values that we could have measured accurately had we used different biosensors. Theoretical modeling indicated that the accuracy of a biosensor is influenced by the choice of wavelengths used for biosensor excitation, and by the biosensor's dynamic range and midpoint potential (E 0′ , the potential at which half of the biosensor molecules are oxidized) (Supplementary Note S4). These biosensor physical and chemical properties vary among all existing roGFP-based biosensors (Supplementary Note S5). We estimated the conversion factors that map fluorescence-ratio measurements into E GSH values for the eleven roGFP-based biosensors with published midpoint potentials and fluorescence spectra (Supplementary Note S5). This enabled us to determine the E GSH inaccuracy we would expect to observe had we measured E GSH in the feeding muscles of live C. elegans with each of those biosensors instead of roGFP1-R12 (Fig. 3b and Supplementary Note S5). This analysis enabled us to identify which biosensors would measure E GSH most accurately under our experimental conditions: roGFP5 for E GSH values below − 297 mV, roGFP2 for E GSH values from − 296 to − 258 mV, roGFP1-R12 for E GSH values from − 257 to − 240 mV, and roGFP1-iE for E GSH values above − 239 mV. We note that often many biosensors were predicted to have comparable accuracies (Fig. 3b). This analysis helped us identify underused biosensors. Neither roGFP3 nor roGFP5 has ever been used in vivo, yet we predict that these biosensors would be the most accurate biosensors for low E GSH values such as those expected for the mitochondrial matrix. 
We currently disfavor roGFP5, even though this biosensor was predicted to be more accurate than roGFP3, because roGFP5 can potentially form more than one type of internal disulfide bridge due to its two additional cysteines; a better understanding of roGFP5's biochemistry is warranted given its potential utility.

Figure 2. Balancing the need for accurate measurements with the constraints of microscopy. The empirical precision of our R measurements determines the range of E GSH values that is possible to measure at a specific inaccuracy level. Values outside that range are impossible to measure accurately (red and light red regions). Scientific needs impose a maximum tolerable inaccuracy beyond which observations are too inaccurate and, therefore, not useful (light red and orange regions). Together, these constraints determine whether it is possible to accurately measure E GSH (green region).

Scientific Reports | (2020) 10:16843 | https://doi.org/10.1038/s41598-020-73987-0 | www.nature.com/scientificreports/

Comparison of the predicted accuracy of biosensors originally designed for similar purposes enabled us to identify the variables that explain why one biosensor was predicted to be more accurate than another (Supplementary Note S6). For example, both roGFP1-iE and roGFP2-iL were designed to have higher midpoint potentials than previous roGFPs, making them more suitable for measuring the higher E GSH values common in the endoplasmic reticulum 24,25 . However, while roGFP1-iE has a higher midpoint potential than roGFP2-iL, it is predicted to be more inaccurate than roGFP2-iL even for measuring higher E GSH values. The higher dynamic range of roGFP2-iL makes it a more accurate E GSH biosensor than roGFP1-iE.
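Choosing the most accurate biosensor for a given E GSH value reduces to evaluating each biosensor's predicted inaccuracy at that value and taking the minimum. The sketch below uses three hypothetical biosensors (not the published roGFP parameters) and an assumed two-state Nernst map. Dynamic range is taken here as `r_ox / r_red`, which is an assumption about the definition.

```python
import math
from dataclasses import dataclass

# RT/2F in mV at an assumed temperature of 295.15 K.
NERNST_MV = 1000.0 * 8.314462618 * 295.15 / (2.0 * 96485.33212)


@dataclass(frozen=True)
class Sensor:
    name: str
    r_red: float   # ratio of the fully reduced ensemble
    r_ox: float    # ratio of the fully oxidized ensemble
    delta: float   # oxidized/reduced brightness at the second wavelength
    e0_mv: float   # midpoint potential, in mV

    def inaccuracy(self, e_true, rel_error):
        """Largest |E Obs - E True| when R is off by up to +/- rel_error."""
        x = math.exp((e_true - self.e0_mv) / NERNST_MV)
        r_true = (self.r_red + self.delta * x * self.r_ox) / (1.0 + self.delta * x)
        worst = 0.0
        for r_obs in (r_true * (1.0 - rel_error), r_true * (1.0 + rel_error)):
            if not self.r_red < r_obs < self.r_ox:
                return math.inf
            e_obs = self.e0_mv - NERNST_MV * math.log(
                self.delta * (self.r_ox - r_obs) / (r_obs - self.r_red))
            worst = max(worst, abs(e_obs - e_true))
        return worst


def most_accurate(sensors, e_true, rel_error):
    """Sensor with the smallest predicted inaccuracy at e_true."""
    return min(sensors, key=lambda s: s.inaccuracy(e_true, rel_error))


# Hypothetical biosensors, for illustration only.
SENSORS = [
    Sensor("low-midpoint", 1.0, 6.0, 0.2, -280.0),
    Sensor("high-midpoint", 1.0, 6.0, 0.2, -240.0),
    Sensor("high-midpoint, small dynamic range", 1.0, 3.0, 0.2, -240.0),
]

if __name__ == "__main__":
    for e_true in (-285.0, -260.0, -225.0):
        best = most_accurate(SENSORS, e_true, 0.028)
        print(f"{e_true:.0f} mV -> {best.name}")
```

In this toy comparison, the two biosensors that share a midpoint potential differ only in dynamic range, and the one with the larger dynamic range has the lower predicted inaccuracy at that midpoint, in line with the roGFP2-iL example above.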
Identifying where new glutathione redox potential biosensors are needed. We predicted the E GSH inaccuracy that we would observe if we measured E GSH in the feeding muscles of live C. elegans with the most accurate biosensor for each E GSH value. Using a phase diagram, we visualized the trade-off between our scientific need for accuracy and the experimental constraints imposed by the precision of our R measurements and the properties of existing biosensors (Fig. 3c). This analysis indicated that we lack biosensors well-suited to measure E GSH values above − 177 mV or below − 337 mV with at least 10 mV accuracy.
A general framework to predict the accuracy of two-state ratiometric biosensors. To establish a general criterion for determining whether a two-state biosensor is well-suited to measure its input accurately, we generalized the analysis framework for glutathione redox potential biosensors to all ratiometric two-state single-ligand-binding biosensors (Supplementary Notes S1, S5, S7). To demonstrate the utility of the generalized framework, we applied it to biosensors that measure pH and small molecules, including histidine, NAD + , NADH, and NADPH. For each biosensor with a known affinity constant and fluorescence spectra, we derived the conversion factors that map its fluorescence-ratio to pH or ligand concentration (Supplementary Notes S8, S9). We then determined the pH and ligand concentration ranges that each biosensor would be well-suited to measure accurately given the precision of our R measurements and after selecting optimal excitation or emission filters for each biosensor (Fig. 4a,b and Supplementary Notes S8, S9).
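The same reasoning carries over to ligand-binding biosensors if the Nernst term is replaced by a binding isotherm. The sketch below assumes simple 1:1 binding, so that the bound-to-free ratio equals [Ligand]/Kd, and assumes that R increases upon binding. The resulting map, p[Ligand] = pKd − log10[(R − R free) / (δ (R bound − R))], is a standard expression and is not quoted from Supplementary Notes S7 to S9. All parameter values and names are hypothetical.

```python
import math

# Hypothetical parameters for a 1:1 ligand-binding biosensor.
R_FREE = 1.0    # ratio of the ligand-free ensemble
R_BOUND = 4.0   # ratio of the ligand-bound ensemble
DELTA = 0.5     # bound/free brightness at the second wavelength
PKD = 5.0       # -log10 of the dissociation constant (molar)


def p_ligand_from_ratio(r):
    """p[Ligand] = -log10([Ligand]) implied by a ratio R."""
    bound_over_free = (r - R_FREE) / (DELTA * (R_BOUND - r))
    return PKD - math.log10(bound_over_free)


def ratio_from_p_ligand(p_ligand):
    """Inverse map: the ratio expected at a given p[Ligand]."""
    x = 10.0 ** (PKD - p_ligand)  # [Ligand]/Kd, equal to bound/free
    return (R_FREE + DELTA * x * R_BOUND) / (1.0 + DELTA * x)


def inaccuracy(p_true, rel_error):
    """Largest |p Obs - p True| when R is off by up to +/- rel_error."""
    r_true = ratio_from_p_ligand(p_true)
    worst = 0.0
    for r_obs in (r_true * (1.0 - rel_error), r_true * (1.0 + rel_error)):
        if not R_FREE < r_obs < R_BOUND:
            return math.inf  # R Obs leaves the range where p is defined
        worst = max(worst, abs(p_ligand_from_ratio(r_obs) - p_true))
    return worst


if __name__ == "__main__":
    for p_true in (7.0, 6.0, 5.0, 4.0, 3.0):
        print(f"p[Ligand] = {p_true}: inaccuracy = "
              f"{inaccuracy(p_true, 0.028):.3f} log10 units")
```

A pH biosensor can be treated the same way by reading pKd as the pKa and the ligand as the proton, with the direction of the ratio change set by the biosensor.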
Our comparison of the predicted accuracy of nine ratiometric pH biosensors identified optimal biosensors for pH measurement with dual-excitation red-fluorescent pH biosensors, dual-excitation green-fluorescent pH biosensors, and single-excitation dual-emission pH biosensors (Fig. 4a). The NADH-specific Frex biosensor 6 had a higher predicted accuracy than the FrexH biosensor 6 , as a result of its higher dynamic range (Fig. 4b). The NADPH-specific iNAP1 biosensor 7 was predicted to more accurately measure NADPH concentration than the iNAP1-mCherry biosensor (Fig. 4b). The iNAP1-mCherry biosensor sacrifices the iNAP1 dynamic range in one excitation band with pH-sensitive fluorescence, enabling pH-resistant NADPH measurement but lowering this biosensor's accuracy.
A web-based tool that predicts biosensor accuracy. To help the community find biosensors that are well-suited for their experimental needs, we developed the SensorOverlord toolkit. This open-source S4 class-based R package implements all the analyses described here. We also built a user-friendly web application, available at https://www.sensoroverlord.org (Fig. 5). The SensorOverlord toolkit enables users to model how the precision of their fluorescence-ratio signal measurements and their microscopy configuration constrain the range of input values that their biosensor is well-suited to measure accurately (Supplementary Note S10).
The SensorOverlord R package provides a set of classes, methods, and functions with which users can analyze their microscopy accuracy. Briefly, users can create a Sensor object by (1) programmatically uploading an excitation-emission spectrum, (2) inputting biophysical parameters of the biosensor, or (3) querying a biosensor database containing the excitation-emission spectra of the biosensors discussed in this manuscript. Sensor objects can then be used to generate maps between R and the predicted inaccuracy of E GSH , pH, and p[Ligand] at different levels of empirically-determined error in the measurement of R. The package also enables users to directly create Spectra excitation-emission plots, and to plot the predicted inaccuracy and predicted suitable range of any custom E GSH , pH, or ligand-sensitive two-state biosensor. We designed the package so users can not only recreate the analysis presented here, but also quickly and easily apply the SensorOverlord framework to other biosensors and experimental configurations.
The SensorOverlord web application makes the SensorOverlord R package accessible via a non-programmatic graphical user interface that can be accessed through any modern web browser. Users can generate a Sensor object by (1) selecting a biosensor from a biosensor database via a dropdown menu, (2) inputting empirically obtained biophysical parameters into text boxes, or (3) interactively uploading a .csv file with excitation-emission spectrum values. The application then prompts users to provide an empirical error in the measurement of R, the accuracies at which they wish to make measurements, and the excitation or emission wavelength intervals used for ratiometric imaging. Once a user inputs these parameters, they click a button to generate two figures: (1) a static plot of the suitable ranges of the current biosensor, and (2) an interactive plot of the current biosensor's measurement accuracy as a function of the biochemical parameter being measured (Fig. 5). Besides increasing the accessibility of the analysis presented here, the SensorOverlord web application enables users to more quickly and easily experiment with how modifying model parameters affects the predicted accuracy of measurements with different biosensors.
Documentation for the SensorOverlord toolkit, alongside updated links to the source code and web application, can be found at https://apfeldlab.github.io/SensorOverlord/.

Discussion
The SensorOverlord toolkit enables users to predict the accuracy of concentrations and chemical potentials derived from fluorescence-ratio measurements with two-state biosensors. This tool enables users to select biosensors predicted to be most accurate for measuring specific ranges of biochemical values. The SensorOverlord also enables users to quantify the extent to which increasing the precision of their fluorescence-ratio measurements would increase the predicted accuracy of their biochemical measurements with an individual biosensor. Therefore, this tool can be used to quantify the accuracy gains resulting from improving experimental practices, and from refining image acquisition, registration, and analysis methods. A wide variety of factors can influence the precision of fluorescence-ratio measurement. In our experience, the degree of immobilization of live specimens during image acquisition can influence the precision of fluorescence-ratio measurements by a factor of three, leading to large differences in the predicted accuracy of biochemical measurements. The SensorOverlord enables researchers to disclose the predicted accuracy of the concentrations and chemical potentials that they measure, simply by reporting the precision of their fluorescence-ratio measurements, similar to how manufacturers use tolerance ratings to disclose how often the quality of their products is expected to deviate from a standard. The broader scientific community may, in turn, adopt appropriate maximum tolerable inaccuracy standards for specific biochemical measurements.
Prediction of the input values of two-state ratiometric biosensors from their ratiometric fluorescence requires knowledge of conversion factors that quantify the biosensor's biochemical and biophysical properties. The values of these factors could be influenced by the cellular environment where the biosensor is expressed. Our studies with roGFP1-R12 in the cytosol of C. elegans feeding muscles showed that the biosensor's in vivo dynamic range was 7.8, slightly higher than the 5.0 dynamic range of the purified biosensor in vitro; as a result, the biosensor had a higher predicted accuracy than expected from its properties in vitro (Supplementary Note S5.2). This example highlights the need to determine those conversion factors under the relevant experimental conditions, which often is very challenging. A better understanding of how the spectral and biochemical properties of each biosensor are influenced by the temperature, pH, ionic strength, and osmotic strength of the environment surrounding the biosensor would enable better prediction of the properties of the biosensor in vivo.
We hope that the SensorOverlord motivates the development of new biosensors, microscopy techniques, and image-analysis methods, by enabling biosensor developers and users to quantify the accuracy gains that would result from modifying the biochemical and spectral properties of their biosensors and from increasing the precision of their fluorescence-ratio measurements.
Methods
Code availability. Mathematical modeling was performed in the R language and environment for statistical computing (v3.6.0) 26 . The web application and associated visualizations were developed with the R packages ggplot2 (v3.1.1) 27 , Shiny (v1.3.2) 28 , and plotly (v4.9.2) 29 . Source code for the SensorOverlord is available at https://apfeldlab.github.io/SensorOverlord/.
Statistical analysis. All statistical analyses were performed in JMP (SAS). We tested for differences in the average R among groups using ANOVA. We used the Tukey HSD post-hoc test to determine which pairs of groups in the sample differ, in cases where more than two groups were compared. We used least-squares regression to quantify the dependency on R of the absolute error in R and the absolute relative error in R.

Figure 3. Predicted accuracy of glutathione redox potential biosensors. (a) Predicted accuracy gains from improved image analysis in the pm7 (posterior) feeding muscles of live C. elegans expressing the roGFP1-R12 biosensor. Animals that moved during image acquisition showed a higher R measurement error than stationary animals. A feature-registration algorithm increased the precision of R measurements, retrospectively expanding the range of E GSH values that we could measure accurately. The colored bars denote the range of E GSH values where we have 95% confidence that an individual E GSH observation would deviate from its true value by less than the error denoted by the color of the bar. (b) Predictions of the ranges of E GSH values that we expect to measure accurately in pm3 pharyngeal muscles with eleven roGFP-based biosensors given the empirical precision of our R measurements. Coloring of bars as in (a). (c) The empirical precision of our R measurements determines the range of E GSH values that would be possible to measure at a specific inaccuracy level if we measured E GSH in the pharyngeal muscles of live C. elegans with the most accurate roGFP biosensor for each E GSH value. Values outside that range are impossible to measure accurately (red and light red regions). Scientific needs impose a maximum tolerable inaccuracy beyond which observations are too inaccurate and, therefore, not useful (light red and orange regions). Together, these constraints determine whether it is possible to accurately measure E GSH with the eleven roGFP biosensors (green region). The dotted curves correspond to the predicted E GSH inaccuracies of each of the eleven roGFP biosensors shown in (b), given the precision of our R measurements.

Figure 4. Predicted accuracy of pH and ligand-binding biosensors. Predictions of the ranges of pH (a), and histidine, NAD + , NADH, and NADPH values (b) that we expect to measure accurately in pm3 pharyngeal muscles with existing biosensors given the empirical precision of our R measurements and selecting optimal excitation or emission filters for each biosensor. The E 2 GFP biosensor can be used in two different modalities, dual-excitation green-fluorescence and single-excitation dual-emission. Differences in the predicted pH inaccuracy of this biosensor under each imaging modality arise from the differences between the values in each imaging modality of this biosensor's overall dynamic range and dynamic range in the second wavelength (Supplementary Note 8). The colored bars denote the range of values of the biosensor's biochemical input where we have 95% confidence that an individual observation would deviate from its true value by less than the error denoted by the color of the bar. p[Ligand] is the negative base 10 logarithm of the molar concentration of the biosensor's ligand.

Data availability
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.