Estimating numbers of intracellular molecules through analysing fluctuations in photobleaching

The impact of fluorescence microscopy has been limited by the difficulties of expressing measurements of fluorescent proteins in numbers of molecules. Absolute numbers enable the integration of results from different laboratories, empower mathematical modelling, and are the bedrock for a quantitative, predictive biology. Here we propose an estimator to infer numbers of molecules from fluctuations in the photobleaching of proteins tagged with Green Fluorescent Protein. Performing experiments in budding yeast, we show that our estimates of numbers agree, within an order of magnitude, with published biochemical measurements, for all six proteins tested. The experiments we require are straightforward and use only a wide-field fluorescence microscope. As such, our approach has the potential to become standard for those practising quantitative fluorescence microscopy.

In fluorescence microscopy, converting measurements of fluorescence into numbers of molecules is a long-standing challenge 1 . This deficit limits our ability both to combine fluorescence measurements from different experiments and to apply quantitative analyses of time series that often must assume known numbers of proteins [2][3][4] .
Although fluorescence standards 1 and the collection of techniques known as fluorescence fluctuation spectroscopy -the most well known of which are fluorescence correlation spectroscopy (FCS) 5,6 and analysis of photon-counting histograms (PCH) 7 -provide a solution, neither are yet commonly adopted to calibrate, for example, time-lapse imaging in cell and systems biology 8 .
Instead, fluctuation-based methods have been developed, such as those that measure fluctuations in the distribution of fluorescent proteins between daughter cells at cell division [9][10][11] . These approaches, however, have been applied mostly to bacteria, are unsuitable for non-dividing cells 12 , and do not straightforwardly extend to species that exhibit differences in size between mothers and daughters.
A second approach is to study fluctuations in stochastic processes of decay. Inhibiting translation and transcription has allowed the fluorescence per molecule to be estimated in mammalian cells 13 , but the stability of fluorescent proteins can make these experiments time consuming. An alternative technique is to deliberately induce photobleaching 14 : the process by which fluorophores cease to fluoresce when continuously excited. This method has been applied in vitro 15,16 and to bacteria 17 , but the analysis relies on photobleaching exhibiting an exponential decay 14 , which is expected for single molecules but not necessarily for the fluorescence of cells 18 .
Here we develop a method for estimating numbers from deliberately photo-bleached cells that works on the wide-field microscopes used for time-lapse imaging and requires no specialized equipment. We verify our approach using six different proteins tagged with Green Fluorescent Protein (GFP) in budding yeast. In all cases, our estimates of the numbers of molecules are within an order of magnitude of estimates made using biochemical techniques, such as quantitative Western blotting and mass spectrometry.
Before beginning, we remind the reader that fluctuation analyses work because the magnitude of fluctuations in fluorescence are determined not by the concentration of the fluorescent molecules but by their numbers. A fluorescent measurement, Y, is given by the product of the brightness per molecule, ν, and the number of fluorescent molecules, X, and is therefore agnostic to their individual values. If X varies with time, the magnitude of the fluctuations in the underlying biochemical process changing X, however, typically scale with the mean 19 : . Therefore the variance in the fluorescence, scales as  Figure 1. The magnitude of fluctuations is determined by numbers of molecules, which we illustrate by simulating stochastic exponential decay for both a small number of molecules in red and a large number in blue. By normalizing to the same starting value, these simulated time series show fluctuations of a different magnitude depending on their initial numbers of molecules. The mean behaviour is common (black line), but the data for low numbers of molecules (red) shows larger deviations from the mean than the data for high numbers of molecules (blue). with constants a and e for time scales λ and time Δt between measurements (here 10 s). All data have been corrected for autofluorescence using wild-type cells that do not express GFP. Right: bleaching is heterogeneous across the population of cells. There is substantial cell-to-cell variation in the best-fit values for  1 and  2 .
An estimator for the number of molecules. We therefore propose instead an estimator that is insensitive to the details of the photobleaching. We motivate the estimator by first considering bleaching with a single rate constant, λ: Using the linear noise approximation 19 and writing = λ for the probability of a fluorophore remaining unbleached at the time t of interest, we can show (Methods) that from x 0 molecules at = t 0 Dividing Eq. 3 by x 0 2 implies that the magnitude of these normalized fluctuations in X x / t 0 scale as x 1/ 0 (Fig. 1). Using Eq. 2, we can replace  in Eq. 3, which holds for all t. If we ignore measurement noise then fluorescence Y X ν = , and multiplying Eq. 4 by ν 2 , assumed to be the same for each molecule, gives t t t 0 0 and so our estimator for x 0 is and no measurement noise, we can show that x 0 is a maximum likelihood estimator of x 0 (Methods).
To gain intuition about when the estimator should be reliable, consider the extreme case where each molecules has its own brightness, ν i , and bleaches with its own rate. This bleaching should be independent and if  i is the probability of molecule i remaining unbleached at a time t, then The ν i and  i for each molecule are unknown, and to proceed we assume that they are samples from a probability distribution, because there are x 0 terms in the sum. We can write Eq. 8 as www.nature.com/scientificreports www.nature.com/scientificreports/ and again is most easily satisfied midway through the bleaching.
In summary, our estimator, Eq. 6, can work even in the completely heterogenous case where each molecule has its own brightness and rate of bleaching, but is more sensitive to variation in ν than in . Measurement noise if sufficiently high will cause underestimation and if too high will, as expected, undermine accuracy.
Using the estimator. After correcting for autofluorescence, flatfield (inhomogeneous illumination), and background (Methods), the data set comprises n r cells with a time series of n d fluorescence values for each cell. We initially analyze the cells one at a time. We consider the time series for the first cell as a collection of pairs of measurementsy y ( , ) t 0 -with one pair for each positive time point and apply Eq. 6 to each pair. To find E[Y t ] and Var[Y t ] in Eq. 6, we smooth the time series using a Gaussian process 25 . From the smoothed time series, we estimate E[Y t ] for all t as the mean of the Gaussian process. Given this mean, we estimate Var estimates of x 0 for that cell. We repeat this process for all cells and therefore obtain n n ( 1) r d − estimates of x 0 in total, which we pool. We take the mode of the resulting distribution as the best estimate for the number of molecules.
Comparison with biochemical estimates of protein numbers. To verify our method, we compare our results with biochemical measurements of the numbers of molecules. We selected six proteins from budding yeast that have a range of absolute numbers -Fus3, Hog1, Guk1, Def1, Gpm1, and Pgk1 -and subjected GFP-fusions of these proteins 20 to a photobleaching analysis (Methods). We compare with a unified data set from 19 separate biochemical experiments 26 , involving either mass spectrometry, GFP-tagging and microscopy, or TAP-tagging with immunoblotting.
For all proteins, there is an agreement within an order of magnitude between the two approaches ( Fig. 3) showing that the measurement noise is not prohibitively large, at least for yeast.

Discussion
A challenge in quantitative fluorescence microscopy is converting measurements of fluorescence to absolute units. Absolute units enable the pooling of data from different laboratories and are needed both for models to fit single-cell data 4 and to validate the values of fitted parameters 27 . As such, absolute units facilitate the long-term success of systems and synthetic biology 28,29 .
We have presented a fluctuation analysis to estimate numbers of fluorescent proteins by deliberately photobleaching cells on a wide-field fluorescence microscope. The method provides a straightforward calibration for time-lapse imaging.
Although our estimate is within an order of magnitude of the numbers estimated using biochemical methods, it is often an underestimate (Fig. 3). This behaviour is consistent with a sufficiently large measurement noise ] -will follow a sufficiently large fluctuation whereas the true mean will not. A too small Var[Y t ] overestimates the numbers of molecules. In practice, the measurement noise appears sufficiently high to disrupt large fluctuations so that our estimates of E[Y t ] do approximate the true mean, but not too high to undermine using Eq. 6.
We might expect an underestimation too because not all tagged proteins fluoresce. Considering the extreme case of a (long) maturation time of the fluorescent protein of 45 minutes and that its degradation is caused by growth with a (short) doubling time of 100 minutes 30 , then we expect the fluorescence at steady-state to be at least  + . 1/45/(1/45 1/100) 0 7 of the total amount of protein. This fraction is, however, large enough that the order of magnitude of our estimate is unchanged and lies within the expected error (width of the blue distributions in Fig. 3).
It is surprising that more principled methods, such as Bayesian inference, work poorly (Methods). Although a single exponential decay does not describe photobleaching in yeast (Fig. 2), we know neither how many exponential decays do nor even if exponential decay itself is correct. Further, both the brightness and time scales associated with any exponentials may be the same for all cells in the population or specific to each cell. Compounding the uncertainty in the model of bleaching, there is no consensus on a model for the measurement noise for fluorescence microscopes, which has been described as normal 10 , log-normal 4 , and with a variance that is a function of fluorescence 31 (and Methods).
Measurements both in single cells and in absolute numbers are necessary to bring discoveries from different laboratories into one predictive framework. Our method of calibrating fluorescence measurements will help enable such a quantitative, single-cell biology.

Methods
Selecting proteins to study. When selecting proteins to test, we looked at three whole-proteome datasets of absolute numbers of protein obtained by quantitative Western blot 32 , mass spectrometry 33 , and fluorescence microscopy 34 . Proteins were selected that appeared in all three data sets. To obtain proteins whose levels would be robust to any stresses from our fixing procedure, we used a protein localisation atlas 35 to select cytoplasmic proteins that showed no significant fold change under starvation and dithiothreitol-and peroxide-induced stress. From these proteins, we selected four giving a broad range of levels: Def1, Guk1, Gpm1, and Pgk1. To these proteins we added Hog1 because of its regular study in our laboratory 36 and Fus3 because of the availability of a measurement by fluorescence correlation spectroscopy (FCS) 37 . Taking a cytoplasmic concentration of 180 nM for Saccharomyces cerevisiae cells and a three times higher concentration for the nucleus 37 along with cellular and nuclear volumes of 42 μl and 3 μl 30 , we estimate 5,200 molecules of Fus3 per cell from this FCS data.
During the course of our work, however, a comprehensive collection of whole-proteome data sets was published 26 , and this data is the data we use for comparison. Our estimate for Fus3 from the FCS data is within the range of values reported in this data set.

Cell preparation.
Cells from the open-reading-frame GFP collection 20 were grown overnight in YEPD (2%) media past the diauxic lag, and 0.5 ml of this culture diluted in 5 ml of fresh media. After 5 hours and at an OD of 0.5, the cells were fixed 38 . Microscopy and image analysis. All experiments were performed on a Nikon Eclipse Ti inverted microscope, controlled with the Perfect Focus System and custom MATLAB scripts (Mathworks) written for Micromanager 39 . We used a 60X 1.2 NA water immersion objective (Nikon), and images were acquired using an Evolve camera (Photometrics) with a 512 × 512 sensor in CCD mode. Cells were adhered to slides using concanavalin A.
To photobleach, the GFP excitation LED was kept at full power for the duration of the experiment with cells imaged for fluorescence for 400 ms every 10 s. We repeated this procedure for multiple, isolated fields of view and for wild-type cells, which do not express GFP.
Given that we imaged fixed cells, cells were selected and segmented at the first time point from a bright-field image and whole-image registration used to propagate this outline to other times. Cells in the bright-field image were chosen by eye to be isolated, well focused, present for the whole experiment (i.e. not washed away), and in a region where the illumination intensity (taken from the flat field correction) was at least 80% of the median illumination. Selected cells were outlined based on the out-of-focus bright-field image using custom MATLAB scripts, and these outlines curated by hand.
Cell fluorescence was calculated as the sum of the values of the brightest 80% of the pixels within the cell boundary. This measure reduced the effect of movements of the stage. We discarded cells that displayed large systematic deviations (such as a sudden drop in fluorescence).
Correcting for flat field and background. Fluorescence images were corrected for both flat field and background. We obtained a flat-field image by flowing 0.001% fluorescein (by mass) through a microfluidic device 40 and imaging multiple positions over a time course. Any microfluidic features were 'blotted out' of the images, and the modified images averaged and normalised to a median of one. A correction was applied to the fluorescence images through an element-wise division by the flat field. To remove any background particular to a slide, we subtracted the average pixel fluorescence for a region of the image containing no cells from each pixel value. Fluorescence images were also registered to correct for drift in the field of view.
Correcting for autofluorescence. Cells were also corrected for autofluorescence. We corrected bleached wild-type cells for flat field and background, and their average value was subtracted from all fluorescent cells at each time point.
Inference fails assuming mono-exponential decay. Analysing the data using a previously proposed estimator 14 and following the procedure of Kim et al. 17 , we find that numbers of proteins are underestimated by several orders of magnitude (the maximum number of molecules over all six proteins is approximately 150). This failure is because that method both uses a poor approximation to the mean behaviour of the photobleaching of individual cells (the means in our data are not well described by exponential decay with one time scale) and its sensitivity to the quality of the fit even when the photobleaching is a decay process with a single time scale.
A maximimum likelihood estimator for x 0 . If there is negligible measurement noise, the data comprises We use the identity to evaluate the integrals. First, we perform the integral over μ x using the first delta function in μ and then over x Integrating over , we have

estimating e[Y t ].
We smooth each cell's time series, y t , to estimate its mean, Y To perform the smoothing, we use a Gaussian process with a covariance given by a squared exponential function - , and θ 2 , which controls the magnitude of the estimated measurement noise, being restricted to θ < < 10 1 0 2 10 . We use an implementation in Python 41 .
Using simulated data and normally distributed measurement noise, we find that there is an optimum range for the magnitude of the measurement noise. If measurement noise is too low, the estimated mean follows fluctuations in the data and at times is closer to the data than the true mean. Therefore Var[Y t ] in Eq. 6 is too small, and the estimate of x 0 is too large. If measurement noise is too high, the estimated mean can sometimes be further from and sometimes closer to the data than the true mean, but the estimator in any case then underestimates (Eq. 9). For intermediate magnitudes of the measurement noise, Eq. 6 is accurate, and the measurement noise prevents fluctuations being long-lived and so corrects for bias in the estimate of E[Y t ] and Var[Y t ].
The estimator can perform poorly if the time scales of bleaching are shorter than the the duration of the experiment. Then, all the molecules are bleached at late times, and the resulting noisy data can undermine the estimate of E[Y t ]. The low numbers of molecules also mean that the linear noise approximation and Eqs 2 and 3 are potentially no longer valid.
Inference with an explicit model of bleaching and measurement noise. In general, given data www.nature.com/scientificreports www.nature.com/scientificreports/ The dynamics of the protein numbers, x, are determined by chemical reactions and can be described by a master equation for P t x ( , ), the probability distribution of the state x at time t 19 . We use the linear noise approximation (first-order terms in an expansion of the master equation in the size of the system -the volume of a cell 19 ). This approximation makes P t x ( , ) a normal distribution if the initial distribution is either a normal or a delta function.
If we let the stochiometric matrix be S and the hazards (propensities) be h and noting that x describes numbers of molecules not concentrations, then where  denotes a normal distribution and μ(t), its mean, and t ( ) Σ , its covariance matrix, obey 24 T and T T with J as the Jacobian: ij j k ik k and H as a matrix of zeros with t h( , ) μ on the diagonal. We let the biophysical model have two decay processes (Fig. 2). For each cell, indexed by j, there are two pools of fluorescence proteins, x j 1, and x j 2, , which bleach at constant, but cell-dependent, rates, λ j 1, and λ j 2, : where initial values have superscripts of zero. We use a measurement noise that depends on the numbers of fluorescence proteins giving a standard deviation that scales with the mean. If y t ( ) j is the measured fluorescence then  ν σ ν σ ν σ for constant e i , σ and any residual autofluorescence f j . This model does not have the positive skewness of log-normal noise and has support for negative fluorescence values, which we observe after correcting images for background fluorescence.

Equations 23 and 24 then give
Linear noise and sequential data -a Kalman filter. We wish to infer θ given Y t . Bayes's rule states: t t θ θ θ | ∝ | or, by using the rules of probability to factorize the likelihood, t i t i i 1 1 We sequentially find each term in Eq. 31 by considering the dynamics of x from one time point to the next and then correcting that dynamics given the observed data 24 . Assume that at time is normal with a known mean i 1 ⁎ μ − and covariance matrix i 1 Σ − ⁎ : Using the linear noise approximation for the dynamics of x, we can, with Eq. 32 providing the initial condition, integrate Eqs 23 and 24 over one time interval to time point i to find μ i and Σ i and that www.nature.com/scientificreports www.nature.com/scientificreports/ We next wish to extend the conditioning in Eq. 33 to include the data point, y i , at time point i. To do so, note that P P y P y P P y using Bayes's rule, conditioning on − Y i 1 and θ, and assuming that the measurement noise only depends on the current value of x. If the measurement noise too has a normal distribution with U being a constant projection matrix and V being a covariance matrix, then we can simplify Eq. 34 using the properties of normal distributions. We find that P x Y ( , is also normal with a mean μ i ⁎ and a covariance Σ i ⁎ that satisfy 24 The predictions of μ i and Σ i found from μ − and are therefore the normalizing factors for Eq. 34, satisfying Hence from a normal prior distribution for x 1 , θ | P x ( ) 1 , we use Eq. 38 to find θ | P y ( ) 1 , the first term in the factorization of the likelihood (Eq. 31) and Eq. 36 to find P y x ( , ) 1 1 θ | , the starting normal distribution in Eq. 32 for the sequential inference.
To specialize the algorithm to photobleaching, the matrices in Eq. 35 are ν ν = U [ ] and σ = V e 2 , a constant. We subtract the autofluorescence, f j , from each data point before applying the Kalman filter. The Kalman update, Eq. 36, can result in unphysical, negative components of μ, which we set to zero.
We must specify a prior P x ( ) m 1 θ | to begin the inference scheme. To do so, we introduce two parameters: x 0 , which is the total amount of fluorescent protein at t 0 = , and α, which is the partitioning of this fluorescent protein between the two pools. We infer both these parameters. For the Kalman filter, we require that P x ( ) m 1 θ | be a normal distribution, but we wish to start x 1 at a known value and so use ⁎ ⁎ x 1 and 0 0 0 0 We distinguish between heterogenous parameters, which are specific to each cell (λ 1 , λ 2 , f, x 0 and α), and homogeneous parameters, which have the same values for all cells (the σ e,i and ν). The homogenous parameters and x 0 have scale-free priors: for example, P( ) 1/ ν ν = . The heterogenous parameters other than x 0 have flat priors. All priors are proper and bounded to physical values. To ensure the expected behaviour is dependent only on (2019) 9:15238 | https://doi.org/10.1038/s41598-019-50921-7 www.nature.com/scientificreports www.nature.com/scientificreports/ the heterogeneous parameters and to improve the mixing of the Markov chain, we propose the combination νx 0 , referred to as y 0 , rather than x 0 .
To generate samples from the posterior, we use a Metropolis-within-Gibbs scheme 42 with the heterogeneous parameters updated separately from the homogeneous parameters. We employ adaptive parallel tempering to accelerate mixing in the Gibbs sampler 43 , which performs well on benchmark biochemical models 44 . Specifically, we use 10 chains with their temperatures chosen adaptively. Parameters are proposed independently, with λ 1 , λ 2 , f and α proposed from normal distributions and y 0 , the e i σ , and ν proposed from log-normal distributions. We adaptively select the scales for the proposal distributions for the heterogeneous and homogeneous parameters.
To start the Monte Carlo method, we try to find parameters that maximize the likelihood We use a nested optimisation scheme 45 : 1. We find starting values for the heterogeneous parameters by fitting a bi-exponential decay to each cell's time series. 2. All parameters, including the heterogenous parameters, are then fitted for each cell, independently of all other cells, using a particle swarm. 3. We create an initial parameter set for the homogenous parameters by taking the median of the homogeneous parameters from the individual fits of Step 2. 4. We perform iterative optimisation: first optimising the homogeneous parameters with the heterogeneous parameters fixed then vice versa until a maximum is reached. Each individual optimisation uses Matlab's fmincon. 5. To provide diverse starting points for the chains, half of our chains are initialised at the parameter values found in Step 4 and the other half are initialised from optima found by performing the iterative optimisation of Step 4 from random parameters rather than from those found in Step 3.
Estimating the expected numbers of proteins. Given a sample of parameters from the posterior distribution P Y ( ) θ| , and a fluorescence measurement y, we would like to find the posterior distribution of the number of proteins, x:  www.nature.com/scientificreports www.nature.com/scientificreports/ We assume that the measurement y does not change the posterior probability of θ (because there is substantially more data in Y): , we can use that the distribution of a sum of independent normal variables is also normal with a mean equal to the sum of the means of the variables in the sum and with a variance equal to the sum of the variances 19  where the kernel functions are isotropic log normal distributions with variance σ K 2 as before. In principle, these kernel density estimates allow Eq. 47 to be evaluated at any θ; in practice, we use a Metropolis algorithm to sample θ to overcome the dimensionality of the parameter space.

Results
Our results give a lower bound on the number of molecules (Fig. 4). Note that here we ignore the first five time points of each time series because these points can have unexpectedly large fluctuations. www.nature.com/scientificreports www.nature.com/scientificreports/ Modelling measurement noise. To determine the importance of the different contributions to Eq. 29, we used an informative log-normal prior for the parameter ν with its mode equal to the median of the value of ν estimated from the biochemical data on numbers and its empirical standard deviation equal to half of their interquartile range. Repeating the inference with this prior, we find that 19  These results imply that the term proportional to the number of molecules in the variance of the measurement noise dominates the constant term. For all the proteins studied, the linear term (proportional to e,1 σ in Eq. 29) is at least an order of magnitude larger that the constant term (σ e,0 ), and the quadratic term (proportional to σ e ,2 ) dominates for proteins with high numbers (Gpm1 and Pgk1).