An automated Bayesian pipeline for rapid analysis of single-molecule binding data

Single-molecule binding assays enable the study of how molecular machines assemble and function. Current algorithms can identify and locate individual molecules, but require tedious manual validation of each spot. Moreover, no solution for high-throughput analysis of single-molecule binding data exists. Here, we describe an automated pipeline to analyze single-molecule data over a wide range of experimental conditions. In addition, our method enables state estimation on multivariate Gaussian signals. We validate our approach using simulated data, and benchmark the pipeline by measuring the binding properties of the well-studied, DNA-guided DNA endonuclease, TtAgo, an Argonaute protein from the Eubacterium Thermus thermophilus. We also use the pipeline to extend our understanding of TtAgo by measuring the protein’s binding kinetics at physiological temperatures and for target DNAs containing multiple, adjacent binding sites.

S ingle-molecule binding assays allow the interrogation of individual macromolecules from a biological process using purified components or cellular extracts. In contrast to ensemble measurements, single-molecule assays can report the order and kinetics of individual molecular interactions [1][2][3][4][5][6] . The introduction of commercial microscopes designed for singlemolecule imaging spurred wide adoption of this technology. However, the absence of easy-to-use software with automated pipelines for extracting kinetic data from an image series makes data analysis slow and tedious. Many key steps for obtaining accurate kinetic parameters from co-localization single-molecule spectroscopy (CoSMoS) images still require manual user intervention and the selection of parameters guided by user experience [7][8][9] . User-dependent parameter choice and manual inspection of images dramatically limits throughput. For example, after spots are detected via user-defined intensity and bandpassfilter thresholds, the user must still inspect the images to remove overlapping spots and false-positive events. Finally, no standard procedure exists to systematically assess the quality of the analysis. To overcome these hurdles, we constructed a pipeline for rapid processing of CoSMoS images while quantitatively assessing experimental data quality. The process automates experimental calibration and high-confidence spot detection and localization using just minutes of computational time. CoSMoS data processing is controlled through a single graphical user-interface, and the modular interface allows individual functional modules to be adjusted for a wide variety of experiments. The pipeline improves detection of co-localization experiments, data analysis speed, and experimental reproducibility.

Results
Pipeline development. Figure 1 shows the key steps in our pipeline. The package includes detailed installation instructions together with print documentation (User Manual) and a demo video (Supplementary Movie 1). The interface comprises a series of tabs, each corresponding to a step in the analysis. The user progresses left to right along, but can readily return to an earlier step, with changes propagating to subsequent steps. The pipeline uses graphics processing unit (GPU) processing to achieve rapid analysis and supports multiple graphics cards.
The first module, preprocessing, consists of electron multiplying charge coupled device (EMCCD) camera gain calibration, multichannel alignment, and drift correction (Fig. 1a, b). The gain and electronic offset of the camera determine the conversion between the number of photons recorded by the camera and the number of digital units contained in the image 10 . Current CoSMoS methods do not estimate the gain and offset of the cameras, and express signal intensity in arbitrary units. Therefore, parameters required for detection of single molecules are arbitrarily chosen by the user. Because signal-to-background ratios vary between experiments, these parameters should be adapted for every dataset. Based on calibration data, our pipeline estimates gain by exploiting the linear relationship between the noise variance and the mean intensity (see User Manual-Gain calibration), allowing automatic parameter estimation and optimal detection, localization and co-localization of single molecules.
After calibrating the gain, fields of view from the wavelength channels corresponding to the different fluorophores used in the experiment must be aligned 1,7,11 . Alignment corrects differences in rotation, scaling, translation, and shear. The pipeline addresses misalignment by estimating a ''mapping function'' to relate positions of the target locations in one camera to the mobile components in the other camera. The mapping is obtained via an affine transformation from calibration images of fluorescent beads that emit in both channels (see User Manual-Alignment of the cameras).
Next, the pipeline corrects for drift caused by movements of the stage 7,11 . To overcome the need for the traditional fiducial markers, the pipeline estimates drift based on the correlation between consecutive recorded images (see User Manual-Correction for lateral drift).
The second module, signal detection and localization, allows identification of target locations, detection of the binding complexes, and co-localization of the diffusible molecules at each immobilized target (see User Manual-Target spot detection and Co-localization analysis). Current methods identify target positions by using a bandpass-filter set by a user-specified intensity threshold 7,12 . Consequently, considerable manual effort is required to eliminate overlapping spots to prevent the signal from one target molecule from becoming conflated with that from a second, nearby molecule. Unlike methods in current use, the pipeline employs an alternative detection method that uses the photon statistics from the preprocessed images to deliver a minimum number of false-negative detections at a controlled/ fixed number of false positives 13 (Fig. 1c, d). To automatically eliminate overlapping spots, the pipeline measures the distance from each spot to its neighbors, its circularity, and its width, which enables it to quantitatively discard any spot located within 50 nm of another.
Next, co-localization events are detected. Current methods sum the fluorescence intensity of the mobile component over a small region (~0.4 μm) centered on the mapped and drift-corrected location of the target molecule 1,2,14 . Co-localization events begin with an abrupt increase and end with an abrupt decrease of the summed fluorescence of the mobile component. To avoid false positives and false negatives, the current methods measure the deviation of the center of mass of the mobile component from the target location 7,15 . However, the precision of the position estimation of the center of mass quickly deteriorates with the low signal-to-background ratios often present in CoSMoS experiments 16 . Thus, abnormally detected events persist and must be removed by visual inspection of the images corresponding to the co-localization intervals, slowing analysis, introducing subjectivity, and degrading reproducibility as noted by Friedman et al. 7 . To address this issue, the pipeline performs maximumlikelihood estimation on the target locations and on the mobile components. This yields an unbiased estimate of the position, local background, spot intensity, and spot width, together with the estimation precision that has the theoretical maximum precision 17 . Subsequently, these estimates are used by the pipeline to quantitatively score binding events and to define the colocalization intervals. The pipeline requires that authentic binding events meet three user-defined criteria: (1) the mobile component, e.g., an RNA-binding protein, must be detected within a user-specified distance of the target molecule, defined according to the average estimated co-localization precision. The distance between the mobile component and the target location is used to eliminate non-specific binding events caused by protein binding to the cover glass near a target molecule. (2) The spot width must be smaller or equal to the user-specified spot width, defined according to the width of the point-spread function of the microscope 18 . This criterion ensures that only a single mobile component is specifically bound to the target location. Finally, (3) the fluorescent signal must be above a user-specified signal-tobackground ratio, i.e., the fluorescent signal must be a specified number of times greater than the background. This criterion ensures that fluctuations in background fluorescence are not recognized as binding events. This approach also accounts for variations in field illumination, which typically are caused by the relay optics delivering light to the sample 19 . The pipeline assists the user in setting these criteria by reporting best-practice values for their dataset.
The third module, data analysis, calculates association and dissociation rates, as well as the correction for non-specific binding of the mobile component to the glass surface 7,11 . The data analysis module also estimates the number of complexes bound to target molecules with multiple binding sites (see User Manual-Analyzing binding kinetics, Correction for the nonspecific binding and Hidden Markov Models). Automated analysis of single-molecule data for targets containing multiple binding sites poses a significant technical challenge, because the single-molecule intensity and background fluorescence vary across the field of view. To achieve this, the module uses a Hidden Markov Model (HMM), to determine, based solely on probability, the number of mobile components bound to the same target molecule and the rates of exchange between the different binding states [20][21][22][23] . Multiple HMM analysis frameworks have been proposed to estimate the number of binding states using ''information criteria'' 24 . However, when binding events are rare and most target sites are unoccupied, the HMM fit is biased toward an estimate that tries to model the noise due to background fluorescence (also called an unbalanced estimation problem). Furthermore, the number of states of the HMM model is not easy to estimate, because the goodness of the fit increases with additional states.
To overcome this issue we rely on Bayesian (evidence-based) reasoning, which assumes prior knowledge and penalizes models with many parameters more severely than models with fewer parameters 25 . One of these Bayesian approaches is the maximalization of the model's (log) evidence, i.e., the probability of   Values were derived from data collected from several hundred individual DNA target molecules (indicated as number of molecules); standard error from bootstrapping is reported. « PB: not determined because k off was slower than the rate of photobleaching the data given the model. Maximization of the evidence is often analytically intractable, but an attractive Variational Bayesian approximation exists and maximizes tractable lower bound of the evidence [26][27][28][29][30][31][32][33] . This approximation, which assumes that the unknown parameters being estimated are independent of each other, was first introduced for Bayesian HMMs by Beal et al. 26,27 . The Bayesian HMM method has been successfully applied to single-particle tracking and fluorescence resonance energy transfer, assuming either a zero-mean Gaussian emission distribution 31 or a one-dimensional Gaussian emission distribution 29,32,34 . Our pipeline extends this framework and enables the estimation of multivariate Gaussians accounting for multi-dimensional, non-zero mean, Gaussian distributed variables 28 . This permits the use of state estimation in situations where variables are not independent, which is the case for the fluorescence signal and background in CoSMoS experiments (Fig. 1e).
For each module, all steps are controlled via a user-friendly interface; no knowledge of MatLab syntax or scripting is required. Results from the pipeline can be readily exported to PDF files, and processed data can be exported to MatLab or other software for further analysis. Processed data from an experiment can be saved and merged later with processed data from other replicates in order to estimate the kinetic behavior of the mobile component using a larger number of molecules. Finally, the pipeline uses scripting to save all user-defined parameters, allowing later replication of an experiment or the analysis of another dataset using previously defined parameters.
Experimental validation of the pipeline. To test the pipeline, we reexamined the binding properties of Thermus thermophilus Argonaute (TtAgo), a DNA-guided, DNA-cleaving endonuclease 35,36 (Fig. 1f-i). TtAgo binds 5′ phosphorylated, 16-nt DNA guides and targets foreign DNA in vivo 36 . TtAgo preorganizes the ''seed'' segment (nucleotides g2-g8) of the guide, pre-paying the entropic penalty for binding the target 11,35,[37][38][39] . Like other Argonaute proteins, extensive complementarity between the guide and the target allows TtAgo to reach a catalytically competent conformation that can cleave the phosphodiester bond between target nucleotides t10 and t11. Previous single-molecule measurements at 37°C of the on-(k on ) and off-(k off ) rate constants of TtAgo, guided by a 16-nt DNA corresponding to the first 16 nucleotides of the animal microRNA (miRNA) let-7, revealed that the protein accelerates target finding by > 100-times compared to the 16-nt DNA guide in the absence of the protein 11 . Target complementarity beyond the seed does not increase k on . TtAgo remains bound to a fully complementary target DNA, but rapidly dissociates from targets complementary to only the seed or the seed plus four 3′ supplementary nucleotides.
Salomon et al. 11 analyzed single-molecule fluorescence images of TtAgo binding using imscroll 7 . That method applied the spot detection procedure twice, using high and low intensity thresholds. The beginning of a binding event was scored when the intensity of the mobile component exceeded the high intensity threshold and its center of mass was detected within 180 nm of the target. The end of a binding event was scored when the intensity of the mobile component dropped below the low intensity threshold or its distance to the target was > 270 nm. Because such thresholds cannot be optimal for the entire field of view, Salomon and co-workers manually inspected each binding event analyzed, a process more time-consuming than data collection. We compared imscroll to our automated pipeline using the same single-molecule data recorded for TtAgo:guide DNA complex binding a seed-matched DNA target ( Supplementary Fig. 1). The pipeline and imscroll detected a similar number of target locations and similar on-(k pipeline on = 7.1 ± 0.1 × 10 7 M −1 s −1 vs. k imscroll on = 8.6 ± 0.1 × 10 7 M −1 s −1 ) and off-(k pipeline off = 0.6 ± 0.01 s −1 vs. k imscroll off = 1.0 ± 0.01 s −1 ) rates. Imscroll required 348 of 1274 putative single target molecules to be manually discarded; the pipeline required no user intervention.
To further test the pipeline, we replicated published experiments analyzing the effect of guide:target complementarity on TtAgo binding 11 . Using the pipeline to analyze the data gave the expected result that complementarity outside of the seed sequence has little effect on on-rate constant: fully complementary, k on = 8.5 ± 0.1 × 10 7 M −1 s −1 ; seed only, k on = 6.9 ± 0.1 × 10 7 M −1 s −1 ; seed plus four, 3′ supplementary nucleotides (guide nucleotides g13-g16), k on = 5.5 ± 0.1 × 10 7 M −1 s −1 . As expected, binding of TtAgo: guide complex to the fully complementary target was too longlived to permit its off-rate constant to be measured, because photobleaching of the guide occurred before dissociation. When the target was complementary to just seed or to the seed plus four, 3′ supplementary nucleotides, TtAgo dissociated with the similar, rapid kinetics reported previously (seed only, τ off = 1.6 s vs. seed plus 3′ supplementary, τ off = 1.5 s after binding). Thus, our automated approach, using a different method to detect TtAgo binding, calculated k on and k off values in good agreement with published results 11 .
TtAgo binding dynamics are temperature-dependent. Previous single-molecule studies examined the binding of the TtAgo:guide complex to DNA and RNA targets at 23°C 40 , 37°C 11 , or 45°C 41 , but T. thermophilus grows at 62 to 75°C 42 . Thus, knowing the effect of temperature on TtAgo binding is central to understanding the function of the protein in vivo, we measured the temperature dependence of binding kinetics of TtAgo for 285-nt DNA targets with different extents of complementarity to the DNA guide (Table 1 and Supplementary Fig. 2). Key to conducting these experiments was our development of an optically transparent sample heater ( Supplementary Fig. 3) that enables single-molecule experiments at temperatures as high as 55°C. At all temperatures tested, the TtAgo:guide complex bound the three targets with similar, near diffusion-limited on-rates (Table 1). Interestingly, mouse AGO2 RISC, which has a similar structure to the TtAgo:guide complex and also possesses endonuclease activity, finds seed-matched targets~10 times more slowly than fully complementary targets 11 . Our data suggest that TtAgo does not discriminate between seed-matched and fully complementary targets during its initial search.
The dwell time of TtAgo on a target with complete complementarity to the guide remained long and was limited by photobleaching at all temperatures tested. Although at room temperature the TtAgo:guide complex dissociated from targets complementary to the seed or to the seed plus four, 3′ supplementary nucleotides, faster than from the fully complementary target, binding events were stable, τ off~1 0 s (k off0 .1 s −1 ; Table 1). Thus, at low temperature, TtAgo displays miRNA-like binding behavior and acts like the RNA-binding, miRNA-guided mammalian Ago2 4,11,43 . However, at higher, more physiological temperatures, TtAgo displayed shorter dwell times on targets complementary to the seed or the seed plus four, 3′ supplementary nucleotides, averaging 56 ms (k off = 18.0 s −1 ) and 76 ms (k off = 13.2 s −1 ), respectively. Unlike mammalian Ago2, at near-physiological temperature TtAgo binds only transiently to seed-matched targets and requires extensive complementarity to its targets for stable binding. Our data are consistent with the idea that the primary function of TtAgo is to catalyze cleavage of DNA with extensive complementarity to its DNA guide 37 . The finding that temperature alone, absent any change in amino acid sequence, can convert an Argonaute protein with miRNA-like binding properties into one requiring extensive target complementarity for stable binding, has important implications for the evolution of Argonaute function.
Testing the pipeline with simulated data. We developed a method based on Variational Bayesian Evidence Maximization (VBEM) and Multivariate Gaussian Hidden Markov Models (MGHMM) to study binding to multiple sites on a single target without the use of additional dyes. We validated our approach using simulated single-molecule switching kinetics. The observations were modeled to match experimental conditions with the same number of states, transition rates, and fluorescence intensity and background (Supplementary Fig. 4 and Supplementary Table 1). We bootstrapped a dataset of 600,000 data points (400 traces, 1500 frames each, typical experimental conditions) and subjected it to VBEM-MGHMM analysis setting priors as described in Supplementary Table 2. The correct number of states was recovered using ≥ 6000 data points ( Supplementary Fig. 4b), and our method accurately estimated the ground truth parameters ( Supplementary Fig. 5). Bias originating from the a-priori information was observed only when using < 6000 data points ( Supplementary Fig. 5b). The transition rates, fluorescent intensity distributions and the occupancy were recovered with high precision (SD < 7 × 10 −4 , SD < 7 × 10 −4 , and SD < 5 × 10 −3 , respectively) for ≥ 600,000 data points, comparable to a standard experiment. Variational Bayesian approaches weigh the data against the prior knowledge, meaning that in small datasets, models with fewer parameters are more prone to be selected, whereas in large datasets models with too many states are more prone to be selected. This phenomenon is known as Lindley's paradox 44,45 . The propensity to select a higher model order for large datasets has been investigated in ref. 46 . Variational Bayesian approximation was also compared to other Bayesian (and non-Bayesian) approximations recapitulated in ref. 47 . Therefore, we performed an analysis of the same dataset with different priors for the fluorescent intensity and the background (Supplementary  Table 3), making the assumption that all the background and signal distributions are of equal mean and therefore overlap. We indeed observe that VBEM-MGHMM algorithm has a propensity to select higher orders ( Supplementary Fig. 6), which illustrates the importance of choosing biologically reasonable priors (see User Manual-Hidden Markov Models).
Binding of TtAgo to adjacent target sites is not cooperative. In mammals, Argonaute proteins can function cooperatively over short distances, although it is not known whether functional cooperativity reflects cooperative binding 48,49 . To further test our method on an experimental dataset, we performed multi-state analysis of TtAgo binding to DNA targets containing one, two or three binding site(s) fully complementary to the DNA guide. We could detect several TtAgo:guide complexes simultaneously bound to a target molecule, and the pipeline successfully identified the expected number of states ( Supplementary Fig. 7).
Cooperative binding of a complex to one site can either accelerate binding of a second complex at an adjacent site (increasing k on ) and/or can stabilize binding at adjacent sites (decreasing k off ). To detect differences in binding between multiple and single sites requires a dwell time (1) sufficiently long to allow observation of sequential binding of several TtAgo: guide complexes to the same target molecule, but (2) nonetheless short enough to allow observations to be made before extensive photobleaching occurs. Our standard experimental conditions do not meet these criteria, because TtAgo binding to a seed-matched target is too short to be able to observe simultaneous binding ( Supplementary Fig. 8), whereas the departure of TtAgo from a fully complementary target is slower than photobleaching (Table 1). To circumvent these issues, we used a seed-matched DNA target with deoxyguanosine in the first position (t1G). TtAgo contains a t1G binding pocket 39,50,51 , and the dwell time of TtAgo for a t1G seed-matched target is > 7-times longer (i.e., a small k off ) than for any other t1N target 40 (Supplementary Fig. 9 and Supplementary Table 4). Our DNA guide starts with deoxythymidine (g1T), excluding possible effects of introducing an additional g1:t1 base pair.
Multi-state analysis of TtAgo binding to a DNA target containing two, 7 nt-long, t1G seed-complementary sites 11 nt apart found that k on for the second site was 0.60 times smaller than for the first site (Fig. 2), consistent with a multiple independent sites model (k 2 bound on = 0.5 k 1 bound on ). Supporting this interpretation, k on for TtAgo binding to a DNA target with two t1G seed-matched binding sites separated by 56 nt was not significantly different from the k on for the adjacent sites ( Supplementary Fig. 10). Similarly, k off for the second site was 2.11 times faster than for the first site (Fig. 2), and was not significantly different from k off when the distance between the two sites was increased (Supplementary Fig. 10). As for k on , the k off values agree well with a model of multiple, independent sites in which k 2 bound

Discussion
We have developed an automated pipeline to analyze singlemolecule binding experiments. Our pipeline performs for the first time a complete statistical analysis of CoSMoS data from beginning to end without extensive, and therefore time-consuming, user intervention and reduces analysis times from several weeks for a few hundred traces to a few days for thousands of traces. Our pipeline has a user-friendly interface and is composed of three modules. The first module, preprocessing, does not constitute a novelty, as it applies established tools from localization microscopy. Our main innovation resides in the next two modules: signal detection and localization and data analysis. Our pipeline estimates the position of target molecules and of mobile components, fluorescent signal, background, and spot width with maximal theoretical certainty. Moreover, the pipeline systematically assesses the quality of the analysis and gives an estimate for this certainty in the parameters. Therefore, our method does not require heuristic choice of parameters, a process that limits the throughput and introduces subjectivity.
We validated the pipeline by replicating published results for TtAgo binding kinetics and extended these studies to other temperatures. At near-physiological temperature, TtAgo does not discriminate between miRNA-like targets and siRNA-like targets during the initial search for binding sites, but remains stably bound only to fully complementary targets.
Finally, we have established an extension to the Variational Bayesian Hidden Markov Model (VBHMM) framework. Our pipeline enables performing VBHMM analysis on multivariate Gaussian signals, which is a major extension from onedimensional zero mean processes 46 . This novel approach permits the use of state estimation in situations where observation variables are multi-dimensional and not independent. This is the case for the fluorescence signal and background in CoSMoS experiments, and can be also applied to a wide range of problems, e.g., single-particle tracking (SPT) 23,52 and single-particle tracking photoactivated localization microscopy (sptPALM) 53 . Using a VBEM-MGHMM strategy, our pipeline correctly determines the number of binding sites on a target, allowing us to discover that TtAgo binds independently to adjacent sites.
Preparation of DNA targets. Single-stranded DNA targets were generated by annealing synthetic oligonucleotides to a Klenow template oligonucleotide (Supplementary Data 1). In a typical labeling procedure, 100 pmol DNA target was mixed with a 1.5-fold molar excess of Klenow template oligonucleotide in 7.5 μl of 10 mM HEPES-KOH, pH 7.4, 20 mM sodium chloride, and 0.1 mM EDTA. Samples were incubated at 90°C for 5 min in a heat block. Then, the heat block was switched off and allowed to cool to room temperature. Afterwards, the annealed strands (30% of final reaction volume) were added without further purification to a 3′ extension reaction, comprising 1 × NEB buffer 2 (New England Biolabs, Ipswich, MA), 1 mM dATP, 1 mM dCTP, 0.12 mM Alexa Fluor 647-aminohexylacrylamido-dUTP (Life Technologies), and 0. Values are reported as mean ± standard deviation for three independent replicates the Klenow template oligonucleotide. The entire reaction was precipitated overnight at -20°C with three volumes of ethanol. The labeled target was recovered by centrifugation, dried, dissolved in loading buffer (7 M Urea, 25 mM EDTA), and incubated at 95°C for 5 min. The samples were resolved on 6% polyacrylamide gel and isolated by electroelution.
Single-molecule experiments. Fresh cover glasses were prepared for each day of imaging. Cover glasses (Gold Seal 24 Å~60 mm, No. 1.5, Cat. #3423), and glass coverslips (Gold Seal 25 Å~25 mm, No. 1, Cat. #3307) were cleaned by sonicating for 30 min in NanoStrip (KMG Chemicals, Houston, TX), were washed with ten changes of deionized water and were dried with a stream of nitrogen. Two~1 mm diameter lines of high vacuum grease (Dow Corning, Midland, MI) were applied to the cover glass to create a flow cell. Three layers of adhesive tape were applied outside of the flow cell. The coverslip was placed on top of the cover glass, with ã 0.3 mm gap between the cover glass and coverslip. To minimize non-specific binding of protein and DNA molecules to the glass surface, microfluidic chambers were incubated with 2 mg ml −1 poly-L-lysine-graft-PEG-biotin in 10 mM HEPES-KOH, pH 7.4 at room temperature for 30 min and washed extensively with imaging buffer (30 mM HEPES-KOH, pH 7.9, 120 mM potassium acetate, 3.5 mM magnesium acetate, 20% [w/v] glycerol) immediately before use. To allow immobilization of biotinylated DNA targets, streptavidin (0.01 mg ml −1 , Sigma) was incubated for 5 min in each microfluidic chamber. Unbound streptavidin was washed away with imaging buffer.
Data acquisition. A syringe pump (KD Scientific, Holliston, MA) running in withdrawal mode at 0.15 ml min −1 was applied to the flow cell outlet to introduce TtAgo:guide complex (pre-heated to 23, 37, 45, or 55°C) supplemented with an oxygen scavenging system and triplet quenchers. Continuous acquisition of frames began when the TtAgo:guide solution was introduced. Typically, 1500-8000 frames were collected at 5-67 frames s −1 .
Imaging was performed on an IX81-ZDC2 zero-drift inverted microscope equipped with a cell^TIRF motorized multicolor TIRF illuminator with 405, 488, 561, and 640 nm 100 mW lasers and a 100× , oil immersion, 1.49 numerical aperture UAPON TIRF objective with FN = 22 (Olympus, Tokyo, Japan). Alexa555 and Alexa647 molecules were excited with only the 561-nm laser, as the presence of 17 Alexa647 dyes on the target produces sufficient signal at the lower wavelength. Use of a single laser ensured that both dyes were excited within the same focal volume. Fluorescence signals were split with a main dichroic mirror (Olympus OSF-LFQUAD) and triple emission filter (Olympus U-CZ491561639M). The primary image was relayed to two ImagEM X2 EM-CCD cameras (C9100-23B, Hamamatsu Photonics, Hamamatsu, Japan) using a Cairn three-way splitter equipped with a longpass dichroic mirror (T635lpxr-UF2, Chroma) and bandpass filters (Chroma 595/50) in front of the ''green'' camera. Illumination and acquisition parameters were controlled with cell^TIRF and MetaMorph software (Molecular Devices, Sunnyvale, CA), respectively. The TIRF imaging system was isolated from floor vibrations with a Micro-g laboratory table (Technical Manufacturing Corporation, Peabody, MA).
A digitally-controlled heater (TP-LH, Tokai Hit) maintained objective temperature at 40°C (except when experiments were performed at 23°C; in this case the heater was switched off). A custom-fabricated heating stage ( Supplementary  Fig. 3) was heated to 45, 55, or 80°C to achieve sample temperatures of 37, 45, or 55°C, respectively. Temperature on the surface of the cover glass was independently monitored with a Type E, 0.25 mm O.D. thermocouple (Omega Engineering Inc., Sutton, MA) inserted between the top and the bottom cover glasses. All the experiments were performed at 37°C, unless otherwise stated.
Custom-fabricated heating stage. The heating stage was developed at University of Massachusetts Medical School, Worcester, MA, USA. Eventual intellectual property rights will be hold by University of Massachusetts Medical School. Two surface heating elements (SRMU100101; Omega) were coupled with thermal paste to a custom-built aluminum slide-holder that heated the sample slide and a ½-inch thick fused silica optical flat (#01-913-000; Edmund Optics). The optical flat allowed the sample to be uniformly heated from the top while allowing scattered light to exit the sample. The heating elements were controlled by a proportionalintegral-derivative (PID) controller (ITC-106VH; Inkbird) through a solid-state relay (SSR-40DA, Inkbird). The temperature feedback loop used a K-type thermocouple that can be placed at the sample or at an intermediate heating stage. In the latter case, the temperature at this intermediate stage (which is kept constant by the PID controller) must be measured such that it corresponds to the desired temperature at the sample. To increase temperature uniformity throughout the sample, the objective was heated using a heating collar (Tokai Hit TP-LH) set to the maximum temperature specified in the safe-operation range for the objective. The heated aluminum sample holder assembly was clamped to the slide-holder (Prior Scientific, H473XR) using an adapter (custom-fabricated from polyoxymethylene [DuPont ''Delrin'']) that provides stable mounting and thermal insulation from the microscope body. The target sample temperature was tested by thermocouple at various temperatures ranging the sample from 37 to 55°C. In principle, the maximum temperature is limited by the approved temperature rating of the objective. Drawings and CAD-models of the custom parts and the stageheater assembly are available at [https://www.thingiverse.com/thing:2791422].
Data analysis. Images were recorded as uncompressed TIFF files and merged into stacked TIFF files. Images were processed using the pipeline (see User Manual-Data processing and Analysis). First, 100 images of a grid slide and of background were used to estimate the gain of CCD cameras 13 . Second, ten images of fluorescent streptavidin-labeled microspheres (Life Technologies F-8780) were used to determine alignment of images from multiple wavelength channels. Third, lateral drift of the surface was determined for each frame using target molecules as immobilized markers. Locations of target molecules were picked in the first frame acquired by performing a Generalized Likelihood Ratio Test in each pixel 13 . Large clusters of positive pixels where filtered out, but all identified spots were visually inspected, and locations corresponding to multiple target molecules were removed. To obtain binding traces in all frames the identified locations were fitted using Maximum-Likelihood Estimation. Co-localization events required that (1) the intensity of TtAgo complex > 150 photons, (2) ratio intensity of the TtAgo:guide complex to the local background > 1, (3) the distance between the target and guide was < 1 pixel, and (4) σ gf < 4.6. To exclude short, non-specific events, the minimal event duration was set to 2-5 frames. To overcome short temporary loss of TtAgo fluorescent signal due to blinking of the fluorescent dye, the gap parameter was set to 2-5 frames. Only the first binding event at each target location was used for estimation of arrival time and dwell time, in order to minimize errors caused by occupation of sites by photobleached molecules. The same analysis was automatically performed on ''dark'' locations, i.e., regions that contained no target molecules; these served as a control for non-specific binding of TtAgo complex to the surface of the cover glass. The analysis was scripted to ensure reproducibility of user settings. The individual experiments were saved, combined, and error evaluated by 1000-cycle bootstrapping of 90% of the data.
To calculate the number of binding sites, VBEM-MGHMM analysis was first performed with priors manually estimated from fluorescence intensity time traces (see User Manual-Hidden Markov Models). The starting point of the signal and background priors, m, is set to the mean signal and background of a single binding event of TtAgo. The starting point of priors κ (variance of the Gaussian variance of signal values), v (variance of the prior on the variance of the signal values), and W 1/2 (mean of the prior on the variance of the signal values) for model order selection are set to 10. Subsequently, the estimated prior parameters (m, κ, ν, and W 1/2 ) are used to automatically segment the traces with a correct model order 44 .
Testing the pipeline with simulated data. Single-molecule switching kinetics was modeled to match experimental conditions with the same number of states, transition rates, and fluorescence intensity and background. Supplementary Table 1 provides the parameters used to generate a dataset of 600,000 data points (400 traces, 1500 frames each). The dataset of 600,000 data points was bootstrapped to generate sub-datasets of 750; 6000; 12,000; 18,000; 24,000; 30,000; 60,000; 120,000; 240,000; 360,000, and 480,000 data points. The dataset and the sub-datasets were then subjected to VBEM-MGHMM analysis setting priors (Supplementary Table 2). To illustrate the importance of choosing biologically reasonable priors, the dataset and the sub-datasets were subjected to VBEM-MGHMM analysis setting different priors for the fluorescent intensity and the background (Supplementary Table 3).