SMTracker: a tool for quantitative analysis, exploration and visualization of single-molecule tracking data reveals highly dynamic binding of B. subtilis global repressor AbrB throughout the genome

Single-particle (molecule) tracking (SPT/SMT) is a powerful method to study dynamic processes in living cells at high spatial and temporal resolution. Even though SMT is becoming a widely used method in bacterial cell biology, there is no program employing different analytical tools for the quantitative evaluation of tracking data. We developed SMTracker, a MATLAB-based graphical user interface (GUI) for automatically quantifying, visualizing and managing SMT data via five interactive panels, allowing the user to interactively explore tracking data from several conditions, movies and cells on a track-by-track basis. Diffusion constants are calculated a) by a Gaussian mixture model (GMM) panel, analyzing the distribution of positional displacements in x- and y-direction using a multi-state diffusion model (e.g. DNA-bound vs. freely diffusing molecules), and inferring the diffusion constants and relative fraction of molecules in each state, or b) by square displacement analysis (SQD), using the cumulative probability distribution of square displacements to estimate the diffusion constants and relative fractions of up to three diffusive states, or c) through mean-squared displacement (MSD) analyses, allowing the discrimination between Brownian, sub- or superdiffusive behavior. A spatial distribution analysis (SDA) panel analyzes the subcellular localization of molecules, summarizing the localization of trajectories in 2D heat maps. Using SMTracker, we show that the global transcriptional repressor AbrB performs highly dynamic binding throughout the Bacillus subtilis genome, with short dwell times that indicate high on/off rates in vivo. While about a third of AbrB molecules are in a DNA-bound state, 40% diffuse through the chromosome, and the remaining molecules freely diffuse through the cells.
AbrB also forms one or two regions of high intensity binding on the nucleoids, similar to the global gene silencer H-NS in Escherichia coli, indicating that AbrB may also confer a structural function in genome organization.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.

Getting Started
Installation
Unzip the content of the SMTracker package. Start MATLAB and add the extracted folder "SMTracker 1.0" to the search path, either by using the dialog box Set Path → Add with Subfolders or by using the command line:

>> addpath(genpath(uigetdir))
>> savepath

Please make sure to install the GUI Layout Toolbox v2.2.1 and add u-track/TrackMate to the path before running SMTracker.

Program overview
The SMTracker software is designed to analyse single-molecule tracking data mainly in prokaryotic cells, but it is easily extendable to other organisms. The program offers various ways of analysing single-molecule diffusion in terms of localization, mode of diffusion, identification of various populations in the sample and molecular binding times. SMTracker combines information from tracking data and segmentation data to analyse single-molecule diffusion on a cell-by-cell and track-by-track basis.

Data structure
SMTracker handles the analysis of several single-molecule fluorescence movies from different treatments or conditions in parallel. To import and match all data of an experiment, it is mandatory that the corresponding files are named identically. To this end, phase contrast or bright field images, cell mesh files produced by Oufti or MicrobeTracker and folders generated in u-track need to be named in ascending order and saved in the corresponding folder (Figure 1).

Figure 1 | Data are organized in a tree-like structure such that a protein can be analyzed under different conditions, e.g. condition1 and condition2. During imaging, pictures were acquired in the DAPI and the phase contrast channel; afterwards, raw movies were processed into "sum" images (temporal average of all frames) and tracked ("tif"), and cell contours were produced ("cell_meshes").

Folder: Description
cell_meshes: Folder must contain standard MATLAB ".mat" files generated in Oufti or MicrobeTracker. The SMTracker software reads the variable "cellList" and imports the variables mesh, model, box and length from each cell in the object "cellData" (see Table M4).
dapi: Folder may optionally contain pictures acquired using any fluorophore as long as the file format is ".tif". Users can leave this folder empty and still analyse their data.
phase: Folder may optionally contain pictures acquired using any fluorophore as long as the file format is ".tif". Users can leave this folder empty and still analyse their data.
raw: Folder may optionally contain multidimensional or multi-image ".tif" files, which can be visualized within the SMTracker software and which were used for the tracking by u-track or TrackMate. Users can leave this folder empty and still analyse their data, but it is then not possible to watch the movie in the SMTracker software.
sum: Folder may optionally contain pictures acquired using any fluorophore as long as the file format is ".tif". Users can leave this folder empty and still analyse their data.
tif: Folder must contain the output folder from u-track. Each folder contains the following files and folders (Figure 2):
- movieData.mat → Stores movie information such as channels, number of frames, pixel size, time interval
- seriesXXX → Folder storing each frame of the multi-image ".tif" file as a single TIFF image
- seriesXXX.mat → Same as movieData.mat
- TrackingPackage → Stores localization / detection data in the folder "GaussianMixtureModels" and single-molecule tracks in the folder "tracks". Each folder contains a standard MATLAB file with its specific information (Channel_1_detection_result.mat and Channel_1_tracking_result.mat)

Please use seriesXXX as the base of the filenames. Note that it is recommended to set up the data structure and insert data into this structure before starting the analysis process, including tracking, cell contour determination or image processing. If u-track is the chosen software for tracking, please put the raw movie files into the "tif" folder and later move them to the "raw" folder. Once saved in the folder called "raw", movies can be loaded and played in the SMTracker software. A general description of the individual folders and/or files stored in the data structure can be found in Table 1.

If TrackMate is the chosen software for tracking (Tinevez et al., 2016), the user needs to store the output files from TrackMate (appropriately named according to the requirements above) in the condition folder as subfolders "trackmate/tracks". Please note that SMTracker does not read the TrackMate .xml session file, but the tracking file exported in TrackMate using "Export tracks to XML file". SMTracker imports the x- and y-coordinates of the single particles and the frame number in which each particle was detected.
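The naming requirement described in this section can be checked before import with a short script run outside SMTracker. The following Python sketch is not part of SMTracker: the function names `base_names` and `check_condition` are hypothetical, and it assumes that each movie appears in "tif" as one entry (for example a u-track output folder) whose base name matches a ".mat" file in "cell_meshes".

```python
from pathlib import Path

REQUIRED = ("cell_meshes", "tif")           # folders the manual marks as mandatory
OPTIONAL = ("dapi", "phase", "raw", "sum")  # folders that may stay empty


def base_names(folder: Path) -> set:
    """Return entry names without extension, ignoring hidden files."""
    return {p.stem for p in folder.iterdir() if not p.name.startswith(".")}


def check_condition(condition: Path) -> list:
    """List naming problems found in one condition folder (empty list = OK)."""
    problems = [f"missing required folder: {name}"
                for name in REQUIRED if not (condition / name).is_dir()]
    if problems:
        return problems
    # Assumption: the entries of "tif" define the set of movies.
    reference = base_names(condition / "tif")
    meshes = base_names(condition / "cell_meshes")
    problems += [f"no cell mesh file for {m}" for m in sorted(reference - meshes)]
    problems += [f"no tracking folder for {m}" for m in sorted(meshes - reference)]
    for name in OPTIONAL:
        folder = condition / name
        if folder.is_dir():
            # An empty optional folder is fine; stray names are reported.
            for stray in sorted(base_names(folder) - reference):
                problems.append(f"{name}/{stray} has no tracking folder")
    return problems
```

Calling `check_condition(Path("protein_name/condition1"))` returns an empty list when all base names agree, and otherwise one message per mismatch.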
Running the software
To run the SMTracker software, launch its graphical user interface by typing in the Command Window:

>> SMTracker
Import and explore data
The software opens and shows the IMPORT panel (Figure 3), which allows the user to either import data into the software or to load previously analyzed data. To start a new analysis using data from u-track, just click "Select Folder". If the source of data is TrackMate, please first provide the pixel size in nanometers [nm] and the time interval in milliseconds [ms] and then click "Select Folder". A dialog box opens and allows navigating to the folder containing the datasets. Selecting the folder "protein_name" (Figure 1) starts the import of all data contained in the sub-directories "condition1", "condition2", etc. If successful, the panel will be populated with additional features that provide full software functionality (Figure 4), as explained below.
To load a dataset that was analyzed and saved before, go to File → Load, and select the file "protein_name.mat". If the tracking data were produced with TrackMate, first go to File → Type of data and select TrackMate. Then, proceed as above.

The IMPORT tab has the following components:

Menu bar:
• File: Use the File menu to load, save, export and import, or select the type of data.
o Export: Exports trajectories for vbSPT (Persson et al., 2013) or SMMTrack (Schenk et al., 2017). Please see Additional Information at the end of the document for further instructions.
o Import: Imports residence times calculated in SMMTrack.
o Print figures: Use the print menu to open a dialog box, which allows saving and printing publication-ready high-quality figures (300 dpi) in different formats.
o Exit: Exit SMTracker
• Help: User manual and information about the developers

Panels:
• Import data: Click "Select Folder" to search and load the protein folder. Click the "Reset" button to clear the window.
• Display movie data: Use the pop-up list to navigate through conditions, movies, cells and tracks. Radio buttons ("Cells" and "Tracks") and checkboxes ("Phase", "DAPI", "Sum" and "White") can be used to show different channels as background in the panel "Overlay cells and tracks on images", and to overlay them with cell contours or recorded tracks. If a track needs to be removed, it can be deleted using the checkbox "Delete". This removes the selected track from the internal data structure of the SMTracker software but not from the original tracking data.
• Overlay cells and tracks on images: Shows the channel selected in the panel "Display movie data" as background, possibly overlaid with cell contours (in white) and tracks (in blue). Here, individual tracks can be selected by clicking directly on them, which opens the "Track explorer" interface, providing extended information about each track.

Table 2 (surviving entries): dwell times for a two-component fit; size of the fraction belonging to τ1 or τ2.

Once a track or a cell is selected, either in the "Display movie data" panel or by directly selecting it in the panel "Overlay cells and tracks on images", the "Track explorer" interface appears on the right side of the main window (Figure 5).
The "Track explorer" is composed of five panels displaying the following information:
• Projection of tracks on cells: The cell containing the selected track is aligned horizontally so that the long axis of the cell is parallel to the x-axis. Either the first track in the cell or the selected track is highlighted in a color-coded manner, indicating the start of the track in red and the end of the track in blue.
• Track movement: Scheme in which the start of the track is set to the origin (0,0) and the movement of tracks is shown relative to the origin in a color-coded manner. The circle indicated by the dashed line is the confinement radius.

Gaussian Mixture Model (GMM) tab
The Gaussian mixture model tab (Figure 6) allows estimating the diffusive properties of two molecular subpopulations across different experimental conditions (details of the GMM method are provided in Roesch et al., 2018). Briefly, the method considers the displacements of molecules in x- and y-direction between consecutive image frames and fits the resulting histograms with either a single Gaussian probability distribution function (PDF) (single fit) or with a linear combination of up to three Gaussian PDFs (double fit and triple fit), where the mixture parameter 0 ≤ α ≤ 1 describes the fraction size of the slowly diffusive subgroup.
The following panels belong to the GMM tab:
• Axes parameters: Change of settings for axes and histograms.
• Data & fits: Use the pop-up menus to select the conditions to be plotted inside the panels on the right-hand side of the main window. The pop-up lists refer to the different folders (conditions) in which your experimental dataset is organized. By default, the software shows the first condition as "reference" and the second one as "comparison", according to the order in the file system. The checkbox "projected data" determines whether x- and y-displacements are given in an image-centric coordinate system ("projected data" unchecked) or projected (rotated) to a cell-centric coordinate system ("projected data" checked), with the long cell axis parallel to the x-axis. The checkboxes "single fit", "double fit" and "triple fit" run the corresponding Gaussian fits to estimate the underlying parameters. The radio buttons allow the user to select the direction of the displacements: "xy" pools x- and y-displacements together, whereas "x" and "y" consider the x- and y-displacements individually. When dealing with more than two conditions, the "fix pair" checkbox ensures that exclusively the conditions selected in the pop-up lists are pooled together for the fit procedure.
After choosing the settings, click the "Plot" button to proceed. New panels then appear:
• Plot Panels: Histograms of the frame-to-frame displacements in the selected direction superimposed with the PDFs of the corresponding fits.
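To illustrate the idea behind the double fit, the following Python sketch fits a two-component, zero-mean Gaussian mixture to one-dimensional displacements by expectation-maximization (EM). It is not SMTracker's implementation: the function names are hypothetical, EM is an assumed fitting method, and localization error is ignored. The conversion D = σ²/(2Δt) is the textbook relation for a one-dimensional Brownian step.

```python
import math


def fit_two_state_gmm(displacements, n_iter=100):
    """EM fit of alpha*N(0, s1^2) + (1-alpha)*N(0, s2^2); returns (alpha, s1, s2), s1 <= s2."""
    n = len(displacements)
    rms = math.sqrt(sum(d * d for d in displacements) / n)
    alpha, s1, s2 = 0.5, 0.5 * rms, 1.5 * rms   # crude starting values
    for _ in range(n_iter):
        # E-step: responsibility of the narrow (slow) component for each displacement
        w = []
        for d in displacements:
            p1 = alpha / s1 * math.exp(-0.5 * (d / s1) ** 2)
            p2 = (1 - alpha) / s2 * math.exp(-0.5 * (d / s2) ** 2)
            w.append(p1 / (p1 + p2))
        # M-step: weighted variances around the fixed mean of zero
        w_sum = sum(w)
        s1 = math.sqrt(sum(wi * d * d for wi, d in zip(w, displacements)) / w_sum)
        s2 = math.sqrt(sum((1 - wi) * d * d for wi, d in zip(w, displacements)) / (n - w_sum))
        alpha = w_sum / n
    if s1 > s2:   # keep the convention that alpha belongs to the slow subgroup
        alpha, s1, s2 = 1 - alpha, s2, s1
    return alpha, s1, s2


def sigma_to_diffusion(sigma, dt):
    """A 1D Brownian step over dt has variance 2*D*dt, so D = sigma^2 / (2*dt)."""
    return sigma ** 2 / (2 * dt)
```

With displacements in µm and a frame interval `dt` in seconds, `sigma_to_diffusion(s1, dt)` and `sigma_to_diffusion(s2, dt)` give the slow and fast diffusion constants in µm²/s, and `alpha` is the fraction of the slow subgroup.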

Mean Square Displacement (MSD) tab
The most common approach to describe molecular motion is by the mean square displacement (MSD) analysis, which is recommended if only a single population of diffusive molecules is considered. MSD curves are very useful to identify the type of motion exhibited by a particle.
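As a generic illustration of how an MSD curve identifies the type of motion, the Python sketch below computes a time-averaged MSD for one track and fits a power law MSD ≈ 4·D·t^a on a log-log scale. The function names are hypothetical and the fit is an assumption; the manual does not specify how SMTracker fits its MSD curves. An exponent a close to 1 indicates Brownian motion, a < 1 subdiffusion and a > 1 superdiffusion.

```python
import math


def msd_curve(track, max_lag):
    """Time-averaged MSD of one track [(x, y), ...] for lags 1..max_lag (in frames)."""
    msd = []
    for lag in range(1, max_lag + 1):
        sq = [(x2 - x1) ** 2 + (y2 - y1) ** 2
              for (x1, y1), (x2, y2) in zip(track, track[lag:])]
        msd.append(sum(sq) / len(sq))
    return msd


def fit_power_law(msd, dt):
    """Least-squares line through log(MSD) versus log(t), assuming MSD = 4*D*t^a in 2D."""
    xs = [math.log((i + 1) * dt) for i in range(len(msd))]
    ys = [math.log(m) for m in msd]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return slope, math.exp(intercept) / 4   # exponent a, generalized coefficient D
```

For example, `fit_power_law(msd_curve(track, 5), dt)` uses the first five time lags of a track recorded with frame interval `dt`; restricting the fit to short lags is a common choice because long lags are averaged over fewer displacement pairs.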
The following panels belongs to the MSD tab (Figure 7 The MSD can be calculated using different set of data. The radio buttons allow the user to select the source of the displacements: "xy" for a 2D displacement of the localizations between two consecutive frames, "x" and "y" for a displacement along each coordinate axis. • Axis scaling: Sets axis limits.
After clicking the button "Plot Curves", new panels appear inside the MSD tab: • MSD calculation results: Table showing

Square displacement analysis (SQD) tab
The mean squared displacement analysis provides a measurement of the population dynamics, which does not account for heterogeneous movement of single particles or molecules. Besides the introduced GMM method, the SMTracker also offers a second method to assess the diffusive behavior of multiple subpopulations in the sample. The method is based on the cumulative distribution function (CDF) of square displacements r², which represents the probability P(r², t) that a molecule remains in a circle of radius r in time t. The software allows the user to distinguish up to three diffusive groups in a set of trajectories (see Roesch et al., 2018 for details).
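For orientation, the textbook form of this CDF for two-dimensional Brownian motion with n diffusive groups is given below. It is stated here as an assumption; the exact expression used by SMTracker (for instance, any correction for localization error) should be taken from the cited reference. The symbols α_i (fraction sizes) and D_i (diffusion coefficients) are introduced for this illustration.

```latex
P(r^2, t) \;=\; 1 - \sum_{i=1}^{n} \alpha_i \exp\!\left(-\frac{r^2}{4 D_i t}\right),
\qquad \sum_{i=1}^{n} \alpha_i = 1, \quad n \le 3 .
```

For n = 1 this reduces to P(r², t) = 1 − exp(−r²/(4Dt)), so the square displacements of a single Brownian population are exponentially distributed with mean 4Dt.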
The following panels belong to the SQD tab (Figure 8):
• Choose best fit: SMTracker can estimate the most appropriate number of diffusive groups in the experimental data using the Bayesian information criterion (BIC) (see Supplementary Text). If set to "Automatically", SMTracker performs the analysis without taking into account the suggestion of the user. If set to "Manually", the user can manually determine the number of diffusive subspecies in the sample.

Figure 8 | SQD tab.
After clicking the button "Calculate fit", the following result panels appear:
• Axes parameters: Settings for the graphs displayed to the right.
• Plotting conditions: Click on the checkboxes for the conditions and the number of diffusive groups to select which plot to show, and expand the time-lag pop-up to select which one of the four distribution functions P(r², t) to plot.
• CDF Plot: Empirical cumulative distribution function of the experimental data superimposed with the fit of P(r², t). The lower panel shows the residual differences between model and experiment.
• SQD analysis results: This table shows the results of the analysis, such as the suggested best-fitting model, the number of populations selected for the fit, the R-squared for each curve, the diffusion coefficient for both diffusive groups, their fraction size and the result of the hypothesis test on the difference between the calculated CDFs.
• Bubbleplot CDF: The plot shows the fraction size, which is proportional to the area of each bubble, at the corresponding diffusion coefficients.
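The interplay of CDF fitting and BIC-based model selection can be sketched as follows. This Python example is an illustration under stated assumptions, not SMTracker's code: it uses the textbook two-dimensional Brownian CDF, replaces the unspecified optimizer with a brute-force grid search, and uses the least-squares form of the BIC, n·ln(RSS/n) + k·ln(n). All function names and grids are hypothetical.

```python
import math


def empirical_cdf(sq_disp, n_points=40):
    """Evaluate the empirical CDF of square displacements at evenly spaced quantiles."""
    data = sorted(sq_disp)
    n = len(data)
    idx = [int((k + 0.5) * n / n_points) for k in range(n_points)]
    return [data[i] for i in idx], [(i + 1) / n for i in idx]


def model_cdf(r2, t, fractions, coeffs):
    """P(r^2, t) = 1 - sum_i f_i * exp(-r^2 / (4 * D_i * t)) for 2D Brownian motion."""
    return 1 - sum(f * math.exp(-r2 / (4 * d * t)) for f, d in zip(fractions, coeffs))


def rss(r2s, cdf, t, fractions, coeffs):
    """Residual sum of squares between the model CDF and the empirical CDF."""
    return sum((model_cdf(r2, t, fractions, coeffs) - p) ** 2 for r2, p in zip(r2s, cdf))


def bic(rss_value, n_points, n_params):
    """BIC for a least-squares fit with Gaussian residuals."""
    return n_points * math.log(rss_value / n_points) + n_params * math.log(n_points)


def fit_sqd(sq_disp, t, d_grid, alpha_grid):
    """Grid-search fits with one and two groups; returns {n_groups: (bic, params)}."""
    r2s, cdf = empirical_cdf(sq_disp)
    n = len(r2s)
    one = min((rss(r2s, cdf, t, (1.0,), (d,)), (d,)) for d in d_grid)
    two = min((rss(r2s, cdf, t, (a, 1 - a), (d1, d2)), (a, d1, d2))
              for a in alpha_grid for d1 in d_grid for d2 in d_grid if d1 < d2)
    return {1: (bic(one[0], n, 1), one[1]), 2: (bic(two[0], n, 3), two[1])}
```

The model with the lower BIC is preferred, so a second diffusive group is accepted only if it reduces the residuals enough to outweigh its two additional parameters (one fraction and one diffusion coefficient).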

Spatial distribution (SDA) tab
To assess the subcellular distribution of molecules, all tracks of each condition are summarized in a heat map and a spatial distribution histogram in the spatial distribution tab (Figure 9). To this end, all tracks are projected onto a unit cell of 1 × 1 µm.
The following panel belongs to the SDA tab:
• Select conditions: Choose condition and type of view. Additionally, the user can choose to visualize the heat maps in 2D or 3D or change the number of bins once the plots are shown.
After selecting a condition and cell size category, the following panels appear:
• Heat map of trajectories: Spatial localization of particles according to the trajectory information.
• Heat map with mirroring: Same as above, but with axes mirroring. This representation takes into account the symmetry with respect to the transversal and longitudinal axis of the cell.
• Normalized x/y axis distribution: Normalized histograms of particle positions projected onto the x- and y-axis. For fine-tuning the histogram representations, the number of bins can be increased or decreased using the corresponding buttons.
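The projection onto a unit cell and the mirroring step can be sketched in a few lines of Python. This is an assumed, simplified procedure rather than SMTracker's implementation: positions are taken to be already rotated into a cell-centric coordinate system with the origin at one corner of the cell's bounding box, and the function names are hypothetical.

```python
def normalize_position(x, y, cell_length, cell_width):
    """Map cell-centric coordinates (origin at one cell corner) onto the unit cell."""
    return x / cell_length, y / cell_width


def heat_map(points, n_bins=10, mirror=False):
    """2D histogram (rows = y bins, columns = x bins) of unit-cell positions."""
    counts = [[0] * n_bins for _ in range(n_bins)]
    for x, y in points:
        targets = [(x, y)]
        if mirror:
            # add the reflections about the transversal and longitudinal mid-lines
            targets += [(1 - x, y), (x, 1 - y), (1 - x, 1 - y)]
        for u, v in targets:
            col = min(int(u * n_bins), n_bins - 1)   # u == 1.0 falls into the last bin
            row = min(int(v * n_bins), n_bins - 1)
            counts[row][col] += 1
    return counts
```

Mirroring quadruples the number of entries per bin pattern and forces the map to be symmetric, which is appropriate only if the cell has no intrinsic polarity along either axis.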

After choosing the export format (SMMTrack or vbSPT), SMTracker saves the x- and y-coordinates of the trajectories either in a ".csv" or ".mat" format. Files are stored in the folder of the protein under investigation (e.g. "protein_name" in Figure 1).

Additional Information
Run vbSPT with data generated by SMTracker
Note: Only use letters and numbers in filenames. We sometimes observed that the vbSPT software (Persson et al., 2013) produces inhomogeneous results due to outlier data in the trajectories. To remove outliers showing a deviation of more than 10 times the standard deviation of the frame-to-frame displacement, run

>> remOutlier

and then start the vbSPT user interface:

>> vbSPTgui

In the "Settings" section enter the following information:
- Choose one of the ".mat" files exported by SMTracker under "Input data"
- Click "Output data" to define the location where results are stored
- Choose the number of bootstraps, runs, hidden states and the minimal trajectory length
In the "Parameters" section enter the following information:
- Set "Units of length" to nm and "Dimensionality" to 2.
- Set the initial range of the diffusion coefficient and the dwell time.
- Click on "Generate ID"
Finally, click "Save" at the bottom of the window to save the runinput file ("runinput_day_month_year.m"), which stores all input data and the results. Start the analysis by clicking on "Run". After the analysis has finished, type the following to see the results:

>> VB3_getResult('runinput_day_month_year.m')

Run SMMTrack with data generated by SMTracker
SMMTrack (https://github.com/SMMTrack; Schenk et al., 2017) runs on most Windows distributions. To run it on macOS, download and install Wine or WineBottler (http://winebottler.kronenberg.org/). After installation, create a folder "SMMTrack" in which you store the executable file "SMMTrack.exe" and into which you move the ".csv" files exported from SMTracker. Start SMMTrack, go to the menu "File" and choose the item "read tracks .csv" to load all files in the folder. Note that ".csv" files sometimes need to be loaded into Excel first before they are accepted by the SMMTrack software.
Output format of the SMTracker software
Each SMTracker session can be saved for later analysis. SMTracker saves three different structures (data, cellstats and params), which are explained in the following tables. The field 'cellData' in the 'data' structure contains all relevant tracking data and is hierarchically ordered in cell arrays. The first layer corresponds to the conditions of the experiment, the second layer gives access to all the movies of a condition, and the next layer gives access to the individual biological cells detected in the movie. If the user wants to access the information stored for the first biological cell in the first movie of the first condition, the user needs to type:
See Table 6 for further explanation of the fields stored in the "cellData" object.

Table 6: Explanation of the fields stored in the "cellData" object.

Additional Figures
Figure 10 | Dependence of estimated fraction α of the slow subpopulation on simulation parameters and inference method. Here, the SQD method was applied for only single frame displacements (t=1Δτ), showing that SQD and GMM methods lead to equivalent results under these inference parameters.

Figure 11 | Dependence of estimated diffusion constant D1 (slow subpopulation) on simulation parameters and inference method. Here, the SQD method was applied for only single frame displacements (t=1Δτ), showing that SQD and GMM methods lead to equivalent results under these inference parameters.

Figure 12 | Dependence of estimated diffusion constant D2 (fast subpopulation) on simulation parameters and inference method. Here, the SQD method was applied for only single frame displacements (t=1Δτ), showing that SQD and GMM methods lead to equivalent results under these inference parameters.