An end-to-end software solution for the analysis of high-throughput single-cell migration data

The systematic study of single-cell migration requires the availability of software for assisting data inspection, quality control and analysis. This is especially important for high-throughput experiments, where multiple biological conditions are tested in parallel. Although the field of cell migration can count on different computational tools for cell segmentation and tracking, downstream data visualization, parameter extraction and statistical analysis are still left to the user and are currently not possible within a single tool. This article presents a completely new module for the open-source, cross-platform CellMissy software for cell migration data management. This module is the first tool to focus specifically on single-cell migration data downstream of image processing. It allows fast comparison across all tested conditions, providing automated data visualization, assisted data filtering and quality control, extraction of various commonly used cell migration parameters, and non-parametric statistical analysis. Importantly, the module enables parameters computation both at the trajectory- and at the step-level. Moreover, this single-cell analysis module is complemented by a new data import module that accommodates multiwell plate data obtained from high-throughput experiments, and is easily extensible through a plugin architecture. In conclusion, the end-to-end software solution presented here tackles a key bioinformatics challenge in the cell migration field, assisting researchers in their high-throughput data processing.

shows an overview of some of currently available software tools for the visualization and validation of high-throughput, time-lapse microscopy image data of individually migrating cells or objects [16][17][18] . Despite this table not being comprehensive, it clearly indicates that the field is still lacking a user-friendly, freely-available and open-source software that provides an end-to-end solution for automated management, processing and inspection of the data generated by the tracking software. Such a tool needs to allow for quality control, cell migration parameter extraction, and statistical comparison of different conditions. Proprietary software solutions exist (such as Metamorph, Imaris or Volocity) that provide tools to perform (manual) editing of cell trajectories, that calculate object and track statistics, and that can export selected data. These commercial tools however, are bound to the vendor's imaging systems. The plethora of free, mostly open-source, solutions for single-cell segmentation and tracking are instead focused on solving the tracking problem only, leaving downstream analysis to the user's choice 16,19 .
The absence of an open solution for automated data management, inspection, quality control, and analysis constitutes the major bottleneck in the processing of high-throughput single-cell migration experiments today, because this process remains very labor-intensive (e.g., cell trajectories are manually inspected and/or data necessarily have to be transferred between different software tools). As a consequence, these data are frequently insufficiently mined 20 . Cell speed and persistence of motion for instance, are only calculated at the cell population level, thus averaging those parameters across a collection of recorded cells, de facto losing a lot of potentially interesting information. To fill this gap, we here present an open-source solution, implemented as new module in CellMissy, for the automated quality control, inspection, and analysis of single-cell migration experiments.

Results and Discussion
This article presents a free, cross-platform and open-source solution for the automated inspection, quality control assessment and analysis of high-throughput single-cell migration experiments (e.g., in a 96-well format). The platform, developed as a new module in the CellMissy software, and illustrated in Fig. 1, takes text files containing 2D single-cell migration data from a time-lapse experiment as input. These text files take the form of spatial (x, y) coordinates in time, and can be extracted from the user's choice of cell tracking software. These data can be derived from different experimental setups. Typically, cells are sparsely seeded either on a surface or on a layer of extracellular matrix and then imaged in time. The cells can be assayed after applying a chemotactic gradient (chemotaxis) or not (random migration). Recently, microfluidic systems for chemotaxis assays have become  21 . Alternatively, cells can be seeded as a confluent layer in part of the well (e.g., in the well periphery in a cell-zone exclusion assay, or in the center in a fence assay (for more details on assays see ref. 11). Single-cell migration data can also be recorded by imaging in time the single cells escaping from multicellular spheroids or tumor organoids 22 .
Data are typically derived from a common multiwell plate experimental setup, as shown in Fig. 2, in which multiple conditions are tested in parallel, e.g., with different cell lines and/or different treatments, such as sets of chemical compounds, concentration ranges of a specific compound, selected libraries of siRNA and so on. Multiple technical replicates (i.e., multiple wells) are usually performed for each condition, and the module allows data visualization and analysis at both the condition and replicate level. Obviously, less complicated experiments can also be imported and analyzed. As shown in Figs 1 and 2, the module expects text files reporting four distinct columns: (i) the trajectory unique ID, (ii) the time point, (iii) the x coordinate (either in pixel or μ m), and (iv) the y coordinate (either in pixel or μ m). This is described in detail in a dedicated user manual at the project homepage.
Once the data associated with an experiment are loaded, the software provides the following four key functions: (i) data inspection and visualization, (ii) parameter extraction, (iii) quality control, and (iv) statistical data analysis. The following sections describe these functionalities in detail using two multi-sample data sets as examples. These data sets were selected because they were generated via different assay types (Fig. 3), using different cell types (immune cells versus cancer cells). Furthermore, these data sets reflect a different level of complexity (i.e., number of samples tested in parallel). The details for each experiment are reported in Table 2. Single-cell trajectories from a single replicate in each of the experiments are visualized in Fig. 3. The experiments were performed in phase-contrast, but other imaging techniques are of course also fully supported. The complete single-cell migration analyses for these experiments are available on figshare (see "Methods" section for details).

Data inspection and visualization.
Upon starting the single-cell analysis module, the user is presented with the "Data Inspection and Cell Tracks" view. Here, single-cell trajectories imported in CellMissy can be visualized for each replicate (i.e., each well), or across all replicates for a specific biological condition. These plots can either use all trajectories, or only a random subset (with a user-defined sample size) in case of very large data sets. The visualization can either be based on the raw cell trajectories, or on re-centered trajectories where the starting point of each track is located at the origin of the coordinate system (usually depicted in the literature as a 'rose plot'). It should be noted that the visualization of rose plots for all the biological conditions of an experiment, with identical axis scaling as provided by the module, provides a fast yet powerful way to visually compare migration behavior in diverse settings. This enables the detection of possible differences in cell migration phenotypes, e.g., upon drug treatment in the experiments used here, or differences in response to a gradient in chemotaxis experiments. Figure 4 shows this kind of visualization for the two experiments described in Table 2. Note, for example,   Table 2) and for untreated MDA-MB-231 cells (right, experiment 2, cell exclusion zone assay, Table 2). Numbers indicate individual cell trajectories. Colors indicate temporal evolution of x and y cell positions (blue: earlier in time, red: later in time). The red line in the right image indicates the border between the confluent cell layer and the cell-free zone that has shifted in time. Single cells are only identified as such in experiment 2 when they escape the expanding cell sheet.
that the three Ba/F3 BcrAbl cell conditions treated with Rock-inhibitor y27632 cover shorter distances (Fig. 4a). Furthermore, the multiwell plate of the experiment under analysis can be visualized as a heat-map where each well in the plate color encodes one of the four parameters: (i) number of cell trajectories, (ii) mean or median cell speed, (iii) mean or median cell directionality, and (iv) robust z*-score 23 . An example of this visualization is shown in Fig. 4c.    Trajectory-centric and step-centric parameter extraction. There are generally two different ways to compute cell migration parameters and therefore conduct data analysis, as schematically shown in Fig. 5. On the one hand, one can first carry out a calculation of a specific parameter for each tracked cell separately (we here refer to these as 'trajectory-centric' parameters), and then compute an aggregated value from these distributions (mean or median) (see Fig. 5, right). One migration parameter is the instantaneous displacement d i (defined in Fig. 6a). The distribution of the 'trajectory-centric d i ' or 'track d i ' of all trajectories in a cell population (plotted as probability density plot in Fig. 5, bottom right) contains these trajectory-centric values, calculated by d tot /N, where d tot is the total displacement per trajectory and N is the number of steps in this trajectory. On the other hand, a distribution and an aggregated value (mean or median) of all separate migration steps in a data set can be calculated, independent of which cell trajectory migration steps belongs to (we here refer to these as 'step-centric' parameters) (see Fig. 5, left). Both these quantification methods have their advantages and disadvantages and can therefore complement each other. Consequently, it is useful to combine these in a complete cell migration data analysis pipeline. A trajectory-centric approach is in general preferable, because this is more intuitive. In addition, this approach is required to address the heterogeneity in cell migration behavior across a cell population. Indeed, the cell population under analysis may consist of different sub-populations of cells that show different and biologically relevant migratory patterns. Such differences will be discovered by plotting the distribution of extracted migration parameters on a trajectory-centric basis, while in a step-centric approach these subpopulations will be overlooked. This is demonstrated for displacement d i in the bottom panels of Fig. 5. The density plot of the trajectory-centric d i reveals that in this data set two or more distinct cell populations are present in several replicates of the condition (see Fig. 5, bottom panels, potential subpopulations indicated with dashed arrows). This aspect finds an important application in drug design, where targeting of specific populations in a complex cell mixture (e.g., containing tumor and stromal cells) is sometimes desired 24,25 .
On the other hand, trajectory-centric parameters can be biased for several reasons 26 . First of all, the number of steps that are monitored can vary a lot between cells, and this can affect the calculated parameters. Second, if a cell population is imaged with a certain time interval, fast moving cells tend to be associated with relatively straight trajectories, high confinement ratios and low turning angles. Therefore, they could be seen as a distinct subpopulation. Researchers frequently also discard cell trajectories shorter than a certain time period from their analysis, thus possibly introducing another bias towards more slowly translocating cells.
Another source for bias in trajectory-centric parameters comes from artefacts intrinsic to the tracking algorithms. For example, both the splitting of cell trajectories as well as the re-entry of cells in the field-of-view can cause a single cell to contribute more than once to a trajectory-centric parameter. This problem is overcome in a step-centric analysis, because it does not matter to which cell a movement step belongs. Even the switching of trajectories does not constitute a major problem in this case, because only the steps involved in the switch are affected rather than the entire trajectory (as would happen in trajectory-centric analysis).
For these reasons, we have chosen to present a framework that allows for both these computations to take place at the same time, enabling researchers to use a step-centric or a trajectory-centric approach, or even a combination of these approaches. Figure 6 illustrates the quantitative parameters that are currently derived in our new single-cell analysis module. Given the (x, y) coordinates in time, each cell trajectory is described as a collection of points in the 2D Euclidean space, as shown in Fig. 6a. From these coordinates, displacement d i and (track) speed s i (in pixel or μ m and pixel/min or μ m/min, respectively) are computed as listed in Fig. 6. The distributions of the step-centric and trajectory-centric displacements, and of the cell speed for each condition at the replicate level are reported using boxplots and kernel density estimations. The latter are particularly useful for differentiating between unimodal and multimodal distributions. These two types of plots are illustrated for track speed in Fig. 7. Cell turning angles α i between successive time frames (see Fig. 6b) are computed (both step-centric and trajectory-centric) as a speed-independent indicator for directionality, and are presented through traditional as well as angular histograms (see Fig. 8). Furthermore, ep_dr, end-point directionality ratios (also known as confinement ratios or meandering indices) are computed trajectory-centrically as ratios between the net distance and the cumulative path traveled by the cell (see Fig. 6).
All of the visualizations discussed on this section can be made on both the replicate level as well as the condition level. Data quality control: applying a minimal-motility threshold using a two-step filtering approach. Artifacts are always present in data sets generated at high throughput, due to variation in positioning during automated image acquisition, errors during segmentation, and errors in defining the center of mass of segmented objects during automated tracking. As a consequence, non-moving objects (dead cells, non-cell particles, artifacts segmented due to differences in contrast, etc.) are assigned a non-zero displacement (or speed) by the image processing software. Finding and applying a lower limit or threshold on these variables is thus desired when cell tracking has been performed fully automatically. Obviously, this filtering should not remove relevant data and therefore needs to be flexible.
In the single-cell analysis module, the raw data are run through a quality control step, which is implemented with a minimal-motility filtering: only trajectories that meet a tunable two-step filter for minimal cell motility are retained for final analysis.
The first criterion in this two-step filter is step-centric: cell steps are either labelled as 'true' or 'false' if these are larger or smaller than a minimal translocation, respectively. This minimal translocation can be specified (in pixels or μ m) by the user. The second filter criterion is trajectory-centric: trajectories are only retained if they meet the first criterion in at least a specified percentage of their steps (e.g., 30% motile steps/trajectory). Based on our experience with different cell lines and inhibitors, a minimal translocation/step of 0.1-0.5 μ m (for immune cells) and 1.6-2 μ m (for cancer cells), and between 20 and 30% of motile steps/trajectory appear suitable defaults for these two criteria. Of course, these criteria may vary depending on the effect of a treatment, e.g., when using a strong inhibitor of cell migration.
The module therefore allows a range of criterion values to be tested simultaneously. Evaluation of the different settings can then be performed by visually inspecting the kernel density functions of the corresponding displacement distributions, and population trajectories. The module presents an overview of these trajectories, highlighting which ones would be retained (marked 'yes' in green), and which would be removed (marked 'no' in red) (see Fig. 9). This allows the user to qualitatively judge if filtering is useful, with the goal of removing artifacts and dead cells. For instance, in the example in Fig. 9, a minimal step threshold of 0.4 μ m, together with a minimum of 30% motile steps, removes more than 30% of the tracks for the untreated cells, which indicates over-filtering. A desired level of filtering can be applied for each condition separately, or be set globally for all conditions. Furthermore, a summary is provided for each condition, detailing the number and the percentage of retained cell trajectories, Figure 9. The quality control view of the software: two-step thresholding. The user can input a range for the threshold step size (i.e., displacement) to be tested (box on the left). The choice is guided by displaying the median and 5% percentile of the displacement distribution for the condition (across replicates) and the experiment (across conditions). Second, the user inputs a percentage of desired motile steps per cell trajectory (box on the right). The effect of each tested threshold value for the displacement can be evaluated through the density plots of the track displacement distributions (red box). For each tested value (columns), the table reports the list of cell trajectories (rows) marked in green (YES) if retained, and in red (NO) if not retained. This functionality is here shown for condition 3 of experiment 2.
to aid the selection process (see Fig. 10). In the example the threshold was set at 0.1 μ m and 30% motility; in this condition the number of retained tracks is only low (~45% retained tracks) in the conditions were high doses of the specific inhibitors (crizotinib, purple curve, lapatinib, olive green curve) were used. This conforms to the fact that these doses were partly toxic in a proliferation test (data not shown).
This flexible, two-step filtering approach can outperform a single, low cut-off-value for either mean trajectory or cell displacement when it comes to excluding true artifacts. In fact, this two-step filter allows the frequently observed phenomenon of cell pausing during migration to be taken into account (see for example the tracks shown in Fig. 3), and will retain living cells that are only motile in part of the time sequence. Nevertheless, the module also provides the simpler, single-value thresholding option, where all trajectories of cells whose median displacement is below a user-defined threshold are discarded (this single threshold is then applied across all the data sets for the overall experiment). Figure 11a shows this option for the same experiment used in Fig. 10. Figure 11b illustrates the higher stringency when using a single cut-off value for the filtering, given that for several conditions the number of rejected trajectories is higher.
Data analysis and statistics. Once the data have been controlled for quality, statistical analysis and final interpretation of the experiment can take place. Importantly, at this stage, the user can still choose to analyze the raw data (i.e., without filtering). For each biological condition, statistics on speed, turning angle, and directionality ratio are reported, together with a visualization of the parameter distributions (see Fig. 12).
Subsequently, the user can select a set of conditions on which they wish to perform statistics. Because the migration data generally display a non-parametric distribution and are skewed to higher values, a Mann-Whitney U test is implemented instead of a simpler Student's t-test. The user can choose the parameter to test (currently either speed s or directionality ratio ep_dr), and can furthermore opt to correct for multiple hypotheses testing (with either Bonferroni, or Benjamini-Hochberg correction). A table reporting p-values is then shown, and a final data visualization provides a recapitulation of the entire analysis (see Fig. 12). All figures throughout the module can be interactively customized (e.g., axis range, font size of labels) and all results (figures and tables) can be easily copied or exported for downstream reporting.

Conclusions
The new single-cell analysis module for CellMissy presented here, provides a powerful and largely automated pipeline for high-throughput, single-cell migration experiments downstream of image processing. It supports multi-parametric (speed, directionality), multiscale (step, trajectory, cell population; replicate, condition, multi-sample experiment), and quality-controlled analysis of the differences between tested conditions. It therefore enables more in-depth analyses compared to existing tools (see Table 1). Moreover, given its plugin architecture, it can be further extended and customized in the future to incorporate additional analyses. For example, given the increasing focus on 3D cell migration analysis, more parameters can be added to accommodate for xyz-trajectories. Because this module is integrated in the CellMissy data management system, it is immediately useable and works on all operating systems. Moreover, integration with CellMissy ensures that all quantitative data and associated detailed metadata are stored in a relational database, thus enabling continuous data mining as well as data export and sharing with other CellMissy users.  . Single cut-off filtering. (a) In this option, the user can simply provide a minimal trajectory-centric displacement threshold, and this cut-off is used to filter out cell trajectories whose displacement is below this threshold. This functionality is here shown for all conditions of experiment 2. (b) Table reporting the difference in stringency across the two filtering approaches.

Methods
Ba/F3 and MDA-MB-231 cells in Table 2. Here, the number of replicates/condition and the number of conditions tested in parallel are reported.
Single cells migrated into the central cell-free zone in time. Migration was monitored using phase-contrast imaging on a Cell^M Live Cell Imaging system (Olympus) every 2.5 or 20 minutes for Ba/F3 and breast cancer cells, respectively. Image processing was performed with a software tool (CELLMIA), developed in collaboration with DCILabs Belgium (http://www.dcilabs.com/). The (x, y) coordinates in time generated by the imaging processing were then automatically imported into CellMissy.
Implementation. The software here presented is a new module in CellMissy, an open-source and cross-platform package dedicated to the management, storage, and analysis of cell migration data 15 . This new module has been specifically designed to accommodate for single-cell migration data. The "Experiment Manager" of CellMissy provides the means to set up and document an experiment, i.e., specify the experimental details (cell type, assay type, treatment, etc.). Data and metadata (both related to the experimental setup and to the imaging) can be stored into a relational database through a new import module for single-cell migration data.
The new module is written in Java, and inherits the cross-platform support from CellMissy, which is available on any system that supports a Java Virtual Machine version 1.8.0 or above (Windows, Linux, and Mac OS-X). Furthermore, the new module has been built around a plug-in architecture, which means that it can be extended with additional algorithms and functionalities by any interested researchers or developers.