MIST: Accurate and Scalable Microscopy Image Stitching Tool with Stage Modeling and Error Minimization

Automated microscopy can image specimens larger than the microscope’s field of view (FOV) by stitching overlapping image tiles. It also enables time-lapse studies of entire cell cultures in multiple imaging modalities. We created MIST (Microscopy Image Stitching Tool) for rapid and accurate stitching of large 2D time-lapse mosaics. MIST estimates the mechanical stage model parameters (actuator backlash, and stage repeatability ‘r’) from computed pairwise translations and then minimizes stitching errors by optimizing the translations within a (4r)2 square area. MIST has a performance-oriented implementation utilizing multicore hybrid CPU/GPU computing resources, which can process terabytes of time-lapse multi-channel mosaics 15 to 100 times faster than existing tools. We created 15 reference datasets to quantify MIST’s stitching accuracy. The datasets consist of three preparations of stem cell colonies seeded at low density and imaged with varying overlap (10 to 50%). The location and size of 1150 colonies are measured to quantify stitching accuracy. MIST generated stitched images with an average centroid distance error that is less than 2% of a FOV. The sources of these errors include mechanical uncertainties, specimen photobleaching, segmentation, and stitching inaccuracies. MIST produced higher stitching accuracy than three open-source tools. MIST is available in ImageJ at isg.nist.gov.


Introduction
This document describes the technical detail about MIST algorithm and implementation.
Section 1 is about the evaluation methodology. It has seven subsections that describe image segmentation, the creation of the reference dataset and test datasets, and the evaluation of a stitching result based on the selected metrics. Section 2 describes MIST implementation and performance measurement. Section 7 is about stitched image reconstruction from single tiles after translation optimization.
1 Evaluation Methodology

Overview
We evaluate the accuracy of an image stitching algorithm in three steps: (1) generate reference data, (2) create test dataset and run the stitching algorithm, and (3) compute the error metrics. The reference dataset generation is done in five steps: (1) find relevant regions of interest that fit within one field of view in the experiment, (2) automatically or manually locate them on the microscope stage, (3) acquire high resolution images of these regions, (4) compute their reference areas, and (5) save their corresponding reference locations.

Reference Image Content
The reference and test datasets consist of three plates of stem cell colonies fixed on days 2, 3, and 4 respectively. We imaged each plate at 10x magnification in the fluorescent modality (Cy5 staining).

Image Segmentation
An image segmentation algorithm, shown in Figure 2 is required to measure the reference ROI area's and centroid location's. To simplify this requirement, we selected the exposure time to produce high contrast images where the manually selected segmentation threshold of 500 intensity units correctly segments an image. Additionally, we remove objects smaller than 2-3 individual cells by requiring all foreground objects be at least 2000 pixels in area.

Generate Reference Dataset
We generated reference stitching datasets on a calibrated and motorized Olympus microscope stage controlled by MicroManager. MicroManager enabled us to convert the open loop stage controller system into a closed loop using image processing. The reference stitching data consists of high accuracy ROI location and area metadata. To accurately measure the area, each ROI must fit within a single microscope Field of View (FOV). To accurately measure the ROI location, we control the microscope stage via image analysis to place each ROI's centroid in the middle of the acquired reference image. By centering the ROI, we can measure its stage location to within the mechanical stage repeatability. By placing the ROI completely within the FOV, we can measure its area to within the segmentation accuracy. Both measurements are independent of stitching. See Section 1.4.1 for more details on our closed loop stage feedback system.
Once each ROI is centered within the FOV, we acquire a reference image of the colony. Each reference data point consists of the centered colony image, its stage position , and its area. One of the three resulting reference datasets is shown in Figure 3. The colony centering algorithm is part of the closed loop microscope control software written for this evaluation. The centering algorithm moves the microscope stage to place the centroid of a colony in the middle of the image FOV. If there is more than one colony in the FOV, the colony with the centroid closest (as defined by Euclidean distance) to the middle of the FOV will be centered.
The centering algorithm acquires an image and saves the current stage location as read by the stage controller. This image is segmented and object centroids are computed with respect to the FOV. The object with the centroid closest to the middle of the FOV is selected, all other objects are deleted from the foreground mask. The distance (in pixels) between the current colony centroid and the middle of the FOV is computed. That distance is converted to micrometers with the user supplied pixel to micrometer conversion ratio (e.g., 0.658 pixels per micrometer for this experiment). The centering algorithm translates the current stage position to the new position by adding the distance and moves the stage to the new position. The algorithm repeats until the position is stable ( < 1 ) or a maximum number of iterations is reached (10 iterations). The colony centering typically converges in 1 to 3 iterations. The colony centering steps are shown in Figure 4. The algorithm performs iterative refinement to handle colonies that start partially out of the FOV and to correct calibration errors in the stage, pixel to micrometer ratio, and camera angle. For example, if the camera coordinate system is rotated with respect to the stage coordinate system the computed translation delta will not produce the expected colony movement within the FOV. Figure 5 demonstrates the effect of a 10 degree camera rotation.  (Table 1). Each set of image tiles covers the same region on the plate as the reference measurements. Stitching algorithm performance is evaluated by comparing colony measurements (count, centroid location and area) derived from the stitched image with the reference measurements. There are two major steps involved: (1) extract single stitched colonies and compute their relative locations and size, and (2) match colonies between the reference dataset and the test datasets.

Colony Extraction
After running the stitching algorithms on the test datasets we segment the stitched image. However, to account for sample photobleaching, we used an optimized threshold per dataset (Day and Overlap) to match the number of segmented colonies with the relevant reference dataset ( Table 2). The thresholds were selected by minimizing the false positive and false negative counts that occur during colony matching. Since colony matching requires a stitched image, the naïve (stage coordinate based) stitching was used. Each segmented colony is cropped out of the stitched image using its bounding box. The extracted colony images are then adjusted to move the colony centroid to the middle of the extracted image. This converts the stitched image into a series of individual images each containing a single colony with its relative location in the stitching image. Figure 6 summarizes this process, showing the conversion of the stitched image into a set of individual colony images.

Colony Matching
We find an optimal match between the reference colonies and the colonies coming from the stitched image by computing a similarity matrix based on the Normalized Cross Correlation ( ) in the range of -1 to 1 where a higher score correspond to better matching. The Hungarian algorithm [1] is then used to find an optimal matching between the reference and stitched colonies. Figure 7 summarizes the matching problem. It is important to note that the centroid locations computed from the stitched image must be adjusted for rotation and translation due to the camera angle and the difference in location values between the absolute positions read on the microscope controller and the relative positions computed from the stitched image. We use the Kabsch algorithm [2] to compute the optimal translation matrix that minimizes the root mean squared deviation between the two sets as shown in Figure 8. However, we ignore the rotation component because the reference colony coordinates have already been rotated to match the camera angle (e.g., 0.0036 radians) as measured at the time of acquisition.

Stitching Performance Metrics
The stitching performance is assessed by computing the following 4 metrics; false positive, false negative, distance error, and size error. The output of a stitching method might result in adding some colonies by duplication or deleting some colonies by misaligning tiles. The result is a different number of reference and stitched colonies. Therefore, we label the number of reference colonies that are missing in the stitched image as False Negative (FN) and the number of colonies added in the stitched image but that do not exist in the reference dataset as False Positive (FP). The centroid distance error ( ) is a global stitching accuracy measure and the colony size error ( ) is a local stitching accuracy measure. The mathematical formulas for the two performance metrics are: where is the centroid distance error metric, ( , ) is the measured centroid location of reference colony and ( , ) is its centroid location as computed from the stitched image. is the size error metric; is the measured area in pixels of reference colony and is the computed area from the stitched image and is the number of reference colonies.
The size error ( ) is predominantly affected by segmentation and not stitching. For each Day, the images were acquired in the following order: 10 %, 20 %, 30 %, 40 %, 50 %, and Reference. Photobleaching accumulates with repeated light exposure, so the reference images have the lowest fluorescent expression, as shown in Figure 9. The threshold selected for the Reference images results in additional pixels being considered foreground for the 10-50 % overlap images. This effect was minimized by dynamically selecting the threshold to minimize colony mismatch (see Section 1.6.1).
To quantify the error related to non-stitching factors, we computed on a subset of stitching evaluation colonies known to be unaffected by stitching. We used only the stitching evaluation dataset colonies which fit completely within a single FOV (image tile) and are larger than 2100 pixels. The size threshold is 5 % higher than the regular minimum colony size defined in Section 1.3. This avoids small colonies that could have been pushed below the size threshold by photobleaching. Furthermore, during the matching phase (which relies on normalized cross correlation) only matches that are well correlated (correlation >0.9) were considered. Table 3 shows the resulting . These values are similar to the results computed after stitching (presented in the main manuscript), they demonstrate that most of the size error comes from non-stitching sources (e.g., photobleaching). Figure 9 shows the change in colony foreground intensity due to photobleaching. 1±7 0±7 0±8 0±8 Figure 9: Change in colony intensity due to photobleaching with the sum intensity annotated.

Stitched Image Heat Maps
To visualize the spatial distribution of the , heat-maps were generated where the colony mask was colored based on its distance error. Figure 10 shows the stitched image distance error heat-maps for the Day4 10% overlap dataset. Most colonies are dark blue indicating a distance error of less than 10 µm. As the colonies get farther from the center of the image their distance error generally increases, showing up as cyan or green. All runtimes discussed here were generated using the following evaluation system. The reported runtimes are, unless otherwise specified, the average of 10 consecutive runs discarding times below the 10 th percentile and above the 90 th percentile to reduce potential measurement variability.

Demonstration Datasets
The following datasets were used to demonstrate stitching on a variety of image types and content. Table 4 shows the image grid size, overlap between adjacent images, and resulting stitched image size from the examples shown in Figure 2 in the main document. A subset of these cropped zoomed example images are shown below to demonstrate the scale of the full stitched images: the A10 cells in Figure 11, the Carbon Nanotubes in Figure 12, and the Rat Brain cells in Figure 13.

Application Details
MIST was initially prototyped, designed, and validated in MATLAB. Once the MATLAB prototype was stable, MIST was ported to Java as an ImageJ/Fiji plugin for ease of use. By releasing MIST as a plugin within the ImageJ/Fiji platform it will likely see wider use than releasing a standalone tool. The ImageJ/Fiji implementation of MIST runs on the CPU and/or GPU.
There are three variants (execution types) of the MIST implementation: 1) Java: the pure Java implementation leverages the mines.edu toolkit [3] for 32 bit Fourier transforms during the translation computation step. 2) FFTW [4]: replaces the slower Java FFTs with compiled native libraries in order to perform 32 or 64 bit Fourier transforms during the pairwise translation computation step. 3) CUDA[5]- [8]: uses one or more GPUs on a system to perform the computationally intensive parts of the stitching algorithm for large datasets.
There are three major attributes that impact stitching runtime: (1) system memory, (2) number of CPU threads/number of cores, and (3) image storage location (network drive, hard drive, SSD, etc.). MIST is cognizant of the available system memory and scales the number of compute threads accordingly, which affects the amount of work that can be done in parallel. The performance of these threads is impacted by the time required to stream the input images to the computing hardware (i.e. reading from disk). This problem is further complicated when streaming to a GPU.
Our paper, Blattner et al [9], describes in detail the performance oriented software architecture including threading, memory management, application state management, and accelerators.
Please see our wiki for more details about the application and its use https://github.com/usnistgov/MIST/wiki.

Time-Lapse Application Dataset Performance
A primary motivation for developing MIST was the lack of stitching tools able to handle feature sparse image data. As the quantity of identifying features in the overlapping image area increases, the stitching task becomes easier as there is more information available for image stitching. Feature sparse image grids occur in the early time-points of long time-lapse image sequences of growing stem cells to give colonies room to grow.
To demonstrate the impact of this feature sparsity, we use the Stem Cell Replicate 1 time-lapse experiment imaged for a period of five days (available at https://isg.nist.gov/deepzoomweb/data/stemcellpluripotency). Each image grid in Replicate 1 consists of 18 x 22 phase-contrast images at 10% overlap. We stitched this dataset using all four methods. A full combinatorial exploration of the FijiIS parameter space was performed to determine that only the regression threshold (rT) affects the stitched image quality for this dataset. The default value of rT does not produce a viable stitched image for any time-point. As the rT value is lowered, FijiIS requires more runtime but produces more accurate stitched images. "If the regression threshold between two images after the individual stitching are below that number [regression threshold] they are assumed to be non-overlapping" (see https://imagej.net/Image_Stitching under the "regression threshold" paragraph of the "Grid/Collection Stitching" section for more details about the FijiIS parameters). The rT parameter controls when two images are estimated to be non-overlapping. We speculate that by lowering the rT parameter, fewer pairwise translations are immediately discarded, thereby allowing the FijiIS translation optimization to eventually converge to a satisfactory answer. The general correctness of the stitched images is quantified using , the average distance between each image tile's position in the stitched image and its position in an ideal image grid of exactly 10% overlap rotated to match the measured camera angle of 0.011 radians. To compensate for the lack of a shared coordinate system, the image tile locations are translated to minimize the least squares distance between the two sets of points. The average image tile and required runtime are shown in Figure 4 in the main document for MIST, TeraStitcher, iStitch, and FijiIS.
As the time-lapse image sequence progresses and the images become more feature rich, the FijiIS stitching results improve in accuracy while requiring less runtime. Given that a microscope's Field of View (FOV) is 1040 x 1392 pixels, any above 10 2 is roughly equivalent to 10% of a FOV and any above 10 3 is roughly equivalent to a full FOV. To link to a visual evaluation of the stitched images, three time points (10, 80, and 150 or the nearest time point with a stitched image) are shown in Figure 3 in the main document. To ensure that each image shares the same pixel size (scale) they have been padded to match the expected dimensions given the number of image tiles and the approximate overlap. With a ≈ 10 2 gaps can appear in the middle of the stitched image (main document Figure 3: Time Point 10 -FijiIS rT=0.01). These gaps were deemed unacceptable for quantifying the fluorescence expression of cell colonies. Despite FijiIS producing viable results ( = 32 pixels) for feature rich time points, the lack of reliability with feature sparse images and the runtime required (500-3000s per time point) makes FijiIS impractical for this type of application.
MIST was created to address these two concerns; (1) reliably stitching feature sparse microscopy images and (2) reducing compute time. If the stitching runtime is reduced to interactive rates, the user can efficiently interact with their data. Stitching the 161 StemCell-Replicate1 time points requires 73.7 hours with FijiIS at a rT of 0.01. MIST requires just 36.6 minutes using one GPU or 67.2 minutes using the two high-end Intel Xeon CPUs on our test system.

MIST Stitching Limitations
MIST has three noteworthy limitations.

A Regular Grid is Required
MIST was designed to stitch microscopy images acquired using a mechanical stage which can move the sample in a repeatable pattern. More specifically, MIST expects the overlap between images to be approximately constant per direction (vertical and horizontal). For example, an image grid with an overlap of 10% ± 1% in the vertical direction and 13% ± 2% in the horizontal direction meets this requirement. Overlap uncertainties above 3% (the estimated actuator backlash) are generally unreliable.
More generally, if a collection of images to be stitched does not fit into our Stage Model (See Supplementary Document Section 4) MIST will not be able to correctly stitch the images.

Limited to 2D+time+channel Image Sequences
MIST is designed to perform 2D time-sequence multichannel stitching. It cannot perform volumetric stitching (3 spatial dimensions). However, MIST can stitch a numbered series of independent image grids. The nominal use case is stitching a time-series of grids. For this to work the images must be named so that the time slice is denoted as a number within the image filename. For example, the row-column filename pattern "img_<rrr>_<ccc>_<ttt>.tif" where the "<ttt>" is replaced with the current time slice number. This allows MIST to iterate over the time points, stitching each grid individually. This functionality can also be used to iteratively stitch Z slices of an image volume; however, this will treat each z-slice independently and no registration is performed between slices that are adjacent in the z-stack.
On the other hand, datasets that can use the same image positions, such as stitching multiple channels, can be stitched by MIST. Unlike time series, channels do not necessarily have an easy numerical representation which makes iterating over them difficult. Therefore, MIST was designed to allow the user to stitch a single channel and then apply those image positions to any other collection of images with the same grid dimensions. This functionality was used for our StemCell-Replica1 dataset where all 161 time points were stitched in the phase-contrast channel before assembling the fluorescent channel using the same image positions.

MIST uses normalized cross correlation (
) as the metric of image similarity when performing registration. To refine translations between images, MIST performs a constrained hill climbing [10] with as the cost function. Hill climbing searches the local neighborhood (valid translations 4-connected to the current translation) and selects the translation with the highest correlation. The algorithm terminates when a local maxima is found. Please see the technical algorithm documentation on the wiki (https://github.com/usnistgov/MIST/wiki) for more details about the translation refinement algorithm. However, in the presence of high noise levels this surface has many local maxima, which can cause the hill climbing algorithm to converge on an incorrect local maximum, rather than the global maximum. A preprocessing step may be required to reduce the effects of noise. Figure 14 shows the hill climbing refinement for three pairs of images with different levels of noise. The FIB-SEM image data is used to demonstrate the effect of noise on an application dataset. These images have a very low signal to noise ratio, contain prohibitive levels of image noise, and are very feature sparse. An example image is shown in Figure 15. The red boxed region shown in Figure 15 belongs to two consecutive images on the horizontal direction with 10% overlap. Figure 16 demonstrates the smoothing effects on the surface, computed between the left sub-region and the right sub-region, using different Gaussian filter sizes. Preprocessing the images helps reduce noise levels while smoothing the ncc surface, providing the hill climbing a better opportunity to find the correct peak. Figure 16 (a) shows a sub-region of both overlapping images as well as the surface. The surface is very noisy and finding the correct translation peak buried within this noise is difficult with hill climbing. The same procedure is repeated, except each image is filtered using two Gaussian filters: (b) = 1 and (c) = 2. The smoothing reveals a dominant peak, which hill climbing will converge to. Eventually, a large enough sigma for the Gaussian filter will blur out too many details and the stitching accuracy will degrade. For this dataset, a Gaussian filter with a sigma of 1 was sufficient to enable stitching.

XY-Stage Actuator Movement Model
A motorized mechanical XY-Stage moves a biological sample with respect to the microscope's optical imaging system. This movement is carried out by two independent stepper motor linear actuators, one per direction. Therefore, it is important to understand the mechanical limitations of a stepper motor linear actuator mainly with regards to its accuracy and repeatability.
Accuracy: Actuator motion can only be as accurate as the mechanical components permit. The accuracy is the actuator's ability to provide precise increments of motion along its axis. In a mechanical system, this primarily concerns the lead screw as well as the feedback device (encoder, linear scale, etc.). Lead accuracy of the screw, resolution of encoders, and ability of the controller must combine to produce the desired accuracy. In most microscopes, the accuracy is very high and can be calibrated.

Repeatability:
Repeatability is the device's ability to reach a specific location repeatedly. It does not consider the accuracy of the position but rather the ability to go back to the same position. Many times, the actuator will follow a slightly bowed or twisted path due to imperfect construction. The repeatability is the measure of uncertainty relative to a given actuator movement. The repeatability cannot be calibrated, but it can be measured for any microscope stage.
In a grid tiling, the positions (x, y), that the stage will go to, have an uncertainty equal to the stage repeatability (x ± r , y ± r ). However, translations (dx, dy) computed in the vertical or horizontal directions between consecutive tiles are differences between respective positions. Thus, the uncertainty on the computed translation values is 2×repeatability (dx ± 2r , dy ± 2r ). The actuators are independent from one another. However, since they are very similar and often from the same manufacturer, we will assume that r = r = r.
When computing vertical translations between two consecutive tiles, the dx and dy components will correspond to the projection of the stage displacement on the camera coordinate systems (which may have a fixed rotational angle) with an uncertainty equal to 2×repeatability. We use this information to estimate the stage repeatability from the computed translations. The image overlap is estimated per direction by fitting a model to the translations using maximum likelihood estimation. The model expects that the translations can be split into two subsets; a set of normally distributed valid translations and a uniformly distributed set of invalid translations. This model is described by three parameters: the mean (µ) and standard deviation (σ) of the normal distribution, and the probability of a translation belonging to the uniform distribution (p). The likelihood of any given model ( , , ) for a set of translations ( 1 , 2 , … , ) is given by the following equation: where the range of possible translations for that direction is given by . By finding the model which maximizes this likelihood, an estimation of the overlap between images can be extracted from the average valid translation.
The stage repeatability is estimated by heuristically filtering the translations per direction and looking at the range of valid translation values. For each translation direction, the x & y repeatability metrics are computed independently. The maximum of all the computed repeatability values is used to constrain the hill climbing translation optimization.
Please see the technical algorithm documentation on the wiki (https://github.com/usnistgov/MIST/wiki) for the specific implementation details of the estimation of the stage model parameters. 6 Tilt and plate movement Assessment We analyzed the tilt and the displacement of the plate as the stage moved as well as when a biologist must take out the plate for feeding living cells every 24 h. During this live cell experiment, the biologist had to physically take out the plate and change the solution with added nutrients, place the plate back on the stage and re-center the plate every 24 h.
To quantify the effect of the tilt and plate movement, we tested the procedure using a plate with added fiduciary markers ('X' and dots placed on the plate itself). We imaged the plate and computed the centroid locations of all fiduciary markers and some cell colonies. We then followed the procedure of feeding the cells (placed the plate back on the stage and centered the stage using the microscope controller). We then visited all previously registered fiduciary markers and cell colonies centroid locations. They were all within the repeatability limit of the microscope.

Image Composition
To perform mosaic image composition, each adjacent pair of images must have a relative translation (displacement). These translations form an over-constrained system that one can represent as a directed acyclic graph where vertices are images and edges are weighted based on the correlation between adjacent images. The over-constraint in the system is due to the equivalence between absolute displacements of images and path summations in the graph; these summations must be path invariant to yield a well-formed image. In general, this phase resolves the over-constraint in the system and computes absolute displacements. This is achieved by selecting a subset of the relative displacements [11]- [13] or applying a global optimization approach to adjust them to a path invariant state [14]. These absolute displacements are used to compose the stitched mosaic image.
Each image (vertex) has translations relative to (up to) 4 neighbors. Some of these translations may be conflicting. MIST resolves this over-constraint conflict by selecting a subset of the relative displacements resulting in a path invariant and well-formed image. This is achieved using a maximum spanning tree with the translation normalized cross correlation values as the undirected graph edge weights. Edges are selected to maximize the sum of all selected edge weights such that each image tile is connected once and only once. We use Prim's weighted maximum spanning tree algorithm to perform this selection [15]. The edge weights from valid translations are promoted with a constant value to ensure they are always used before a non-valid (estimated) translation when determining the spanning tree. A maximum spanning tree is used to maximize the set of edge weights used in assembling the tree because, the higher the edge weight, the higher the confidence in the translation associated with the particular edge. With the absolute displacements computed, each image tile in the Image Grid has a global (x, y) location in the stitched mosaic image. The stitched image is built by copying the individual image tiles into position within the stitched image and blending the image tiles together. Three types of blending are implemented: overlay blending places the image with the higher correlation on top, average blending averages all overlapping pixels together, and linear blending weights the pixels based on the distance to the edge of the component image tile [14].