Abstract
We introduce the Interaction Factor (IF), a measure for quantifying the interaction of molecular clusters in superresolution microscopy images. The IF is robust in the sense that it is independent of cluster density, and it only depends on the extent of the pairwise interaction between different types of molecular clusters in the image. The IF for a single or a collection of images is estimated by first using stochastic modelling where the locations of clusters in the images are repeatedly randomized to estimate the distribution of the overlaps between the clusters in the absence of interaction (IF = 0). Second, an analytical form of the relationship between IF and the overlap (which has the random overlap as its only parameter) is used to estimate the IF for the experimentally observed overlap. The advantage of IF compared to conventional methods to quantify interaction in microscopy images is that it is insensitive to changing cluster density and is an absolute measure of interaction, making the interpretation of experiments easier. We validate the IF method by using both simulated and experimental data and provide an ImageJ plugin for determining the IF of an image.
Introduction
A fundamental question that many fluorescence microscopy experiments are trying to answer is whether or not the molecules under study interact and how this interaction changes under varying experimental conditions^{1}. With the development of superresolution (SR) fluorescence microscopy techniques, it is now possible to study biological samples at the subdiffraction level, allowing researchers to observe molecules and their interactions at the tens of nanometers scale^{2,3,4,5,6}. Since interactions are not directly observed, the spatial overlap, or colocalization between molecules is used as a surrogate for interaction. The colocalization observed at the scale provided by SR is more likely to represent true interaction, creating a need for improved methods of analysis. However, colocalization can occur at random and change with molecule density^{7}. For example, differing experimental conditions can cause an increase in density and thus an increase in colocalization, while the interaction between molecules does not change. Here, we introduce a measure that takes into account this randomness and is not affected by changes in density.
In order to measure colocalization, there are generally two types of methods: intensitybased and objectbased methods. Intensitybased methods focus on the correlation of pixel intensity levels in the color channels of the image, which correspond to the fluorescently labeled molecules from the experiment^{1,8,9}. These methods can be affected by noise^{10} and therefore rely on the correct subtraction of background pixel intensities. In addition, an increase of molecular density can cause an increase in the value of these measures^{1}. They can also be difficult to assess for statistical significance when compared to randomized images since it is difficult to recreate the autocorrelation of pixels present in an experimental image^{1,7,9,11}.
Objectbased methods identify objects in an image in order to quantify colocalization^{8,9,12}. Image segmentation techniques^{8,13,14,15,16,17} can be used to delineate objects or alternatively, each object can be represented by a point, such as the centroid. When the entire object is outlined, measures of object overlap can be used for analysis and randomized images produced for testing statistical significance^{13,14}. In addition, the degree of colocalization can be further quantified by comparing the distributions of overlap measurements from experimental data to that of simulations that model an increased probability of attraction^{18,19}. If objects are represented by points/coordinates in an image, spatial point process analysis tools such as the nearestneighbor distance^{20,21}, and the crosscorrelation function can be utilized^{22,23,24}. Statistical significance is then calculated by distinguishing values of these second order statistics from the null hypothesis that points are randomly distributed^{9,12,25}. These methods can be directly applied to the results of single molecule localization microscopy (SMLM)^{26,27,28} a type of SR microscopy that localizes individual fluorophores and yields particle coordinate lists rather than intensity images as output. However, these coordinatebased methods internally apply radial averaging and therefore do not account for irregularly shaped objects^{29}. In addition, the application of these methods can be time consuming and require a high level of experience in statistical techniques and computer programming^{29}. Additional methods have been recently developed specifically for evaluating colocalization in SMLM data where a measure of colocalization is calculated for each coordinate (based on radial density) and then the aggregated results are evaluated graphically^{30,31}.
We have developed a colocalization measure called the Interaction Factor (IF), which is based upon measuring the amount of overlap between segmented objects, i.e. clusters of molecules, in an image. It is a probability estimate between 0 and 1, where 0 indicates that the colocalization observed is due to random occurrence and 1 indicates that all objects are colocalizing. This new measure addresses many of the drawbacks with other methods: it makes a comparison to realistic random images, it is insensitive to cluster density, it is easy to use and fast to calculate, and it provides an absolute rather than a relative measure of interaction. We provide both an ImageJ plugin and a python package that implement the IF calculation.
Results
Description of the algorithm
The input to the IF algorithm is a twocolor fluorescence microscopy image with segmented objects corresponding to the molecular clusters in the image, and optionally a corresponding region of interest (ROI) (Fig. 1a(i)–(ii)). Firstly, a series of images are simulated by random placement of both sets of clusters within the ROI. Then, the frequency that each cluster of one color (the reference color) colocalizes (overlaps with at least one pixel) with any cluster of the other color is averaged over the total number of simulations. This estimates the probability of random colocalization for clusters of the reference color. (Fig. 1a(iii)–(iv)). Finally, the Interaction Factor (IF) for the image is calculated from Equation 1, which describes the percentage of overlapping clusters of the reference color as a function of the IF. This equation can be derived from the algorithm described in Supplementary Figure 1, which provides a method for placing clusters in an image where the probability of overlap is greater than that of random placement. The input to this algorithm is the IF, which determines the increased probability of overlap (see Online Methods for derivation):
where P = fraction of overlapping clusters of reference color, f = Interaction Factor (IF), P _{ k }(0) = probability of overlap of the kth cluster in the randomized simulations and n = total number of clusters.
The IF can be calculated in the context of either color and therefore we specify the reference color used for the probability calculation, e.g. redgreen (RG) IF indicates that the probability is calculated using green as the reference color. The IF ranges from 0 to 1, where 0 indicates colocalization merely due to random occurrence and 1 indicates complete colocalization (all clusters colocalized). Figure 1b shows an SR image from experimental data of a cell nucleus with two fluorescently labeled proteins, red and green. Two scenarios with randomized (IF = 0, Fig. 1c) and highly correlated (RG IF = 0.9, Fig. 1d) placements of the two proteins are also shown, generated by the algorithm referenced above (Supplementary Fig. 1 and Methods).
An alternative method to estimate the IF of the original image is to create groups of simulated images at IF levels ranging in steps from 0 to 1, and then find the IF that produces simulated images where the mean percentage of overlap of the reference color is closest to the percentage of overlap of the original image. For the experimental image in Fig. 1(b), Fig. 1e(i) shows the percentage of green colocalizing clusters for groups of simulated images (20 simulations per IF) plotted as a function of the RG IF. The yellow curve is its analytical prediction (Eq. 1). The curve follows the data points of the simulations, confirming the derivation of Equation 1. When utilizing the IF to distinguish the degree of colocalization in an image from that of random occurrence, the diagram shown in Fig. 1e(ii) can be used as a general guide.
Validation and comparison to other methods
In order to determine if the RG IF is affected by varying red cluster density, groups of simulated images with different numbers and sizes of red clusters were produced at RG IF = 0 (random), 0.25, 0.50, 0.75, 0.90 and 0.95 using the algorithmic procedure described in the Online Methods/Supplementary Fig. 1. The number of red and green clusters drawn in the simulated images (100) was taken from the average cluster counts of an experimental data set and the number of red clusters was then varied from 25 to 200. The size distributions of the clusters were based on measurements of segmented clusters from the experimental image shown in Fig. 1b. These clusters were estimated as ellipses and drawn over the same ROI as shown for the experimental image and the sizes of red clusters were then varied by multiplying the axes of the ellipses by a factor of 0.5, 1.0, and 1.5 (while keeping the number of clusters the same). Both red cluster number and size were varied to cover a larger range of density levels than was observed experimentally.
The left panel of Fig. 2a shows example simulated images with 25, 100, and 200 red clusters at IF = 0 (green cluster number = 100, all groups). The right panel of Fig. 2a shows example simulated images where the original cluster sizes were reduced by 50%, kept the same size (100%) and increased by 150% at RG IF = 0 (green cluster size unchanged and red/green cluster number = 100, all groups). Figure 2b(i) shows results of RG IF calculations for images simulated at RG IF = 0 while varying red cluster number. In the left panel, both predicted RG IF (left yaxis) and the percentage overlap of green clusters (right yaxis) are plotted. As the number of red clusters increases the percentage of overlapping green clusters increases (KruskalWallis H test, all groups: p < 0.0001), however there are no significant differences in calculated RG IF for all groups (25 clusters: n = 20, 0.14 ± 0.17 (SD); 200 clusters: n = 20, 0.04 ± 0.08 (SD); Oneway ANOVA, all groups: p = 0.53). A similar result is shown for RG IF = 0.90 (25 clusters: n = 20, 0.90 ± 0.02 (SD); 200 clusters: n = 20, 0.91 ± 0.02; Oneway ANOVA, all groups: p = 0.25; Fig. 2b(ii)). The right panels of Fig. 2b(i),(ii) show a more detailed view of each group of simulations. Results of varying the size of red clusters are shown in Fig. 2b(iii) for RG IF = 0. Similar to the results of varying red cluster number, while the percentage of green overlapping clusters increases (Oneway ANOVA, all groups: p < 0.0001), the calculated RG IF accurately measures the simulated RG IF for groups of images with different red cluster sizes, with no significant difference among groups (50% of size: n = 20, 0.10 ± 0.18 (SD); 150% of size: n = 20, 0.10 ± 0.13 (SD); Oneway ANOVA, all groups: p = 1.0). Simulations at RG IF = 0.90 also show that it is independent of red cluster size at this level (50% of size: n = 20, 0.89 ± 0.03 (SD); 150% of size: n = 20, 0.90 ± 0.02 (SD); KruskalWallis H test, all groups: p = 0.30; Fig. 2b(iv)). These results illustrate that the IF is not affected by the density of clusters in the image.
Additional simulations were performed where green cluster density was varied (Supplementary Fig. 2) and both red and green cluster densities were varied simultaneously (Supplementary Fig. 3) and we found no dependence in the ranges tested between cluster density and IF. In addition to percentage of overlap, other common measures of colocalization in fluorescence microscopy, including the Manders’ coefficients^{32} and the Pearson’s coefficient also change with cluster density level (Supplementary Fig. 4).
Simulations at all RG IF values measured from 0 to 0.95 are shown in Fig. 2c, where the coefficient of determination (R ^{2}) of a simple linear regression model (calculated RG IF (y) = simulated RG IF (x)) is shown for images with varying red cluster numbers. The R ^{2} provides a summative measure of the accuracy of the calculated RG IF for all simulated RG IF values at each increment of cluster number. Simulations were also performed while varying red cluster size at all RG IF values. Figure 2d shows the effects of varying both size and number of red clusters on the accuracy of the RG IF calculation. The R ^{2} is shown for each combination of size and number, calculated using the measurements for all images in each group (left panel) and the mean of each group (right panel). Note the low accuracy of the RG IF calculation in the upper left corner of the heat maps in Figure 2d (R ^{2} = 0.56, left panel and R ^{2} = 0.84, right panel) for sparse images (low cluster density).
Precision and statistical significance
Since the IF is based on the probabilities of cluster overlap from a set of random simulations, the IF value can vary with repeated runs of the calculation for a single image. To illustrate the degree of variation that can be expected, we generated simulations over a range of IFs and then repeated the IF calculation 20 times for each simulated image. The range of variation in the calculated IF for a random image was 0.10, while the range of variation in the calculated IF for an image at IF = 0.94 was 0.01 (higher variation occurred in IFs closer to random), Supplementary Table 1.
While calculating the IF of a single image can be informative for preliminary work, it is recommended to draw conclusions based on a larger data set. In addition to data set size, the ability to distinguish between data sets is also dependent upon the IF value: the calculation is more precise for images with higher IFs. The IF curve (Fig. 1d) is relatively flat initially (near random levels of overlap), followed by a steeper section as the IF increases. Therefore, the IFs of images closer to random colocalization levels tend to have higher variance and are more difficult to distinguish: Fig. 1e, blue line at a low green percentage of overlap (near random) intersects data points over a large range of xvalues (simulated RG IFs). Higher IFs have less variance and are easier to distinguish: Fig. 1e, red line at a high green percentage of overlap intersects data points in a narrower range of xvalues. Since the range of IFs that may give rise to a low level of overlap is larger, the IF for this type of data set can be better estimated by increasing the number of images measured.
In order to determine how the calculation of statistical significance is affected by the IF value for different data set sizes, we generated groups of images to simulate experimental data sets at a range of IF values where n = 5, 10 or 20 images per group. The RG IF was calculated for each simulated image and a Student’s ttest was performed between groups of images to obtain the associated pvalues. To reduce the potential of bias from using a single experimental image to make the simulations, the comparison was repeated using identified clusters from 20 different images from an experimental data set, and the average pvalue was calculated. The results of this comparison are shown in Fig. 3. The data sets that showed no significant difference are colored dark purple; note the larger proportion of this color on the heat map for smaller sized data sets, but also note the extent of the dark purple for groups with low RG IF as opposed to groups with high RG IF (left side of heat maps vs. right side of heat maps). Figure 3 can be used as a general guide for the number of images required for significance when gathering data, based on the expected or preliminary IF level.
Analysis of the interaction of DNA damage repair proteins
Following the development of SR microscopy, researchers have successfully devised assays to use SMLM to probe the spatial and temporal recruitment of DNA damage repair proteins to double stranded breaks (DSBs)^{33}. We believe the IF approach could bring new insights to the proteinprotein interactions involved in key aspects of both nonhomologous end joining (NHEJ) and homologous recombination (HR), the principal DSB repair pathways^{34}. We tested the IF approach in three experimental datasets that looked at damaged and undamaged cells and included proteins believed to be involved in HR or NHEJ^{34}. For datasets 1 and 2 (Fig. 4a, left panel), U2OS cells were synchronized postmitosis, released into G1 and treated with 2.5 ng/mL neocarzinostatin (NCS) (DNAdamaged) or normal media (control) for 30 minutes, cells were fixed and immunolabeled for phosphorylated DNAPKcs (pDNAPKcs) and DNA Ligase IV (LigIV) (Fig. 4b) or phosphorylated ATM kinase (pATM) and LigIV (Fig. 4c). For experimental dataset 3 (Fig. 4a, right panel), U2OS cells were synchronized postmitosis, released for 26 hours to establish a predominantly G2phase cell population and treated with NCS (DNAdamaged) or normal media (control) for 30 minutes. After 30 minutes of recovery, cells were fixed and immunolabeled for BRCA1 and BRIP1 (Fig. 4d).
pDNAPKcs and LigIV
For experimental dataset 1, a Welch’s ttest showed that the area of overlap between pDNAPKcs and LigIV was significantly greater after DNAdamage compared to the control group (control: n = 21, 0.05 ± 0.01 μm ^{2} (SEM); DNAdamaged: n = 29, 0.69 ± 0.05 μm ^{2} (SEM); Welch’s ttest, p < 0.0001; Fig. 4b,ii). Similarly, the number of overlaps between pDNAPKcs and LigIV clusters was significantly greater after DNAdamage compared to control group (controls: 2.86 ± 0.41 (SEM); DNAdamaged: 23.03 ± 1.34 (SEM); Welch’s ttest, p < 0.0001; Fig. 4b,iii). To determine if the proportion of LigIV clusters that overlapped with pDNAPKcs changed after NCS treatment, we calculated the percentage of clusters that overlapped with pDNAPKcs and found a significant increase in the DNAdamaged group (control: 4.85 ± 0.73 (SEM); DNAdamaged: 53.30 ± 2.62 (SEM); Welch’s ttest, p < 0.0001; Fig. 4b,iv).
These data could be interpreted as both proteins being closer in space caused by an increase in proteinprotein interaction. It is also possible that the increase in these measurements is simply due to an increase in cluster area or number. The total area covered by LigIV and pDNAPKcs was greater after DNAdamage compared to control group (Supplementary Fig. 5a) raising the possibility that the observed changes in overlap measurements were caused by an increase in cluster area. The effect of changing cluster number is inconclusive given that while the number of pDNAPKcs clusters increased, the number of LigIV clusters decreased after DNAdamage (Supplementary Fig. 5a).
We calculated the IF based upon LigIV as the reference protein (pDNAPKcsLIgIV IF) and found that the mean IF was greater after DNAdamage compared to control group (control: 0.39 ± 0.06 (SEM); DNAdamaged: 0.96 ± 0.01 (SEM); Welch’s ttest, p < 0.0001; Fig. 4b,v), suggesting an increase in interaction between LigIV and pDNAPKcs after DNA damage. The mean IF of 0.39 ± 0.06 (SEM) suggests that without NCS induced DNA damage there is no or very little interaction between LigIV and pDNAPKcs. Whereas, the mean IF for the DNAdamaged group of 0.96 ± 0.01 (SEM) suggests that there is very high interaction between LigIV and pDNAPKcs after DNA damage. Furthermore, we found that the choice of reference protein (LigIV or pDNAPKcs) had no effect on the results (Supplementary Fig. 6a).
pATM and LigIV
For experimental dataset 2, a Student’s ttest showed that the area of overlaps between pATM and LigIV was significantly greater after DNAdamage compared to control group (control: n = 25, 0.67 ± 0.10 μm ^{2} (SEM); DNAdamaged: n = 13, 1.37 ± 0.14 μm ^{2} (SEM); Student’s ttest, p < 0.0001; Fig. 4c,ii). Similarly, the number of overlaps between pATM and LigIV clusters was significantly greater after DNAdamage compared to control group (controls: 33.04 ± 4.56 (SEM); DNAdamaged: 50.54 ± 5.01 (SEM); Student’s ttest, p = 0.02; Fig. 4c,iii). To determine if the proportion of LigIV clusters that overlapped with pATM changed after NCS treatment, we calculated the percentage of LigIV clusters that overlapped with pATM and found a significant increase in the DNAdamaged group (control: 48.67 ± 3.26 (SEM); DNAdamaged: 74.26 ± 3.51 (SEM); Student’s ttest, p < 0.0001; Fig. 4c,iv). Altogether these data suggest that there is an increase in the interaction between LigIV and pATM, however it is also possible that the increase in overlap measurements is due to an increase in the number or area of clusters. Additional data (Supplementary Fig. 5b) showed no differences between control and DNAdamaged groups in cluster area or number, which would suggest that the observed increase in overlap is due to an increase in proteinprotein interaction.
We calculated the IF based upon LigIV as the reference protein (pATMLigIV IF) and found that the mean IF was greater after DNAdamage compared to control group (control: 0.91 ± 0.02 (SEM); DNAdamaged: 0.98 ± 0.01 (SEM); Welch’s ttest, p = 0.03; Fig. 4c,v). Further examination of the IF measurements show that the mean IF was 0.91 ± 0.02 (SEM) for the control group suggesting that without the NCS DNA damage induction there is high interaction between LigIV and pATM. Whereas, the mean IF for the DNAdamaged group was 0.98 ± 0.01 (SEM) suggesting that the interaction was already present in the absence of DNA damage induction, and increased slightly after DNA damage. As in the previous data set, we found that the choice of reference protein (LigIV or pATM) had no effect on the results (Supplementary Fig. 6b).
BRCA1 and BRIP1
For experimental dataset 3, a Welch’s ttest showed that the area of overlaps between BRCA1 and BRIP1 was significantly greater after DNAdamage compared to control group (control: n = 17, 0.04 ± 0.10 μm ^{2} (SEM); DNAdamaged: n = 18, 0.12 ± 0.02 μm ^{2} (SEM); Welch’s ttest, p < 0.0001; Fig. 4d,ii). Similarly, the number of overlaps between BRCA1 and BRIP1 clusters was significantly greater after DNAdamage compared to control group (controls: 5.71 ± 0.73(SEM); DNAdamaged: 14.06 ± 1.61 (SEM); Welch’s ttest, p < 0.0001; Fig. 4d,iii). To determine if the proportion of BRIP1 clusters that overlap with BRCA1 changed after NCS treatment, we calculated the percentage of BRIP1 clusters that overlapped with BRCA1 and found a significant increase in the DNAdamaged group (control: 8.26 ± 0.78 (SEM); DNAdamaged: 13.33 ± 1.12 (SEM); Student’s ttest, p < 0.0001; Fig. 4d,iv). Altogether, these data suggest that there is an increase in proteinprotein interaction or that the respective clusters increased in number and/or area. Additional data (Supplementary Fig. 5c) showed the DNAdamaged group exhibited an increase in both BRIP1 and BRCA1 cluster area and number raising the possibility that this increase in cluster density caused the observed increase in overlap.
We calculated the IF based upon BRIP1 as the reference protein (BRCA1BRIP1 IF) and found no significant difference in IF between control and DNAdamaged group (control: 0.21 ± 0.05 (SEM); DNAdamaged: 0.17 ± 0.04 (SEM); Student’s ttest, p = 0.49; Fig. 4d,v), further supporting the hypothesis that the increase in cluster density caused the increase in overlaps. Moreover, the low mean IFs in both groups would suggest that the proteins are not interacting before or after DNA induced damage during G2 phase. As in the previous data set, the choice of reference protein (BRIP1 or BRCA1) had no effect on the results (Supplementary Fig. 6c).
Discussion
The Interaction Factor (IF) is calculated by using stochastic simulations to determine the probability for random overlap and deriving the relationship between IF, the experimentally observed overlap and the probability for random overlap (Eq. 1). Using both simulated and experimental data we show that the IF is widely applicable to a large range of situations encountered in superresolution microscopy.
In order to further elucidate the meaning of the IF in a biological context, we compared it to the equilibrium dissociation constant (K_{d}) for receptorligand binding, based on the assumption that the observed colocalization in an image is the result of binding between the two molecules. We calculated in silico K_{d} values for simulated images with increasing numbers of clusters of one color (corresponding to increasing the concentration of the ligand), and then fit a saturation binding model to the results to obtain K_{d} values and compared these with calculated IF values. As expected, there is an inverse relationship between K_{d} and IF (Supplementary Fig. 7). When K_{d} decreases, the probability for molecules to be in a bound state increases, leading to increased colocalization and increased IF.
The inherent variability in the amount of cluster overlap when there is no interaction or only weak interaction causes significant variation in the IF measurement for low IFs (Fig. 1e(i), Supplementary Table 1). While increasing the size of the data set can mitigate this problem somewhat (Fig. 3), we recommend that in general reporting interaction for IF levels below 0.5 is done with caution (Fig. 1e(ii)). In the case where the calculated IF is reported as 0, this implies that the level of colocalization is equal to or less than that found for the random case. We do not attempt to distinguish the IF for the case where the colocalization is less than that found for random in our current model, as we have yet to encounter this situation biologically.
For our simulated data, the accuracy of the IF calculation was lowest when using 25 clusters for both colors, especially when the clusters were small in size. Therefore, we do not recommend using the IF calculation when images contain less than 25 clusters and they are small with respect to the size of the ROI. In addition, it is important that both sets of clusters are distributed uniformly and able to move freely over the ROI. If this is not the case, for example if one set of clusters is bound to a membrane, then the procedure can be modified by allowing these clusters to be left in their original positions while randomly placing only the second set of clusters. An extension that could be added to the IF algorithm would be to substitute the uniform distribution for randomization of clusters within the ROI with an arbitrary shape for the distribution of clusters.
In the simulations produced for the IF calculation, clusters were randomly placed in the ROI but we did not take the additional step of randomly rotating the clusters. While this is straightforward to add to the algorithm, we found that it would not significantly affect the results and therefore was an unnecessary complication (Supplementary Fig. 8).
The IF has the potential to probe the asymmetry of the interaction, i.e. for two interacting clusters of molecules, the IF for one set of clusters with respect to the other can be directional. For example, in the case where only a small subset of one protein interacts with the second protein while the second protein almost always interacts with the first, the IF will be different depending on the choice of the reference protein. This fact is taken into account in the formulation of the IF (Equation 1, Supplementary Fig. 1) and it is recommended to calculate the IF in the context of both sets of clusters to fully describe the meaning of the colocalization in the image. In our ImageJ plugin (Supplementary Fig. 9), both calculations are always reported.
The method for determining colocalization was chosen as the percentage of overlapping clusters where overlap is defined as any amount of pixel overlap between segmented objects. This seemed a valid basis for measuring colocalization in SR images, but we recognize that the optimal measure can be different depending on the application. Our model may be adjusted to consider colocalization in a different manner, for example a particular amount of pixel overlap, the overlap of a cluster centroid with another cluster or a minimum distance of clusters.
The current implementation of IF is for interactions in two dimensions, but it would be straightforward to extend it to three dimensional images by simple implementation changes without any need to change the basic definition of the IF.
Finally, we recognize that the algorithm is not suited for images where molecules cannot be arranged in clusters, and thus other colocalization methods such as crosscorrelation analysis^{22,23}, the CBC method^{30} or intensitybased methods^{1,8,9} are better suited for this type of data.
In summary, we have introduced a new technique to evaluate colocalization measurements in microscopy images, the Interaction Factor (IF). It provides an absolute measure of the interaction between proteins and because it is insensitive to cluster density, it can directly be used for calculating the effective size of changes between experimental conditions. The absolute nature of the IF makes interpretation of colocalization in microscopy images easier by allowing biologists not only to quantify the size of the change in interaction but also to distinguish the case where the change goes from no interaction to high interaction from the case where the interaction is already high but then increases even further. The examples we have shown here are all from dSTORM images but the method is equally applicable to other SR and conventional fluorescence microscopies, or for that matter, to any images with overlapping objects of different types.
Methods
Sample preparation and Immunofluorescence
For experimental data sets 1 and 2 (Fig. 4a, left panel and Fig. 4b and c), U2OS (American Type Culture Collection) cells were cultured routinely in McCoy’s 5 A medium supplemented with 10% fetal bovine serum and penicillin/streptomycin. Cells were synchronized postmitosis by serum starvation for 72 hours, before release into G1 and treatment with 5 ng/mL of neocarzinostatin (NCS) for 30 minutes (DNAdamaged) or normal media (control). For experimental dataset 3 (Fig. 4a, right panel and Fig. 4d), U2OS cells were synchronized by serum starvation for 72 h, released for 26 hours and treated with NCS for 30 minutes (DNAdamaged) or normal media (control).
Cells were then subsequently prepared for imaging via cytoplasmic extraction with cold CSK buffer containing 0.5% Triton X100, washed with DPBS (with calcium and magnesium), and fixed for 15 minutes in 4% paraformaldaheyde in DPBS. Coverslips were blocked with blocking solution [20 mg/mL BSA, 0.2% gelatin, 2% (wt/vol) glycine, 50 mM NH4Cl, and PBS], then stained with primary antibodies dilutions specified by the manufacturer either overnight at 4 °C or for 1 h at room temperature) and secondary antibodies (usually at 1:1,000–5,000 for 30 min at room temperature) in blocking solution before imaging. The following primary antibodies were used: phosphorylated DNAdependent protein kinase catalytic subunit (pDNAPKcs; ab18356, Abcam), DNA Ligase IV (LigIV; ab26039, Abcam) and phosphorylated Ataxia telangiectasiamutated protein (pATM; ab36810, Abcam), Breast cancer type 1 susceptibility protein (BRCA1; sc6954, Santa Cruz) and Fanconi anemia group J protein (BRIP1; NB100–416, Novus). Antimouse and antirabbit secondary antibodies conjugated to Alexa Fluor 647, 568 and 488 were used for visualization (Thermofisher).
Microscopy Setup for SingleMolecule Imaging
We used a custombuilt microscopy setup based on a Leica DMI3000 microscope equipped with an HCX PL APO 63 × NA 1.47 OIL CORR TIRF objective, followed by achromatic 2 × tube lens magnification. The microscope was coupled to 473 nm (200 mW), 532 nm (200 mW), and 640 nm (150 mW) diodepumped solidstate lasers to excite the sample in a Highly Inclined and Laminated Optical Sheet (HILO) illumination mode for improved signal to noise ratio and to reject out of plane fluorescence. Sample emission was collected and split into multiple channels through the use of proper dichroic and emission narrowband bandpass filters in conjunction with the use of a Photometrics DV2 multichannel imaging system to image two colors simultaneously, sidebyside, onto a single EMCCD camera (Andor iXon + 897) acquiring at 33 Hz.
For accurate alignment and mapping of the two color channels, we first imaged diffractionlimited fluorescent beads that have wide emission spectra spanning all channels (Life Technologies). The location of the beads was matched for each channel, and a mapping matrix was generated using a custom mapping routine (IDL; Exelis Visual Information Solutions). In brief, the 2D Gaussian fitted centroid coordinates of the same bead but in different color channels were mapped using a 3rd order polynomial function:
$$\begin{array}{ccc}{x}_{R} & = & \sum _{i=0}^{3}\sum _{j=0}^{3}{P}_{ij}{x}_{G}^{i}{y}_{G}^{j}\\ {y}_{R} & = & \sum _{i=0}^{3}\sum _{j=0}^{3}{Q}_{ij}{x}_{G}^{i}{y}_{G}^{j}\end{array}$$where \(({x}_{R},{y}_{R})\) and \(({x}_{G},{y}_{G})\) are the coordinates of the same bead in Red and Green channel, respectively. The polynomial coefficients \({P}_{ij}\) and \({Q}_{ij}\) were optimized by fitting/training more than 40 pairs of such centroid coordinates, and were then applied to warp the sample images from different channels. The order of this polynomial function was optimized to maximum mapping accuracy (7~10 nm on average). Higher order but not sufficient trainee beads could result in extreme over fit.
SR Imaging
SR imaging was achieved through a modified direct stochastic optical reconstruction microscopy (dSTORM) approach^{18}. Images were acquired in HILO illumination mode^{35}. Using these acquisition modes and the filters described earlier, we found only negligible fluorescence crosstalk between channels. Subdiffraction twocolor colocalization error was <20 nm in our instrument^{18,36}. SR imaging conditions were achieved through the addition of 100 mM MEA and an additional oxygenscavenging system (1 mg/mL glucose oxidase, 0.02 mg/mL catalase, and 10% glucose)^{28,37}. Movies containing a minimum of 2,000 frames were used to generate reconstructed superresolved images.
Analysis
SR image reconstruction was performed using the freely available QuickPALM plugin for ImageJ^{38}. We specifically used an FWHM of 4 and a signaltonoise ratio (SNR) of 2–4 to detect localized particles from our raw data. Localized molecules were written to rendered images by the QuickPALM plugin with a pixel size of 20 nm. For the experimental datasets, ROIs were manually drawn to outline cell nuclei using ImageJ. Then, images were segmented with the Otsu thresholding method and all clusters less than 4 pixels in area were excluded from the analysis.
Derivation of IF formula
Suppose we have an image with sets of clusters from two different color channels. Clusters of one color are first placed randomly on the image (random color). Then, clusters of the second color (reference color) are placed according to the procedure outlined in Supplementary Figure 1 and Online methods (see below). Based on the steps in the figure, a formula can be derived to relate the IF to the percentage of overlapping clusters (of the reference color), with the only additional parameter being the probability of overlap for each cluster in a completely random simulated image (IF = 0). The probability of overlap for cluster k (1 ≤ k ≤ n) of the reference color is P _{ k }(0) for the 1^{st} iteration of the algorithm, where P _{ k }(0) = probability of overlap of cluster k where both sets of clusters placed randomly. The probability that the cluster is not placed in this iteration and to continue to the next iteration is (1–P _{ k }(0) × f), where f = Interaction Factor, i.e. the probability that the cluster does not overlap times the Interaction Factor. For the 2nd iteration of the algorithm, the probability of overlap for cluster k of the reference color is P _{ k }(0) × [(1–P _{ k }(0)) × f)], i.e. the probability of overlap for random placement times the probability that it made it to the second iteration. The probability that the cluster is not placed in this iteration and continue to the next iteration is (1–P _{ k }(0)) × [(1–P _{ k }(0)) × f] × f, i.e. the probability that it does not overlap at random placement times the probability that it was not placed in an earlier iteration times the Interaction Factor. By continuing for an infinite number of iteration, we can derive P _{ k }(f), the probability of overlap of cluster k for an arbitrary Interaction Factor, f by summing the probabilities for placement in each iteration:
since P _{ k }(f) is an infinite geometric series:
Finally, if P = % of overlapping clusters (reference color) in the experimental image, we obtain Equation 1.
Generation of IF simulations
To generate IF simulations, clusters of one color are placed randomly on a new image within the boundaries of the ROI, if present. Then, clusters of the second color (reference color) are placed according to the flowchart (Supplementary Fig. 1). This process is repeated for each cluster: (1) the cluster is placed randomly within the ROI. (2) If the IF is equal to 0, then the cluster is kept in that position. (3) If the IF is not equal to 0, the program determines if the cluster is overlapping with one or more pixels of a cluster of the other color. (4) If it is overlapping, then the cluster is kept in that position. (5) If instead it’s not overlapping, then a random number between 0 and 1 is generated. (6) If that number is greater than the IF, the cluster is kept in that position. (7) Conversely, if the random number is not greater than the IF, then the process is repeated from step (1) above. The random number generation utilized in the python package is random.random() from the python standard library. The random number generation utilized in the ImageJ plugin was Math.random() from the Java standard library.
Statistics for Ellipse Simulations and Experimental data comparisons
Ellipse simulations data is expressed as mean ± SD. Experimental data is expressed as mean ± SEM. The p criterion was 0.05. Tests for equality of variances were determined by performing a Bartlett’s test. For analyses of two groups, if results showed significant differences in variances, Welch’s ttests were performed. For analyses of more than two groups, if results showed significant differences in variances, KruskalWallis H tests were performed. Linear regression, twotailed Student’s ttests, Welch’s ttests, KruskalWallis H tests, and Oneway ANOVAs were performed using the Scipy Stats package^{39}.
Additional Measurements
In the text the percentage of overlaps refers to the percentage of clusters of the reference color overlapping with the other color. They were counted by first generating masks for both channels such that \({C}_{mask}=C > th\), were C is the color channel and th is the threshold value, followed by counting the clusters that colocalized with a cluster of the other color, and dividing it by the total number of clusters multiplied by 100. The number of overlaps were counted by first generating an overlap mask
where \(C{1}_{mask}\) is the channel 1 mask, \(C{2}_{mask}\) is the channel 2 mask, followed by counting the number of overlapping regions in the overlap mask. The area of overlaps was calculated by adding the area of all the overlapping regions and multiplying it by the scale. Mander’s coefficient 1 (M1) was calculated by utilizing the following formula:
where \(C{1}_{c}\) is the intensity value of a pixel in channel 1 above threshold that colocalized with channel 2 above threshold and \(C{1}_{i}\) is the intensity value for a pixel in channel 1 above threshold. Mander’s coefficient 2 (M2) was calculated by utilizing the following formula:
where \(C{2}_{c}\) is the intensity value of a pixel in channel 2 above threshold that colocalized with channel 1 above threshold and \(C{2}_{i}\) is the intensity value for a pixel in channel 2 above threshold.
ImageJ Plugin and python package
We provide both an ImageJ plugin (Supplementary Fig. 8) and also a python package to calculate the IF for an image. The ImageJ plugin is called “Interaction Factor Package” and contains two dialog boxes. The first, called “Interaction Factor”, calculates the Interaction Factor for an image. The user can outline the ROI and choose the color channels to be examined before running the IF calculation. Builtin ImageJ thresholding methods may be applied to the image to obtain masks for the channels. The plugin allows the user to test different thresholding methods interactively by drawing an overlay based on the chosen method. Once satisfied, the user can then run the IF calculation. The user may choose to randomly arrange both sets of clusters or keep one set of clusters in their original positions while rearranging only the second set. The output is the calculated IFs for the image (in the context of both color channels, e.g. redgreen IF and greenred IF), along with a “pvalue” indicating the proportion of the random simulations produced in the IF calculation which had a percentage of overlap ≥ that of the original image. The number of random simulations performed by default in the plugin is 50, providing a precision of 0.02 for the probability calculation. The second dialog box, called “Interaction Factor Simulations”, allows the user to create a set of simulated images at a chosen IF based on an input image. The interface for preprocessing the input image is similar to the “Interaction Factor” dialog box, except that the user may also choose the IF level and number of simulated images generated. The plugin source code and .jar file are available on github at https://github.com/FenyoLab/InteractionFactorIJplugin and a detailed user guide is included as Supplementary information. The python package “IFSimPy” provides analogous tools for calculating the IF as well as creating simulated images at any IF. The python source code is available on github at https://github.com/FenyoLab/IFSimPy.
Additional information
Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
 1.
Dunn, K. W., Kamocka, M. M. & McDonald, J. H. A practical guide to evaluating colocalization in biological microscopy. Am. J. Physiol. Cell Physiol. 300, C723–742 (2011).
 2.
Huang, B., Bates, M. & Zhuang, X. Superresolution fluorescence microscopy. Annu. Rev. Biochem. 78, 993–1016 (2009).
 3.
Betzig, E. et al. Imaging intracellular fluorescent proteins at nanometer resolution. Science 313, 1642–1645 (2006).
 4.
Hess, S. T., Girirajan, T. P. K. & Mason, M. D. UltraHigh Resolution Imaging by Fluorescence Photoactivation Localization Microscopy. Biophys. J. 91, 4258–4272 (2006).
 5.
Klein, T., Proppert, S. & Sauer, M. Eight years of singlemolecule localization microscopy. Histochem. Cell Biol. 141, 561–575 (2014).
 6.
Whelan, D. R. & Bell, T. D. M. SuperResolution SingleMolecule Localization Microscopy: Tricks of the Trade. J. Phys. Chem. Lett. 6, 374–382 (2015).
 7.
Comeau, J.W., Costantino, S. & Wiseman, P.W. A guide to accurate fluorescence microscopy colocalization measurements. Biophys. J. 91, 4611–4622 (2006).
 8.
Bolte, S. & Cordelieres, F. P. A guided tour into subcellular colocalization analysis in light microscopy. J. Microsc. 224, 213–232 (2006).
 9.
Lagache, T., Sauvonnet, N., Danglot, L. & OlivoMarin, J. C. Statistical analysis of molecule colocalization in bioimaging. Cytometry A 87, 568–579 (2015).
 10.
Adler, J., Pagakis, S. N. & Parmryd, I. Replicatebased noise corrected correlation for accurate measurements of colocalization. J. Microsc. 230, 121–133 (2008).
 11.
Costes, S. V. et al. Automatic and quantitative measurement of proteinprotein colocalization in live cells. Biophys. J. 86, 3993–4003 (2004).
 12.
Helmuth, J. A., Paul, G. & Sbalzarini, I. F. Beyond colocalization: inferring spatial interactions between subcellular structures from microscopy images. BMC Bioinformatics 11, 372 (2010).
 13.
Fletcher, P. A., Scriven, D. R., Schulson, M. N. & Moore, E. D. Multiimage colocalization and its statistical significance. Biophys. J. 99, 1996–2005 (2010).
 14.
Lachmanovich, E. et al. Colocalization analysis of complex formation among membrane proteins by computerized fluorescence microscopy: application to immunofluorescence copatching studies. J. Microsc. 212, 122–131 (2003).
 15.
Owen, D. M. et al. PALM imaging and cluster analysis of protein heterogeneity at the cell surface. J. Biophotonics 3, 446–454 (2010).
 16.
RubinDelanchy, P. et al. Bayesian cluster identification in singlemolecule localization microscopy data. Nat. Methods. 12, 1072–1076 (2015).
 17.
Mazouchi, A. & Milstein, J. N. Fast Optimized Cluster Algorithm for Localizations (FOCAL): a spatial cluster analysis for superresolved microscopy. Bioinformatics 32, 747–754 (2016).
 18.
AgulloPascual, E. et al. Superresolution fluorescence microscopy of the cardiac connexome reveals plakophilin2 inside the connexin43 plaque. Cardiovasc. Res. 100, 231–240 (2013).
 19.
LeoMacias, A. et al. Nanoscale visualization of functional adhesion/excitability nodes at the intercalated disc. Nat. Commun. 7, 10342 (2016).
 20.
Clark, P. J. & Evans, F. C. Distance to Nearest Neighbor as a Measure of Spatial Relationships in Populations. Ecology 35, 445–453 (1954).
 21.
Ripley, B. D. The SecondOrder Analysis of Stationary Point Processes. J. Appl. Probab. 13, 255–266 (1976).
 22.
Veatch, S. L. et al. Correlation functions quantify superresolution images and estimate apparent clustering due to overcounting. PLoS One 7, e31457 (2012).
 23.
Sengupta, P. et al. Probing protein heterogeneity in the plasma membrane using PALM and pair correlation analysis. Nat. Methods. 8, 969–975 (2011).
 24.
Yin, Y. & Rothenberg, E. Probing the Spatial Organization of Molecular Complexes Using TriplePairCorrelation. Sci. Rep. 6, 30819 (2016).
 25.
Stoyan, D. & Penttinen, A. Recent applications of point process methods in forestry statistics. 61–78 (2000).
 26.
Patterson, G., Davidson, M., Manley, S. & LippincottSchwartz, J. Superresolution imaging using singlemolecule localization. Annu. Rev. Phys. Chem. 61, 345–367 (2010).
 27.
Rust, M. J., Bates, M. & Zhuang, X. Subdiffractionlimit imaging by stochastic optical reconstruction microscopy (STORM). Nat. Methods. 3, 793–795 (2006).
 28.
van de Linde, S. et al. Direct stochastic optical reconstruction microscopy with standard fluorescent probes. Nat. Methods. 6, 991–1009 (2011).
 29.
Sengupta, P., JovanovicTalisman, T. & LippincottSchwartz, J. Quantifying spatial organization in pointlocalization superresolution images using pair correlation analysis. Nat. Protoc. 8, 345–354 (2013).
 30.
Malkusch, S. et al. Coordinatebased colocalization analysis of singlemolecule localization microscopy data. Histochem. Cell Biol. 137, 1–10 (2012).
 31.
Rossy, J., Cohen, E., Gaus, K. & Owen, D. M. Method for cocluster analysis in multichannel singlemolecule localisation data. Histochem. Cell Biol. 141, 605–612 (2014).
 32.
Manders, E. M. M., Verbeek, F. J. & Aten, J. A. Measurement of colocalization of objects in dualcolour confocal images. J. Microsc. 169, 375–382 (1993).
 33.
Reid, D. A. et al. Organization and dynamics of the nonhomologous endjoining machinery during DNA doublestrand break repair. Proc. Natl. Acad. Sci. USA 112, E2575–2584 (2015).
 34.
Ceccaldi, R., Rondinelli, B. & D’Andrea, A. D. Repair Pathway Choices and Consequences at the DoubleStrand Break. Trends Cell Biol. 26, 52–64 (2016).
 35.
Tokunaga, M., Imamoto, N. & SakataSogawa, K. Highly inclined thin illumination enables clear singlemolecule imaging in cells. Nat. Methods. 5, 159–161 (2008).
 36.
Pavlides, S. C. et al. Inhibitors of SCFSkp2/Cks1 E3 ligase block estrogeninduced growth stimulation and degradation of nuclear p27kip1: therapeutic potential for endometrial cancer. Endocrinology 154, 4030–4045 (2013).
 37.
Yildiz, A. et al. Myosin V walks handoverhand: single fluorophore imaging with 1.5nm localization. Science 300, 2061–2065 (2003).
 38.
Henriques, R. et al. QuickPALM: 3D realtime photoactivation nanoscopy image processing in ImageJ. Nat. Methods. 7, 339–340 (2010).
 39.
Jones, E., Oliphant, T. & Peterson, P. SciPy: Open source scientific tools for Python http://www.scipy.org/ (2001).
Acknowledgements
K.B. was supported by the Molecular Oncology and Immunology training grant T32 CA009161. Work in E.R.’s laboratory is supported by National Institutes of Health Grants CA187612, GM108119, and the American Cancer Society RSG DMC1624101DMC. Work in D.F.’s laboratory is supported by National Institutes of Health Grant U24 CA210972. We would like to thank Mario Delmar and Esperanza AgulloPascual for the opportunity to collaborate and for comments on the ImageJ plugin. We would also like to thank Bart Aromando for his comments and work on the user guide for the plugin.
Author information
Author notes
Keria BermudezHernandez and Sarah Keegan contributed equally to this work.
Affiliations
Department of Biochemistry and Molecular Pharmacology, New York University School of Medicine, New York, NY, USA
 Keria BermudezHernandez
 , Sarah Keegan
 , Donna R. Whelan
 , Dylan A. Reid
 , Jennifer Zagelbaum
 , Yandong Yin
 , Eli Rothenberg
 & David Fenyö
Institute for Systems Genetics, New York University School of Medicine, New York, NY, USA
 Keria BermudezHernandez
 , Sarah Keegan
 , Sisi Ma
 & David Fenyö
Authors
Search for Keria BermudezHernandez in:
Search for Sarah Keegan in:
Search for Donna R. Whelan in:
Search for Dylan A. Reid in:
Search for Jennifer Zagelbaum in:
Search for Yandong Yin in:
Search for Sisi Ma in:
Search for Eli Rothenberg in:
Search for David Fenyö in:
Contributions
K.B., S.K., S.M. and D.F. conceived the method. D.R.W. provided invaluable input and feedback on the utility of the method and the functionality of the ImageJ plugin. D.R.W., D.A.R., J.Z., Y.Y. and E.R performed and/or supervised experiments. K.B. and S.K. performed the analysis. The manuscript was cowritten by K.B., S.K. and D.F., with contributions from all authors.
Competing Interests
The authors declare that they have no competing interests.
Corresponding authors
Correspondence to Eli Rothenberg or David Fenyö.
Electronic supplementary material
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Further reading

BRCA2 controls DNA:RNA hybrid level at DSBs by mediating RNase H2 recruitment
Nature Communications (2018)

A dynamic threestep mechanism drives the HIV1 prefusion reaction
Nature Structural & Molecular Biology (2018)

Spatiotemporal dynamics of homologous recombination repair at single collapsed replication forks
Nature Communications (2018)
Comments
By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.