Bayesian cluster identification in single-molecule localization microscopy data

Abstract

Single-molecule localization-based super-resolution microscopy techniques such as photoactivated localization microscopy (PALM) and stochastic optical reconstruction microscopy (STORM) produce pointillist data sets of molecular coordinates. Although many algorithms exist for the identification and localization of molecules from raw image data, methods for analyzing the resulting point patterns for properties such as clustering have remained relatively under-studied. Here we present a model-based Bayesian approach to evaluate molecular cluster assignment proposals, generated in this study by analysis based on Ripley's K function. The method takes full account of the individual localization precisions calculated for each emitter. We validate the approach using simulated data, as well as experimental data on the clustering behavior of CD3ζ, a subunit of the CD3 T cell receptor complex, in resting and activated primary human T cells.
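The abstract says only that cluster proposals come from an analysis based on Ripley's K function. As a rough illustration of what such a proposal generator might look like, the sketch below computes a per-point, Getis and Franklin style L(r) value and thresholds it. The function names, the lack of edge correction, and the rule that links dense points closer than 2r are assumptions made for this example, not details taken from the paper.

```python
import math

def local_l_values(points, r, area):
    """Per-point L(r): close to r under complete spatial randomness,
    larger where a point has many neighbours (no edge correction)."""
    n = len(points)
    values = []
    for i, p in enumerate(points):
        neighbours = sum(
            1 for j, q in enumerate(points)
            if j != i and math.dist(p, q) <= r
        )
        values.append(math.sqrt(area * neighbours / (math.pi * (n - 1))))
    return values

def propose_clusters(points, r, threshold, area):
    """Label points whose L(r) exceeds `threshold`; 0 means background.
    Dense points within 2*r of each other share a label (assumed rule)."""
    l_vals = local_l_values(points, r, area)
    dense = [i for i, v in enumerate(l_vals) if v > threshold]
    labels = [0] * len(points)
    next_label = 0
    for seed in dense:
        if labels[seed]:
            continue
        next_label += 1
        labels[seed] = next_label
        stack = [seed]
        while stack:
            i = stack.pop()
            for j in dense:
                if labels[j] == 0 and math.dist(points[i], points[j]) <= 2 * r:
                    labels[j] = next_label
                    stack.append(j)
    return labels

def ring(cx, cy, radius=20.0, k=10):
    """k points evenly spaced on a small circle: a toy 'cluster'."""
    return [(cx + radius * math.cos(2 * math.pi * t / k),
             cy + radius * math.sin(2 * math.pi * t / k)) for t in range(k)]

# Toy data in a 3000 x 3000 nm region: two tight clusters plus a sparse grid.
background = [(250.0 + 500.0 * ix, 250.0 + 500.0 * iy)
              for ix in range(6) for iy in range(6)]
points = ring(1000.0, 1000.0) + ring(2000.0, 2000.0) + background
labels = propose_clusters(points, r=50.0, threshold=100.0, area=3000.0 ** 2)
```

Each choice of r and threshold yields one candidate assignment of localisations to clusters or background. In the approach described above, candidates of this kind would then be scored by the Bayesian model, which this sketch does not attempt to reproduce.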

Figure 1: Workflow of the algorithm.
Figure 2: Four different clustering scenarios.
Figure 3: Comparison of the clustering behavior of CD3ζ-mEos3.2 in primary human T cells resting on poly-L-lysine (PLL) or forming synapses (activated).

References

  1. Huang, B. Super-resolution optical microscopy: multiple choices. Curr. Opin. Chem. Biol. 14, 10–14 (2010).

  2. Hell, S.W. & Wichmann, J. Breaking the diffraction resolution limit by stimulated emission: stimulated-emission-depletion fluorescence microscopy. Opt. Lett. 19, 780–782 (1994).

  3. Chmyrov, A. et al. Nanoscopy with more than 100,000 'doughnuts'. Nat. Methods 10, 737–740 (2013).

  4. Betzig, E. et al. Imaging intracellular fluorescent proteins at nanometer resolution. Science 313, 1642–1645 (2006).

  5. Rust, M.J., Bates, M. & Zhuang, X. Sub-diffraction-limit imaging by stochastic optical reconstruction microscopy (STORM). Nat. Methods 3, 793–795 (2006).

  6. Heilemann, M. et al. Subdiffraction-resolution fluorescence imaging with conventional fluorescent probes. Angew. Chem. Int. Ed. Engl. 47, 6172–6176 (2008).

  7. Hess, S.T., Girirajan, T.P.K. & Mason, M.D. Ultra-high resolution imaging by fluorescence photoactivation localization microscopy. Biophys. J. 91, 4258–4272 (2006).

  8. Wolter, S. et al. rapidSTORM: accurate, fast open-source software for localization microscopy. Nat. Methods 9, 1040–1041 (2012).

  9. Holden, S.J., Uphoff, S. & Kapanidis, A.N. DAOSTORM: an algorithm for high-density super-resolution microscopy. Nat. Methods 8, 279–280 (2011).

  10. Henriques, R. et al. QuickPALM: 3D real-time photoactivation nanoscopy image processing in ImageJ. Nat. Methods 7, 339–340 (2010).

  11. van de Linde, S. et al. Direct stochastic optical reconstruction microscopy with standard fluorescent probes. Nat. Protoc. 6, 991–1009 (2011).

  12. Heilemann, M., van de Linde, S., Mukherjee, A. & Sauer, M. Super-resolution imaging with small organic fluorophores. Angew. Chem. Int. Ed. Engl. 48, 6903–6908 (2009).

  13. Dempsey, G.T. et al. Photoswitching mechanism of cyanine dyes. J. Am. Chem. Soc. 131, 18192–18193 (2009).

  14. Williamson, D.J. et al. Pre-existing clusters of the adaptor Lat do not participate in early T cell signaling events. Nat. Immunol. 12, 655–662 (2011).

  15. Rossy, J., Owen, D.M., Williamson, D.J., Yang, Z. & Gaus, K. Conformational states of the kinase Lck regulate clustering in early T cell signaling. Nat. Immunol. 14, 82–89 (2013).

  16. Ripley, B.D. Modelling spatial patterns. J. R. Stat. Soc. Series B Stat. Methodol. 39, 172–192 (1977).

  17. Sengupta, P. et al. Probing protein heterogeneity in the plasma membrane using PALM and pair correlation analysis. Nat. Methods 8, 969–975 (2011).

  18. Veatch, S.L. et al. Correlation functions quantify super-resolution images and estimate apparent clustering due to over-counting. PLoS ONE 7, e31457 (2012).

  19. Owen, D.M. et al. PALM imaging and cluster analysis of protein heterogeneity at the cell surface. J. Biophotonics 3, 446–454 (2010).

  20. Sherman, E. et al. Functional nanoscale organization of signaling molecules downstream of the T cell antigen receptor. Immunity 35, 705–720 (2011).

  21. Lillemeier, B.F. et al. TCR and Lat are expressed on separate protein islands on T cell membranes and concatenate during activation. Nat. Immunol. 11, 90–96 (2010).

  22. Annibale, P., Vanni, S., Scarselli, M., Rothlisberger, U. & Radenovic, A. Identification of clustering artifacts in photoactivated localization microscopy. Nat. Methods 8, 527–528 (2011).

  23. Annibale, P., Vanni, S., Scarselli, M., Rothlisberger, U. & Radenovic, A. Quantitative photo activated localization microscopy: unraveling the effects of photoblinking. PLoS ONE 6, e22678 (2011).

  24. Ovesný, M., Křížek, P., Borkovec, J., Švindrych, Z. & Hagen, G.M. ThunderSTORM: a comprehensive ImageJ plug-in for PALM and STORM data analysis and super-resolution imaging. Bioinformatics 30, 2389–2390 (2014).

  25. Quan, T., Zeng, S. & Huang, Z.-L. Localization capability and limitation of electron-multiplying charge-coupled, scientific complementary metal-oxide semiconductor, and charge-coupled devices for superresolution imaging. J. Biomed. Opt. 15, 066005 (2010).

  26. Ferguson, T.S. A Bayesian analysis of some nonparametric problems. Ann. Stat. 1, 209–230 (1973).

  27. Getis, A. & Franklin, J. Second-order neighborhood analysis of mapped point patterns. Ecology 68, 473–477 (1987).

  28. Lloyd, S. Least squares quantization in PCM. IEEE Trans. Inf. Theory 28, 129–137 (2006).

  29. Hinneburg, A. & Gabriel, H.-H. in Advances in Intelligent Data Analysis VII (eds. Berthold, M.R., Shawe-Taylor, J. & Lavrač, N.) 70–80 (Springer, 2007).

  30. Johnson, S.C. Hierarchical clustering schemes. Psychometrika 32, 241–254 (1967).

  31. Ester, M., Kriegel, H.-P., Sander, J. & Xu, X. A density-based algorithm for discovering clusters in large spatial databases with noise. KDD-96 226–231 (1996).

  32. Neve-Oz, Y., Razvag, Y., Sajman, J. & Sherman, E. Mechanisms of localized activation of the T cell antigen receptor inside clusters. Biochim. Biophys. Acta 1853, 810–821 (2015).

  33. Cox, S. et al. Bayesian localization microscopy reveals nanoscale podosome dynamics. Nat. Methods 9, 195–200 (2012).

  34. Lee, S.-H., Shin, J.Y., Lee, A. & Bustamante, C. Counting single photoactivatable fluorescent molecules by photoactivated localization microscopy (PALM). Proc. Natl. Acad. Sci. USA 109, 17436–17441 (2012).

  35. Gandy, A. Sequential implementation of Monte Carlo tests with uniformly bounded resampling risk. J. Am. Stat. Assoc. 104, 1504–1511 (2009).

  36. Gandy, A. & Rubin-Delanchy, P. An algorithm to compute the power of Monte Carlo tests with guaranteed precision. Ann. Stat. 41, 125–142 (2013).

  37. Green, P.J. & Richardson, S. Modelling heterogeneity with and without the Dirichlet process. Scand. J. Stat. 28, 355–375 (2001).

Acknowledgements

D.M.O. acknowledges funding from the European Research Council (FP7 starter grant 337187) and Marie Curie Career Integration grant 334303. A.P.C. is funded by Arthritis Research UK grants 19652 and 20525.

Author information

Contributions

P.R.-D., N.A.H. and D.M.O. conceived the method. P.R.-D., J.G. and D.M.O. performed the analysis. P.R.-D. and D.M.O. wrote the manuscript. G.L.B. acquired cell data. G.L.B., D.J.W. and A.P.C. provided materials.

Corresponding authors

Correspondence to Patrick Rubin-Delanchy or Dylan M Owen.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Performance analysis under four different clustering scenarios.

The four scenarios are: i) Standard Conditions; ii) a sparse data set with only 10% as many localisations; iii) clusters that are twice as large; and iv) only 10 localisations per cluster, with 90% of localisations in the background. a) Histograms of the number of clusters per region. b) Histograms of the cluster radii. c) Histograms of the number of localisations per cluster. d) Histograms of the percentage of localisations found in clusters. In all cases the blue dashed line represents the true value. Histograms are calculated from 100 simulated data sets for each scenario.

Supplementary Figure 2 Estimation of cluster descriptors as simulation parameters vary (n = 100 per point).

i) Measured localisations per cluster. ii) Measured cluster radii. iii) Measured percentage of localisations in clusters. iv) Measured number of clusters per region. a) Simulated number of localisations per cluster. b) Simulated cluster radii. c) Simulated fraction of background localisations. Blue dashed lines represent simulated values.

Supplementary Figure 3 Comparison of our algorithm with DBSCAN.

a-d) Comparison of the proposal generating algorithm (I) with DBSCAN (II) when each method is allowed to optimise its analysis parameters based on our Bayesian scoring mechanism, run on simulated data in the Standard Conditions (n = 100). a) Number of clusters per region. b) Percentage of localisations in clusters. c) Number of localisations per cluster. d) Cluster radii. e-h) Histograms of key cluster descriptors generated by DBSCAN with fixed r = 50 nm and T = 78 from simulated data in the Standard Conditions (n = 100). e) Number of clusters per region. f) Number of localisations per cluster. g) Percentage of localisations in clusters. h) Cluster radii. Blue dashed lines represent simulated values.
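For readers unfamiliar with DBSCAN (ref. 31), the following is a minimal pure-Python sketch of the standard algorithm. Mapping the caption's r to the neighbourhood radius `eps` and T to the density threshold `min_pts` is an assumption of this example. The toy data use a much smaller threshold than 78 because they contain only a handful of points.

```python
import math

def dbscan(points, eps, min_pts):
    """Plain DBSCAN. Returns one label per point: 0 = noise, 1..k = cluster.
    A point is a core point if at least min_pts points (itself included)
    lie within eps of it."""
    n = len(points)
    neighbours = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n  # None = not yet visited
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = 0  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == 0:
                labels[j] = cluster  # noise reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:
                queue.extend(neighbours[j])  # expand only through core points
    return labels

def ring(cx, cy, radius=20.0, k=10):
    """k points evenly spaced on a small circle: a toy 'cluster'."""
    return [(cx + radius * math.cos(2 * math.pi * t / k),
             cy + radius * math.sin(2 * math.pi * t / k)) for t in range(k)]

# Toy data (coordinates in nm): two tight clusters plus a sparse grid.
background = [(250.0 + 500.0 * ix, 250.0 + 500.0 * iy)
              for ix in range(6) for iy in range(6)]
points = ring(1000.0, 1000.0) + ring(2000.0, 2000.0) + background
labels = dbscan(points, eps=50.0, min_pts=5)
```

With both parameters fixed, DBSCAN returns a single partition of the data. The comparison in panels a-d instead lets the parameters vary and scores each resulting partition.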

Supplementary Figure 4 Analysis of simulated data from the standard and sparse conditions (n = 100) by alternative clustering techniques.

Analysis of simulated data from the Standard (a-f) and sparse (g-l) conditions (n = 100) by alternative clustering techniques. a and g) Cluster heat maps generated from Getis and Franklin’s Local Point Pattern Analysis. b and h) Binary maps generated from these heat maps. c and i) Number of clusters per region extracted from the binary maps. d and j) Percentage of localisations in clusters. Blue dashed lines represent simulated values. e and k) Ripley’s K function curves for example data sets together with 99% confidence intervals (C.I.) generated by simulating 100 CSR datasets. f and l) Pair Correlation curves for example data sets.
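The confidence intervals in panels e and k come from simulating completely spatially random (CSR) data sets. A minimal sketch of that idea follows, assuming a naive K estimator without edge correction and a pointwise envelope taken from empirical quantiles of the simulated curves. The caption does not say how the interval was derived from the 100 simulations, so both choices are assumptions of this example.

```python
import math
import random

def ripley_k(points, radii, area):
    """Naive Ripley's K estimate at each radius (no edge correction)."""
    n = len(points)
    counts = [0] * len(radii)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            for k, r in enumerate(radii):
                if d <= r:
                    counts[k] += 2  # counts ordered pairs (i, j) and (j, i)
    return [area * c / (n * (n - 1)) for c in counts]

def csr_envelope(n_points, side, radii, n_sim=100, level=0.99, seed=0):
    """Pointwise envelope of K under CSR from n_sim uniform simulations."""
    rng = random.Random(seed)
    sims = []
    for _ in range(n_sim):
        pts = [(rng.uniform(0, side), rng.uniform(0, side))
               for _ in range(n_points)]
        sims.append(ripley_k(pts, radii, side * side))
    tail = (1 - level) / 2
    lo_idx = math.floor(tail * (n_sim - 1))
    hi_idx = math.ceil((1 - tail) * (n_sim - 1))
    lower, upper = [], []
    for k in range(len(radii)):
        column = sorted(s[k] for s in sims)
        lower.append(column[lo_idx])
        upper.append(column[hi_idx])
    return lower, upper

side = 3000.0                  # region side length in nm
radii = [50.0, 100.0, 200.0]   # radii in nm
lower, upper = csr_envelope(50, side, radii)

# A deliberately clumped pattern: 50 points packed along a 49 nm segment.
clumped = [(1500.0 + i, 1500.0) for i in range(50)]
k_clumped = ripley_k(clumped, radii, side * side)
```

An observed K curve that rises above the upper envelope at some radius indicates more close pairs than CSR would produce at that scale. With 100 simulations and a 99% level, the quantile indices used here fall on the minimum and maximum of the simulated values.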

Supplementary Figure 5 Analysis of simulated data with 100 nm clusters and data with 90% of localizations in the background (n = 100) by alternative clustering techniques.

Analysis of simulated data with 100 nm clusters (a-f) and data with 90% of localisations in the background (g-l) (n = 100) by alternative clustering techniques. a and g) Cluster heat maps generated from Getis and Franklin’s Local Point Pattern Analysis. b and h) Binary maps generated from these heat maps. c and i) Number of clusters per region extracted from the binary maps. d and j) Percentage of localisations in clusters. Blue dashed lines represent simulated values. e and k) Ripley’s K function curves for example data sets together with 99% confidence intervals (C.I.) generated by simulating 100 CSR datasets. f and l) Pair Correlation curves for example data sets.

Supplementary Figure 6 Performance of the algorithm on data sets (n = 100) with an uneven background.

Performance of the algorithm on data sets (n = 100) with an uneven background, following a Beta(2,2) (i) or a Beta(5,1) (ii) distribution. a) Representative data. b) Log(posterior probability) heat maps. c) The highest scoring cluster proposal. d) Histograms of cluster radii. e) Histograms of the number of localisations per cluster. f) Histograms of the number of clusters. g) Histograms of the percentage of localisations found in clusters. Blue dashed lines represent true values.
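The caption does not state which coordinate follows the Beta distribution. As one possible reading, the sketch below draws the x-coordinate of each background localisation from a Beta(a, b) distribution scaled to the region width and keeps the y-coordinate uniform. The function name and that choice of axis are assumptions of this example.

```python
import random

def uneven_background(n, side, a, b, rng):
    """n background points in a side x side region: x ~ side * Beta(a, b),
    y ~ Uniform(0, side). Which axis is non-uniform is an assumption."""
    return [(side * rng.betavariate(a, b), rng.uniform(0, side))
            for _ in range(n)]

rng = random.Random(1)
mild = uneven_background(2000, 3000.0, 2, 2, rng)   # density peaks mid-region
steep = uneven_background(2000, 3000.0, 5, 1, rng)  # density piles up at one edge
```

Beta(2,2) gives a symmetric gradient with the highest density in the middle of the region, whereas Beta(5,1) concentrates most of the background near one edge, which is the harder case for methods that assume a uniform background.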

Supplementary Figure 7 Analysis of simulated data with an uneven background by alternative clustering techniques.

Analysis of simulated data with an uneven background, following a Beta(2,2) (a-f) or a Beta(5,1) (g-l) distribution (n = 100) by alternative clustering techniques. a and g) Cluster heat maps generated from Getis and Franklin’s Local Point Pattern Analysis. b and h) Binary maps generated from these heat maps. c and i) Number of clusters per region extracted from the binary maps. d and j) Percentage of localisations in clusters. Blue dashed lines represent simulated values. e and k) Ripley’s K function curves for example data sets together with 99% confidence intervals (C.I.) generated by simulating 100 CSR datasets. f and l) Pair Correlation curves for example data sets.

Supplementary Figure 8 Histograms of the measured number of localizations per cluster and cluster radii for simulated dimers, trimers and hexamers.

Histograms of the measured number of localisations per cluster and cluster radii for simulated dimers, trimers and hexamers in a dense or sparse distribution (n = 100 per condition). These were simulated by generating localisations with identical coordinates which were then independently scrambled according to each point’s localisation precision. The detection problem becomes harder as the overall density increases. We therefore performed the analysis at two different densities (2000 and 200 overall localisations).
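The simulation described in this caption can be sketched as follows: each oligomer contributes k localisations that share one true coordinate, and each localisation is then displaced independently by Gaussian noise whose standard deviation is that localisation's own precision. The uniform 10 to 30 nm precision range, the function name and the region size are hypothetical values chosen for illustration, since the caption does not give them.

```python
import random

def simulate_oligomers(n_oligomers, k, side, rng, precision_range=(10.0, 30.0)):
    """Return (x, y, sigma, oligomer_id) tuples. The k localisations of an
    oligomer start at identical coordinates and are scrambled independently
    according to each localisation's own precision sigma (in nm)."""
    localisations = []
    for oligomer_id in range(n_oligomers):
        cx, cy = rng.uniform(0, side), rng.uniform(0, side)
        for _ in range(k):
            sigma = rng.uniform(*precision_range)  # hypothetical precision model
            localisations.append(
                (rng.gauss(cx, sigma), rng.gauss(cy, sigma), sigma, oligomer_id)
            )
    return localisations

rng = random.Random(7)
dimers = simulate_oligomers(100, 2, 3000.0, rng)
hexamers = simulate_oligomers(100, 6, 3000.0, rng)
```

Because the true positions coincide, the apparent cluster radius in such data is set entirely by the localisation precisions, which is why a method that models per-localisation precision is well suited to this test.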

Supplementary Figure 9 Performance of the algorithm on data sets (n = 100) with different cluster sizes within the same ROI.

Performance of the algorithm on data sets (n = 100) with different cluster sizes within the same ROI, 5x 50 nm and 5x 100 nm (i), 5x 10 nm and 5x 100 nm (ii) and a range of cluster sizes between 10 and 100 nm (iii). a) Representative data. b) Log(posterior probability) heat maps. c) The highest scoring cluster proposal and d) histograms of the measured cluster radii. Blue dashed lines represent simulated values.

Supplementary Figure 10 Side by side comparison of methods applied to three simulated conditions.

Side by side comparison of our method (I), Getis’s method (II) and DBSCAN (III) applied to three simulated conditions (n = 100 datasets each). The conditions are Standard Conditions (representative dataset shown in Fig. 2a i), Standard Conditions but with larger clusters (representative dataset shown in Fig. 2a iii) and Standard Conditions with uneven background (representative dataset shown in Supplementary Fig. 6a ii).

Supplementary Figure 11 Sensitivity of the measured clustering descriptors to the prior settings.

Sensitivity of the measured clustering descriptors (percentage of localisations in clusters, cluster radii, number of clusters per ROI and the number of localisations per cluster) to the prior settings. a) Illustration of the sensitivity of measured cluster descriptors to varying the Dirichlet process concentration coefficient, α. b) Sensitivity to the prior probability of any localisation being allocated to the background. c) Sensitivity to the prior distribution on the cluster radius. Two possible distributions are considered, one taken from experimental data (i) and a flat distribution between zero and half the size of the ROI (ii).

Supplementary Figure 12 Data preprocessing steps.

a) Determination of the optimal merge time. Following the method of Annibale et al., we plot the total number of localisations in a representative image against the merge time for CD3ζ-mEos3 in primary T cells. The average optimum merge time was found to be three frames (30 ms). This example is representative of four such plots. b and c) Representative histograms of the localisation precisions calculated for CD3ζ data in resting T cells (b) and in activated cells (c), using the method of Quan et al.

Supplementary Figure 13 Bayesian analysis of CD3ζ clustering in activated primary human T cells with localizations from the same ROI divided into two equally sized data sets.

Bayesian analysis of CD3ζ clustering in activated primary human T cells with localisations from the same ROI divided into two equally-sized data sets. a) and b) Highest scoring cluster proposals. c) and d) Log(posterior probability) heat maps.

Supplementary Figure 14 Data on CD3ζ clustering analysed by alternative approaches in resting and activated primary human T cells.

a) Representative cluster maps (n = 30) generated by Getis and Franklin’s Local Point Pattern Analysis of a 3000 x 3000 nm area. b) Representative binary maps showing clustered areas. c) Ripley’s K function (average of n = 30 regions). d) Pair correlation curves (average of n = 30 regions). e) Cluster statistics on the percentage of localisations found in clusters, number of clusters per region and cluster radii extracted from the binary maps in b).

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–14 (PDF 5755 kb)

Supplementary Software

R code for running Bayesian cluster analysis (ZIP 103 kb)


Cite this article

Rubin-Delanchy, P., Burn, G., Griffié, J. et al. Bayesian cluster identification in single-molecule localization microscopy data. Nat Methods 12, 1072–1076 (2015). https://doi.org/10.1038/nmeth.3612

  • DOI: https://doi.org/10.1038/nmeth.3612
