Abstract
Single-molecule localization microscopy (SMLM) generates data in the form of coordinates of localized fluorophores. Cluster analysis is an attractive route for extracting biologically meaningful information from such data and has been widely applied. Despite the availability of a range of cluster analysis algorithms, there exists no consensus framework for evaluating their performance. Here, we use a systematic approach based on two metrics, the adjusted Rand index (ARI) and intersection over union (IoU), to score the success of clustering algorithms in simulated conditions mimicking experimental data. We demonstrate the framework using seven diverse analysis algorithms: DBSCAN, ToMATo, KDE, FOCAL, CAML, ClusterViSu and SR-Tesseler. Given that the best performer depended on the underlying distribution of localizations, we demonstrate an analysis pipeline based on statistical similarity measures that enables the selection of the most appropriate algorithm and of optimized analysis parameters for real SMLM data. We propose that these standard simulated conditions, metrics and analysis pipeline become the basis for future analysis algorithm development and evaluation.
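The evaluation described above compares each algorithm's output with the known ground truth of a simulated point pattern. The sketch below illustrates that setup under stated assumptions and is not the authors' simulation code: the hypothetical helper `simulate_scenario` places isotropic Gaussian clusters on a uniform background (all parameter values are illustrative), and the hypothetical helper `dbscan` is a textbook DBSCAN, one of the seven algorithms named above. The result is that every localization carries both a ground-truth label and a predicted label, ready to be scored.

```python
import math
import random


def simulate_scenario(n_clusters=5, pts_per_cluster=50, sigma=20.0,
                      n_background=100, roi=1000.0, seed=0):
    """Hypothetical ground-truth scenario (assumption, not the published one):
    isotropic Gaussian clusters on a uniform background in a square ROI (nm).
    Returns (x, y) points and labels: 0 = background, 1..n_clusters = cluster."""
    rng = random.Random(seed)
    points, labels = [], []
    for k in range(1, n_clusters + 1):
        cx, cy = rng.uniform(0, roi), rng.uniform(0, roi)
        for _ in range(pts_per_cluster):
            points.append((rng.gauss(cx, sigma), rng.gauss(cy, sigma)))
            labels.append(k)
    for _ in range(n_background):
        points.append((rng.uniform(0, roi), rng.uniform(0, roi)))
        labels.append(0)
    return points, labels


def dbscan(points, eps, min_pts):
    """Textbook DBSCAN with brute-force O(n^2) neighbour search.
    A point is a core point if at least min_pts points (itself included)
    lie within distance eps. Returns labels; 0 marks noise."""
    n = len(points)

    def neighbours(i):
        xi, yi = points[i]
        return [j for j in range(n)
                if math.hypot(points[j][0] - xi, points[j][1] - yi) <= eps]

    labels = [None] * n  # None = not yet visited
    cluster_id = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = 0  # noise for now; may later become a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == 0:
                labels[j] = cluster_id  # noise reachable from a core point: border
                continue
            if labels[j] is not None:
                continue  # already assigned
            labels[j] = cluster_id
            more = neighbours(j)
            if len(more) >= min_pts:  # j is itself a core point: keep expanding
                queue.extend(more)
    return labels


truth_points, truth_labels = simulate_scenario()
predicted = dbscan(truth_points, eps=25.0, min_pts=5)
```

Repeating the `dbscan` call over a grid of `eps` and `min_pts` values and scoring each labelling against `truth_labels` mirrors the parameter scanning reported for DBSCAN, ToMATo and KDE; the brute-force neighbour search only suits small toy data.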
Data availability
Both the simulation and the real SMLM data used as the basis for this work are available for download at https://github.com/DJ-Nieves/ARI-and-IoU-cluster-analysis-evaluation without restriction. Source data are provided with this paper.
Code availability
R code for calculating ARI and IoU for clustering results against a ground truth scenario is available for download at https://github.com/DJ-Nieves/ARI-and-IoU-cluster-analysis-evaluation without restriction.
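The two scores named above can be sketched in Python. This is a hedged illustration, not a port of the published R code. `adjusted_rand_index` implements the Hubert and Arabie adjusted Rand index from the contingency table of ground-truth versus predicted labels. `mean_best_iou` is a hypothetical point-set form of intersection over union (the Jaccard index) that matches each ground-truth cluster to its best-overlapping detected cluster. The noise handling (label 0 kept as an ordinary class for ARI, excluded for IoU) and the matching rule are assumptions; the published IoU may instead be computed on cluster outlines or areas.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(truth, pred):
    """Adjusted Rand index between two labellings of the same localizations.
    Every distinct label, including a noise label, counts as its own class
    (an assumption of this sketch)."""
    if len(truth) != len(pred):
        raise ValueError("labellings must cover the same localizations")
    n = len(truth)
    if n < 2:
        return 1.0  # no pairs on which the labellings could disagree
    # Contingency table n_ij with row sums a_i and column sums b_j.
    n_ij = Counter(zip(truth, pred))
    a_i = Counter(truth)
    b_j = Counter(pred)
    index = sum(comb(v, 2) for v in n_ij.values())
    sum_a = sum(comb(v, 2) for v in a_i.values())
    sum_b = sum(comb(v, 2) for v in b_j.values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Only when both labellings are all one class or all singletons.
        return 1.0
    return (index - expected) / (max_index - expected)


def mean_best_iou(truth, pred, noise=0):
    """Hypothetical point-set IoU: for each ground-truth cluster G, the best
    |G & D| / |G | D| over detected clusters D, averaged over ground-truth
    clusters. Localizations labelled `noise` are excluded on both sides."""
    def clusters(labels):
        groups = {}
        for idx, lab in enumerate(labels):
            if lab != noise:
                groups.setdefault(lab, set()).add(idx)
        return groups

    gt, det = clusters(truth), clusters(pred)
    if not gt:
        raise ValueError("ground truth contains no clusters")
    best = [max((len(g & d) / len(g | d) for d in det.values()), default=0.0)
            for g in gt.values()]
    return sum(best) / len(best)
```

Both functions depend only on which localizations share a label, so renumbering clusters does not change the score; ARI equals 1 for identical partitions and is near 0 for a random labelling.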
Acknowledgements
D.M.O. acknowledges funding from BBSRC grant BB/R007365/1. M.H. acknowledges funding by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation, Project-ID 259130777, SFB 1177; GRK 2566). D.M.O. and M.B. acknowledge funding from the Alan Turing Institute.
Author information
Contributions
D.J.N. wrote simulation and analysis code, produced simulations, performed cluster analyses, acquired dSTORM data and wrote the manuscript. J.A.P. wrote the simulation code. F.L. and D.J.W. performed analyses. M.B. performed dissimilarity measurements. S.O. and A.d.M. produced the FGFR1 nanobody. J.G., D.S., E.A.K.C., J.A.P., J.-B.S. and M.H. contributed ideas and concepts. D.M.O. conceived the work and wrote the manuscript. All authors contributed to the drafting and writing of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Methods thanks Marek Cebecauer and the other, anonymous, reviewers for their contribution to the peer review of this work. Primary Handling Editor: Rita Strack, in collaboration with the Nature Methods team. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Figs. 1–13 and Supplementary Tables 1–38.
Source data
Source Data Fig. 1
Ripley’s K curves for each simulated data scenario.
Source Data Fig. 2
Mean and variance of ARI and IoU scoring for ground truth scenario 2 for parameter scanning of clustering algorithms DBSCAN, ToMATo and KDE.
Source Data Fig. 3
Mean of the maximal ARI and IoU scores for all algorithms for simulation scenarios 2–10.
Source Data Fig. 4
Mean of the maximal ARI and IoU scores for all algorithms for simulation scenarios 2–10 with added multiple blinking.
Source Data Fig. 5
Cluster areas and number of clusters per μm² identified in FGFR1 dSTORM data using framework-optimized DBSCAN parameters.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Nieves, D.J., Pike, J.A., Levet, F. et al. A framework for evaluating the performance of SMLM cluster analysis algorithms. Nat Methods 20, 259–267 (2023). https://doi.org/10.1038/s41592-022-01750-6
DOI: https://doi.org/10.1038/s41592-022-01750-6