Comparing the performance of biomedical clustering methods


Identifying groups of similar objects is a popular first step in biomedical data analysis, but it is error-prone and impossible to perform manually. Many computational methods have been developed to tackle this problem. Here we assessed 13 well-known methods using 24 data sets ranging from gene expression to protein domains. Performance was judged on the basis of 13 common cluster validity indices. We developed a clustering analysis platform, ClustEval, to promote streamlined evaluation, comparison and reproducibility of clustering results in the future. This allowed us to objectively evaluate the performance of all tools on all data sets with up to 1,000 different parameter sets each, resulting in a total of more than 4 million calculated cluster validity indices. We observed that there was no universal best performer, but on the basis of this wide-ranging comparison we were able to develop a short guideline for biomedical clustering tasks. ClustEval allows biomedical researchers to pick the appropriate tool for their data type and allows method developers to compare their tool to the state of the art.

Figure 1: Performance of all clustering tools on all nonartificial data sets on the basis of F1 scores.
Figure 2: Correlations between internal and external cluster validity indices for all biomedical data sets.
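Figure 1 compares tools by F1 score, an external validity index that measures agreement between a clustering and a gold standard. This page does not state which F1 variant was used, so the following is only a minimal sketch under the assumption of a pair-counting definition: every pair of objects is a "positive" if both objects share a cluster. The function name `pairwise_f1` and the toy labels are illustrative and are not part of ClustEval.

```python
from itertools import combinations
from typing import Hashable, Sequence


def pairwise_f1(predicted: Sequence[Hashable], gold: Sequence[Hashable]) -> float:
    """Pair-counting F1 between a predicted clustering and a gold standard.

    Assumption: this is one common F1 variant for clusterings; the
    evaluation summarized above may define F1 differently.
    """
    if len(predicted) != len(gold):
        raise ValueError("both labelings must cover the same objects")

    tp = fp = fn = 0
    for i, j in combinations(range(len(gold)), 2):
        same_pred = predicted[i] == predicted[j]
        same_gold = gold[i] == gold[j]
        if same_pred and same_gold:
            tp += 1  # pair co-clustered in both labelings
        elif same_pred:
            fp += 1  # pair co-clustered only in the prediction
        elif same_gold:
            fn += 1  # pair co-clustered only in the gold standard

    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example: one object is assigned to the wrong cluster.
gold_labels = [0, 0, 0, 1, 1, 1]
predicted_labels = [0, 0, 1, 1, 1, 1]
score = pairwise_f1(predicted_labels, gold_labels)
```

Because only co-membership of pairs is compared, the score does not depend on how the cluster labels are named, which matters when different tools number their clusters differently.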



C.W. is supported by the SDU2020 funding initiative at the University of Southern Denmark. R.R. was partially supported by the International Max Planck Research School on Computer Science and the Saarland University Graduate School for Computer Science. J.B. is grateful for financial support from the Cluster of Excellence for Multimodal Computing and Interaction (MMCI).

Author information




C.W. implemented ClustEval and performed the study. J.B. and R.R. jointly directed this work and designed the study. All authors contributed equally to the manuscript.

Corresponding author

Correspondence to Jan Baumbach.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Performance of all clustering tools on all data sets.

See Table 1 in the main text for definitions of methods’ abbreviations. Empty fields correspond to an inability of the corresponding tool to cluster the data set or to an inability to compute a cluster validity index. This happens when a tool needs feature vectors for the objects but the data set is given as a similarity matrix, or when the silhouette value is undefined (indicated with an asterisk) because the clustering consists of only singletons or of only one cluster.
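The two undefined cases follow from the definition of the silhouette value: with a single cluster there is no neighboring cluster to compare against, and with only singletons there is no within-cluster distance. The sketch below is a minimal illustration, not ClustEval's implementation; the function name `mean_silhouette` and the choice to return `None` for the undefined cases are assumptions made for this example.

```python
from typing import Dict, Hashable, List, Optional, Sequence


def mean_silhouette(
    dist: Sequence[Sequence[float]], labels: Sequence[Hashable]
) -> Optional[float]:
    """Mean silhouette value from a pairwise distance matrix.

    Returns None when the index is undefined: only one cluster,
    or every object in its own singleton cluster.
    """
    clusters: Dict[Hashable, List[int]] = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)

    n = len(labels)
    if len(clusters) < 2 or len(clusters) == n:
        return None  # undefined, shown as an empty field in the figure

    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # a singleton contributes 0 (Rousseeuw's convention)
        # a: mean distance to the other members of the same cluster
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist[i][j] for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denominator = max(a, b)
        if denominator > 0:
            total += (b - a) / denominator
    return total / n


# Toy example: four points on a line forming two well-separated groups.
points = [0.0, 1.0, 10.0, 11.0]
distances = [[abs(p - q) for q in points] for p in points]

two_clusters = mean_silhouette(distances, [0, 0, 1, 1])
one_cluster = mean_silhouette(distances, [0, 0, 0, 0])
all_singletons = mean_silhouette(distances, [0, 1, 2, 3])
```

Returning `None` rather than a number keeps an undefined result from being mistaken for a poor score when results are aggregated across parameter sets.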

Supplementary Figure 2 Robustness analysis of all clustering methods.

Robustness of all clustering methods on five selected data sets reported as mean F1 scores over ten repetitions. For the two biomedical data sets (astral1_161 and bone_marrow) the noise levels are 5% (low) and 10% (high). For the three synthetic data sets, we report the performance on higher noise levels: 20% (low) and 40% (high). See Table 1 for definitions of methods’ abbreviations. Empty fields correspond to an inability of the corresponding tool to cluster the data set. This happens when a tool needs feature vectors for the objects but the data set is given as a similarity matrix.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–2 and Supplementary Note (PDF 2855 kb)

Cite this article

Wiwie, C., Baumbach, J. & Röttger, R. Comparing the performance of biomedical clustering methods. Nat Methods 12, 1033–1038 (2015).
