Visualization and analysis of single-cell RNA-seq data by kernel-based similarity learning

Abstract

We present single-cell interpretation via multikernel learning (SIMLR), an analytic framework and software that learns a similarity measure from single-cell RNA-seq data to perform dimension reduction, clustering and visualization. We benchmark SIMLR against state-of-the-art methods on seven published data sets and show that SIMLR is scalable and greatly enhances clustering performance while improving the visualization and interpretability of single-cell sequencing data.
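The abstract describes SIMLR only at a high level: combine several kernels into one cell-to-cell similarity, then use that similarity for dimension reduction and clustering. The Python/NumPy sketch below illustrates that general pipeline. It is not SIMLR's algorithm. SIMLR learns the kernel weights and the similarity matrix by optimization, whereas this sketch assumes fixed Gaussian kernels, uniform kernel weights, a median-distance bandwidth heuristic, and standard normalized spectral clustering followed by k-means. It also assumes cells are rows and genes are columns of `X`. The function names (`multikernel_similarity`, `spectral_embedding`, `kmeans`, `cluster_cells`) are hypothetical.

```python
import numpy as np


def multikernel_similarity(X, scales=(0.5, 1.0, 2.0)):
    """Cell-by-cell similarity as the average of several Gaussian kernels.

    Assumption (not SIMLR's method): fixed bandwidths scaled from the median
    pairwise distance, combined with uniform weights. SIMLR instead learns
    the kernel weights and the similarity matrix jointly.
    """
    # Pairwise squared Euclidean distances between cells (rows of X)
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    # Base bandwidth: median off-diagonal pairwise distance (a heuristic)
    base = np.median(np.sqrt(d2[np.triu_indices_from(d2, k=1)]))
    kernels = [np.exp(-d2 / (2.0 * (s * base) ** 2)) for s in scales]
    return np.mean(kernels, axis=0)


def spectral_embedding(S, n_components):
    """Top eigenvectors of the normalized similarity D^{-1/2} S D^{-1/2}."""
    inv_sqrt_deg = 1.0 / np.sqrt(S.sum(axis=1))
    M = inv_sqrt_deg[:, None] * S * inv_sqrt_deg[None, :]
    _, vecs = np.linalg.eigh(M)  # eigenvalues come back in ascending order
    return vecs[:, -n_components:]


def kmeans(Z, k, n_iter=100):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [Z[0]]
    for _ in range(1, k):
        # Next center: the point farthest from all centers chosen so far
        dist = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assign each point to its nearest center
        labels = np.argmin(((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2), axis=1)
        # Recompute centers; keep the old center if a cluster becomes empty
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def cluster_cells(X, n_clusters, scales=(0.5, 1.0, 2.0)):
    """Similarity -> low-dimensional embedding -> cluster labels."""
    S = multikernel_similarity(X, scales)
    Z = spectral_embedding(S, n_clusters)
    return kmeans(Z, n_clusters), Z


# Toy usage: two well-separated groups of "cells" in a 2-gene space
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, size=(20, 2)),
               rng.normal(5.0, 0.1, size=(20, 2))])
labels, Z = cluster_cells(X, n_clusters=2)
```

Averaging fixed kernels is the simplest way to combine them. It only stands in for the learned similarity, so the quality of the resulting clusters here says nothing about SIMLR's reported performance.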

Figure 1: Overview of SIMLR.
Figure 2: Benchmark results on data sets with ground truth.
Figure 3: Comparison of 2D visualization.

Acknowledgements

The authors would like to thank G.X. Zheng, J. Terry and T. Mikkelsen from 10x Genomics for providing access to the PBMC data as well as suggestions for the manuscript and the in silico experiments. E.P. acknowledges support from an NDSEG Fellowship and a Hertz Fellowship. J.Z. acknowledges support from a Stanford Graduate Fellowship.

Author information

Contributions

B.W., J.Z., and S.B. conceived the study and planned experiments. B.W. designed the algorithm and implemented the software in MATLAB. D.R. and B.W. developed the software package in R. J.Z. and E.P. performed data analysis and implemented the simulation study. J.Z. and E.P. drafted the manuscript. B.W. and S.B. contributed to the manuscript. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Bo Wang or Serafim Batzoglou.

Ethics declarations

Competing interests

S.B. is currently on a leave of absence from Stanford, and he is VP of Applied and Computational Biology at Illumina.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–29, Supplementary Tables 1–10 and Supplementary Notes 1–10 (PDF 18964 kb)

Supplementary Software 1

MATLAB and R implementations of SIMLR with four small-scale single-cell RNA-seq data sets (ZIP 161889 kb)

About this article

Cite this article

Wang, B., Zhu, J., Pierson, E. et al. Visualization and analysis of single-cell RNA-seq data by kernel-based similarity learning. Nat Methods 14, 414–416 (2017). https://doi.org/10.1038/nmeth.4207

  • DOI: https://doi.org/10.1038/nmeth.4207
