  • Analysis

Evaluating measures of association for single-cell transcriptomics

Abstract

Single-cell transcriptomics provides an opportunity to characterize cell-type-specific transcriptional networks, intercellular signaling pathways and cellular diversity with unprecedented resolution by profiling thousands of cells in a single experiment. However, owing to the unique statistical properties of scRNA-seq data, the optimal measures of association for identifying gene–gene and cell–cell relationships from single-cell transcriptomics remain unclear. Here, we conducted a large-scale evaluation of 17 measures of association for their ability to reconstruct cellular networks, cluster cells of the same type and link cell-type-specific transcriptional programs to disease. Measures of proportionality were consistently among the best-performing methods across datasets and tasks. Our analysis provides data-driven guidance for gene and cell network analysis in single-cell transcriptomics.
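To make the notion of a proportionality measure concrete, the following minimal Python sketch computes one such statistic, ρp, of the kind implemented in the propr package (ref. 20), for a pair of genes. It is an illustration under stated assumptions rather than the authors' code: the function names, the toy count matrix and the pseudocount of 1 used to handle zeros are hypothetical choices made here.

```python
import math


def clr(counts, pseudocount=1.0):
    """Centered log-ratio transform of each cell (row) across its genes.

    The pseudocount is an assumption of this sketch, used to avoid log(0).
    """
    transformed = []
    for row in counts:
        logs = [math.log(value + pseudocount) for value in row]
        center = sum(logs) / len(logs)
        transformed.append([value - center for value in logs])
    return transformed


def variance(values):
    """Sample variance (n - 1 denominator)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def rho_p(counts, gene_i, gene_j, pseudocount=1.0):
    """Proportionality rho_p = 1 - var(a - b) / (var(a) + var(b)),
    where a and b are the CLR-transformed profiles of the two genes."""
    z = clr(counts, pseudocount)
    a = [row[gene_i] for row in z]
    b = [row[gene_j] for row in z]
    diff = [x - y for x, y in zip(a, b)]
    return 1.0 - variance(diff) / (variance(a) + variance(b))


# Toy matrix (cells x genes); genes 0 and 1 have identical counts.
toy_counts = [
    [1, 1, 10],
    [5, 5, 2],
    [20, 20, 7],
    [3, 3, 3],
]
print(rho_p(toy_counts, 0, 1))  # identical profiles are perfectly proportional
print(rho_p(toy_counts, 0, 2))
```

Values close to 1 indicate that the two genes vary proportionally across cells; the statistic is symmetric in its two arguments.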

Fig. 1: Single-cell transcriptomics compendium and functional coherence of single-cell gene coexpression networks.
Fig. 2: Overlap between single-cell gene coexpression networks and other biological networks.
Fig. 3: Reproducibility of single-cell gene coexpression networks of the human pancreas.
Fig. 4: Accuracy of measures of association for clustering single-cell transcriptomes of known cell types.
Fig. 5: Disease gene prioritization through single-cell gene coexpression analysis.

Data availability

The data that support the findings of this study are available from the following GitHub repository: https://github.com/skinnider/SCT-MoA. Raw data are available from the Gene Expression Omnibus, http://mousebrain.org, or https://support.10xgenomics.com, as detailed in the Methods; dataset identifiers are provided in Supplementary Data 1.

Code availability

The ‘dismay’ R package is available as Supplementary Software 1 and from the following GitHub repository: https://github.com/skinnider/dismay. R code used to reproduce the analysis and figures is available from the following GitHub repository: https://github.com/skinnider/SCT-MoA.

References

  1. Svensson, V., Vento-Tormo, R. & Teichmann, S. A. Exponential scaling of single-cell RNA-seq in the past decade. Nat. Protoc. 13, 599–604 (2018).

  2. Zappia, L., Phipson, B. & Oshlack, A. Exploring the single-cell RNA-seq analysis landscape with the scRNA-tools database. PLoS Comput. Biol. 14, e1006245 (2018).

  3. Mahata, B. et al. Single-cell RNA sequencing reveals T helper cells synthesizing steroids de novo to contribute to immune homeostasis. Cell Rep. 7, 1130–1142 (2014).

  4. Shalek, A. K. et al. Single-cell transcriptomics reveals bimodality in expression and splicing in immune cells. Nature 498, 236–240 (2013).

  5. Saunders, A. et al. Molecular diversity and specializations among the cells of the adult mouse brain. Cell 174, 1015–1030 (2018).

  6. Plasschaert, L. W. et al. A single-cell atlas of the airway epithelium reveals the CFTR-rich pulmonary ionocyte. Nature 560, 377–381 (2018).

  7. Aibar, S. et al. SCENIC: single-cell regulatory network inference and clustering. Nat. Methods 14, 1083–1086 (2017).

  8. van der Wijst, M. G. P. et al. Single-cell RNA sequencing identifies cell-type-specific cis-eQTLs and co-expression QTLs. Nat. Genet. 50, 493–497 (2018).

  9. Regev, A. et al. The human cell atlas. eLife 6, e27041 (2017).

  10. Kharchenko, P. V., Silberstein, L. & Scadden, D. T. Bayesian approach to single-cell differential expression analysis. Nat. Methods 11, 740–742 (2014).

  11. Crow, M., Paul, A., Ballouz, S., Huang, Z. J. & Gillis, J. Characterizing the replicability of cell types defined by single cell RNA-sequencing data using MetaNeighbor. Nat. Commun. 9, 884 (2018).

  12. Haghverdi, L., Lun, A. T. L., Morgan, M. D. & Marioni, J. C. Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors. Nat. Biotechnol. 36, 421–427 (2018).

  13. La Manno, G. et al. Molecular diversity of midbrain development in mouse, human, and stem cells. Cell 167, 566–580 (2016).

  14. Han, X. et al. Mapping the mouse cell atlas by Microwell-seq. Cell 172, 1091–1107 (2018).

  15. Plass, M. et al. Cell type atlas and lineage tree of a whole complex animal by single-cell transcriptomics. Science 360, eaaq1723 (2018).

  16. Gerber, T. et al. Single-cell analysis uncovers convergence of cell identities during axolotl limb regeneration. Science 362, eaaq0681 (2018).

  17. Zar, J. H. Biostatistical Analysis 5th edn (Prentice-Hall/Pearson, 2010).

  18. Mohammadi, S., Davila-Velderrain, J., Kellis, M. & Grama, A. DECODE-ing sparsity patterns in single-cell RNA-seq. Preprint at https://www.biorxiv.org/content/10.1101/241646v2 (2018).

  19. Lovell, D., Pawlowsky-Glahn, V., Egozcue, J. J., Marguerat, S. & Bähler, J. Proportionality: a valid alternative to correlation for relative data. PLoS Comput. Biol. 11, e1004075 (2015).

  20. Quinn, T. P., Richardson, M. F., Lovell, D. & Crowley, T. M. propr: an R-package for identifying proportionally abundant features using compositional data analysis. Sci. Rep. 7, 16252 (2017).

  21. Song, L., Langfelder, P. & Horvath, S. Comparison of co-expression measures: mutual information, correlation, and model based indices. BMC Bioinformatics 13, 328 (2012).

  22. Pimentel, R. S., Niewiadomska-Bugaj, M. & Wang, J.-C. Association of zero-inflated continuous variables. Stat. Probabil. Lett. 96, 61–67 (2015).

  23. Ballouz, S., Weber, M., Pavlidis, P. & Gillis, J. EGAD: ultra-fast functional analysis of gene networks. Bioinformatics 33, 612–614 (2017).

  24. Heimberg, G., Bhatnagar, R., El-Samad, H. & Thomson, M. Low dimensionality in gene expression data enables the accurate extraction of transcriptional programs from shallow sequencing. Cell Syst. 2, 239–250 (2016).

  25. Ramani, A. K. et al. A map of human protein interactions derived from co-expression of human mRNAs and their orthologs. Mol. Syst. Biol. 4, 180 (2008).

  26. Maslov, S. & Sneppen, K. Specificity and stability in topology of protein networks. Science 296, 910–913 (2002).

  27. Li, H. et al. Reference component analysis of single-cell transcriptomes elucidates cellular heterogeneity in human colorectal tumors. Nat. Genet. 49, 708–718 (2017).

  28. Butler, A., Hoffman, P., Smibert, P., Papalexi, E. & Satija, R. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat. Biotechnol. 36, 411–420 (2018).

  29. Zhang, B. et al. Integrated systems approach identifies genetic nodes and networks in late-onset Alzheimer’s disease. Cell 153, 707–720 (2013).

  30. Parikshak, N. N. et al. Genome-wide changes in lncRNA, splicing, and regional gene expression patterns in autism. Nature 540, 423–427 (2016).

  31. Gulsuner, S. et al. Spatial and temporal mapping of de novo mutations in schizophrenia to a fetal prefrontal cortical network. Cell 154, 518–529 (2013).

  32. Menche, J. et al. Uncovering disease-disease relationships through the incomplete interactome. Science 347, 1257601 (2015).

  33. Huang, J. K. et al. Systematic evaluation of molecular networks for discovery of disease genes. Cell Syst. 6, 484–495 (2018).

  34. Choobdar, S. et al. Open community challenge reveals molecular network modules with key roles in diseases. Preprint at https://www.biorxiv.org/content/10.1101/265553v1 (2018).

  35. Zeisel, A. et al. Molecular architecture of the mouse nervous system. Cell 174, 999–1014 (2018).

  36. Vanlandewijck, M. et al. A molecular atlas of cell types and zonation in the brain vasculature. Nature 554, 475–480 (2018).

  37. Zhao, Z., Nelson, A. R., Betsholtz, C. & Zlokovic, B. V. Establishment and dysfunction of the blood-brain barrier. Cell 163, 1064–1078 (2015).

  38. Lindahl, P., Johansson, B. R., Levéen, P. & Betsholtz, C. Pericyte loss and microaneurysm formation in PDGF-B-deficient mice. Science 277, 242–245 (1997).

  39. Chen, S. & Mar, J. C. Evaluating methods of inferring gene regulatory networks highlights their lack of performance for single cell gene expression data. BMC Bioinformatics 19, 232 (2018).

  40. Ballouz, S., Verleyen, W. & Gillis, J. Guidance for RNA-seq co-expression network construction and analysis: safety in numbers. Bioinformatics 31, 2123–2130 (2015).

  41. Yao, V. et al. An integrative tissue-network approach to identify and test human disease genes. Nat. Biotechnol. 36, 1091–1099 (2018).

  42. Cao, J. et al. Joint profiling of chromatin accessibility and gene expression in thousands of single cells. Science 361, 1380–1385 (2018).

  43. Budnik, B., Levy, E., Harmange, G. & Slavov, N. SCoPE-MS: mass spectrometry of single mammalian cells quantifies proteome heterogeneity during cell differentiation. Genome Biol. 19, 161 (2018).

  44. Camp, J. G. et al. Multilineage communication regulates human liver bud development from pluripotency. Nature 546, 533–538 (2017).

  45. Vento-Tormo, R. et al. Single-cell reconstruction of the early maternal-fetal interface in humans. Nature 563, 347–353 (2018).

  46. Cohen, M. et al. Lung single-cell signaling interaction map reveals basophil role in macrophage imprinting. Cell 175, 1031–1044 (2018).

  47. Qiu, X. et al. Towards inferring causal gene regulatory networks from single cell expression measurements. Preprint at https://www.biorxiv.org/content/10.1101/426981v1 (2018).

  48. Langfelder, P. & Horvath, S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9, 559 (2008).

  49. Hahsler, M., Chelluboina, S., Hornik, K. & Buchta, C. The arules R-Package ecosystem: analyzing interesting patterns from large transaction datasets. J. Mach. Learn. Res. 12, 2021–2025 (2011).

  50. Dimmer, E. C. et al. The UniProt-GO annotation database in 2011. Nucleic Acids Res. 40, D565–D570 (2012).

  51. Alanis-Lobato, G., Andrade-Navarro, M. A. & Schaefer, M. H. HIPPIE v2.0: enhancing meaningfulness and reliability of protein-protein interaction networks. Nucleic Acids Res. 45, D408–D414 (2017).

  52. Türei, D., Korcsmáros, T. & Saez-Rodriguez, J. OmniPath: guidelines and gateway for literature-curated signaling pathway resources. Nat. Methods 13, 966–967 (2016).

  53. Fabregat, A. et al. The reactome pathway knowledgebase. Nucleic Acids Res. 46, D649–D655 (2018).

  54. Muraro, M. J. et al. A single-cell transcriptome atlas of the human pancreas. Cell Syst. 3, 385–394 (2016).

  55. Segerstolpe, Å. et al. Single-cell transcriptome profiling of human pancreatic islets in health and type 2 diabetes. Cell Metab. 24, 593–607 (2016).

  56. Enge, M. et al. Single-cell analysis of human pancreas reveals transcriptional signatures of aging and somatic mutation patterns. Cell 171, 321–330 (2017).

  57. Xin, Y. et al. RNA sequencing of single human islet cells reveals type 2 diabetes genes. Cell Metab. 24, 608–615 (2016).

  58. Baron, M. et al. A single-cell transcriptomic map of the human and mouse pancreas reveals inter- and intra-cell population structure. Cell Syst. 3, 346–360 (2016).

  59. Dixon, P. VEGAN, a package of R functions for community ecology. J. Veg. Sci. 14, 927–930 (2003).

  60. Mantel, N. The detection of disease clustering and a generalized regression approach. Cancer Res. 27, 209–220 (1967).

  61. Wiwie, C., Baumbach, J. & Röttger, R. Comparing the performance of biomedical clustering methods. Nat. Methods 12, 1033–1038 (2015).

  62. Lun, A. T. L., McCarthy, D. J. & Marioni, J. C. A step-by-step workflow for low-level analysis of single-cell RNA-seq data with Bioconductor. F1000Res. 5, 2122 (2016).

  63. Csardi, G. & Nepusz, T. The igraph software package for complex network research. InterJournal Complex Systems 1695, 1–9 (2006).

  64. Yu, G., Lam, T. T.-Y., Zhu, H. & Guan, Y. Two methods for mapping and visualizing associated data on phylogeny using ggtree. Mol. Biol. Evol. 35, 3041–3043 (2018).

  65. Yu, W., Clyne, M., Khoury, M. J. & Gwinn, M. Phenopedia and Genopedia: disease-centered and gene-centered views of the evolving knowledge of human genetic associations. Bioinformatics 26, 145–146 (2010).

Acknowledgements

We thank E. Su and M. Hirst for computational assistance and P. Pavlidis for comments on an earlier version of the manuscript. This work was supported by Genome Canada and Genome British Columbia (to L.J.F.; project 214PRO) and enabled in part by support provided by WestGrid and Compute Canada. M.A.S. is supported by a CIHR Vanier Canada Graduate Scholarship, an Izaak Walton Killam Memorial Pre-Doctoral Fellowship, a UBC Four Year Fellowship and a Vancouver Coastal Health–CIHR–UBC MD/PhD Studentship. J.W.S. is supported by an Izaak Walton Killam Memorial Pre-Doctoral Fellowship, a UBC Four Year Fellowship and a Vancouver Coastal Health–CIHR–UBC MD/PhD Studentship.

Author information

Contributions

M.A.S., J.W.S. and L.J.F. designed experiments. M.A.S. and J.W.S. performed experiments. M.A.S. wrote the first draft of the manuscript, which was edited by J.W.S. and L.J.F.

Corresponding authors

Correspondence to Michael A. Skinnider or Leonard J. Foster.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Fig. 1 Functional coherence of scRNA-seq gene coexpression networks, considering all datasets from each publication (n = 213).

Functional coherence of single-cell gene coexpression networks. Known gene functions were randomly withheld and predicted from the coexpression network in threefold cross-validation, and the area under the receiver operating characteristic curve (AUC) was calculated to quantify the degree to which genes with similar functions are coexpressed in networks constructed with each measure of association. Box plots show median (horizontal line), interquartile range (hinges) and the smallest and largest values no more than 1.5 times the interquartile range (whiskers).
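The cross-validation procedure described above can be sketched in a few lines of Python. This is a hedged illustration only: the caption does not specify the predictor, so a simple neighbor-voting score (the weighted fraction of a gene's neighbors that carry the function) is assumed here, and the function names, the toy network and the fold assignment are all hypothetical.

```python
import random


def auc(scores, labels):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, is_pos in zip(scores, labels) if is_pos]
    neg = [s for s, is_pos in zip(scores, labels) if not is_pos]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def neighbor_voting_auc(weights, annotated, n_folds=3, seed=0):
    """Mean AUC over folds in which known annotations are withheld and
    re-predicted from the network (assumed neighbor-voting predictor).

    weights   -- symmetric gene x gene association matrix (list of lists)
    annotated -- list of booleans, True if the gene has the function
    """
    genes = list(range(len(weights)))
    positives = [g for g in genes if annotated[g]]
    if len(positives) < n_folds:
        raise ValueError("too few annotated genes for the requested folds")
    random.Random(seed).shuffle(positives)
    folds = [positives[k::n_folds] for k in range(n_folds)]

    fold_aucs = []
    for held_out in folds:
        hidden = set(held_out)
        training = [g for g in positives if g not in hidden]
        # Evaluate on withheld positives and on genes never annotated.
        test_genes = [g for g in genes if g in hidden or not annotated[g]]
        scores = []
        for g in test_genes:
            total = sum(weights[g][h] for h in genes if h != g)
            votes = sum(weights[g][h] for h in training if h != g)
            scores.append(votes / total if total > 0 else 0.0)
        labels = [g in hidden for g in test_genes]
        fold_aucs.append(auc(scores, labels))
    return sum(fold_aucs) / len(fold_aucs)


# Toy network: genes 0-5 share a function and are tightly coexpressed.
n_genes = 12
toy_weights = [
    [1.0 if (i < 6) == (j < 6) else 0.1 for j in range(n_genes)]
    for i in range(n_genes)
]
toy_annotated = [True] * 6 + [False] * 6
print(neighbor_voting_auc(toy_weights, toy_annotated))
```

An AUC near 0.5 would indicate that coexpression carries no information about the function, whereas values approaching 1 indicate that genes sharing the function are strongly connected in the network.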

Supplementary Fig. 2 Relationships between functional coherence and dropouts or number of cells in scRNA-seq gene coexpression networks.

a,b, Relationships between functional coherence and proportion of dropouts (a) or number of cells (b) in scRNA-seq gene coexpression networks.

Supplementary Fig. 3 Functional coherence of single-cell gene coexpression networks according to scRNA-seq protocol (including all datasets associated with each publication, n = 213).

Box plots show median (horizontal line), interquartile range (hinges) and the smallest and largest values no more than 1.5 times the interquartile range (whiskers).
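The box plot convention used throughout these captions can be reproduced as follows. This is a minimal sketch, assuming linear-interpolation ("inclusive") quartiles; the function name and the toy values are hypothetical, and the quartile method used for the published figures is not stated in the captions.

```python
import statistics


def box_plot_summary(values):
    """Median, hinges and whiskers following the 1.5 x IQR convention."""
    data = sorted(values)
    # Quartile method is an assumption of this sketch.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        # Whiskers end at the most extreme observations inside the fences.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }


summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(summary)
```

Note that the whiskers stop at observed values, not at the fences themselves, so the value 100 in the toy data is reported as an outlier and the upper whisker ends at 9.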

Supplementary Fig. 4 Functional coherence of single-cell gene coexpression networks according to transcript coverage afforded by scRNA-seq protocol.

Box plots show median (horizontal line), interquartile range (hinges) and the smallest and largest values no more than 1.5 times the interquartile range (whiskers), for the subset of n = 40 of the 43 datasets randomly sampled from each publication using scRNA-seq protocols discussed by Chen et al. (Annu. Rev. Biomed. Data Sci. 1, 29–51; 2018).

Supplementary Fig. 5 Adjusted r² of experimental and analytical variables in univariate linear models of overall functional coherence (median AUC).

Univariate linear regression was performed using the subset of n = 35 of the 43 datasets randomly sampled from each publication (i) using scRNA-seq protocols discussed by Chen et al. (Annu. Rev. Biomed. Data Sci. 1, 29–51; 2018) and (ii) using protocols that were used in at least two different publications. Exact F test P values were 5.2 × 10⁻³¹ (measure of association); 1.3 × 10⁻¹³ (sequencing protocol); 3.6 × 10⁻¹² (number of cells); 7.0 × 10⁻⁶ (cell isolation/capture); 1.3 × 10⁻³ (transcript coverage); and 0.13 (percentage of zeroes). Blue, significant at uncorrected P < 0.05; gray, not significant.
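For readers unfamiliar with the statistic, the following Python sketch shows how an adjusted r² is obtained from a univariate linear model with a single numeric predictor (for example, number of cells). It is an illustration only: the function name and toy numbers are hypothetical and are not taken from the study, and categorical variables such as the measure of association would require one coefficient per level, which this sketch does not cover.

```python
def adjusted_r_squared(x, y):
    """Fit y = intercept + slope * x by ordinary least squares and
    return (r_squared, adjusted_r_squared) for one predictor."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    # Penalize for the single fitted predictor (p = 1).
    adjusted = 1.0 - (1.0 - r2) * (n - 1) / (n - 1 - 1)
    return r2, adjusted


# Toy data, not from the study.
r2, adj = adjusted_r_squared([1, 2, 3, 4], [1, 3, 2, 4])
print(round(r2, 2), round(adj, 2))
```

The adjustment shrinks r² according to the number of fitted parameters, which makes variables with different numbers of levels more comparable than the raw r² would.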

Supplementary Fig. 6 Overlap between sparse unweighted single-cell gene coexpression networks constructed with the top-ranked 20,000 or 100,000 edges and other biological networks (n = 43 datasets, one per publication).

a, Results with 20,000 edges. b, Results with 100,000 edges. Box plots show median (horizontal line), interquartile range (hinges) and the smallest and largest values no more than 1.5 times the interquartile range (whiskers).
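Constructing a sparse unweighted network from the top-ranked edges, and counting its overlap with a reference network, might look like the Python sketch below. The function names, gene labels and toy matrix are hypothetical, and ranking by the raw association value (rather than, say, its absolute value) is an assumption of this sketch; any normalization of the overlap against randomized networks is omitted.

```python
from itertools import combinations


def top_edges(genes, assoc, k):
    """Return the k highest-scoring gene pairs as an unweighted edge set.

    genes -- list of gene identifiers
    assoc -- symmetric gene x gene association matrix (list of lists)
    """
    pairs = [
        (assoc[i][j], genes[i], genes[j])
        for i, j in combinations(range(len(genes)), 2)
    ]
    pairs.sort(key=lambda item: item[0], reverse=True)
    # frozenset makes each edge undirected: {A, B} equals {B, A}.
    return {frozenset((a, b)) for _, a, b in pairs[:k]}


def edge_overlap(edges, reference_pairs):
    """Number of edges shared with a reference network given as pairs."""
    reference = {frozenset(pair) for pair in reference_pairs}
    return len(edges & reference)


toy_genes = ["A", "B", "C", "D"]
toy_assoc = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.8, 0.3],
    [0.1, 0.8, 1.0, 0.4],
    [0.2, 0.3, 0.4, 1.0],
]
network = top_edges(toy_genes, toy_assoc, k=2)
print(edge_overlap(network, [("B", "A"), ("C", "D")]))
```

In the figure, k would be 20,000 or 100,000 and the reference would be one of the other biological networks.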

Supplementary Fig. 7

Dendrograms obtained from hierarchical clustering of 561 single-cell transcriptomes derived from seven cell lines (ref. 27), with two lines sequenced in two batches, colored by cell line of origin and batch.

Supplementary Fig. 8

Adjusted Rand index of hierarchical clustering of single-cell transcriptomes from cell lines profiled in two different batches (ref. 27) with each measure of association.
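The adjusted Rand index compares a clustering with known labels while correcting for chance agreement. A minimal Python sketch of the standard pair-counting formula is given below; the function name and toy labels are hypothetical, and the clustering step itself is not shown.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(truth, predicted):
    """Adjusted Rand index between two labelings of the same cells."""
    if len(truth) != len(predicted) or len(truth) < 2:
        raise ValueError("labelings must have equal length of at least 2")
    n = len(truth)
    # Pairs of cells placed together in both, in truth, and in prediction.
    pairs_both = sum(comb(c, 2) for c in Counter(zip(truth, predicted)).values())
    pairs_truth = sum(comb(c, 2) for c in Counter(truth).values())
    pairs_pred = sum(comb(c, 2) for c in Counter(predicted).values())
    expected = pairs_truth * pairs_pred / comb(n, 2)
    maximum = (pairs_truth + pairs_pred) / 2
    if maximum == expected:
        # Degenerate case, e.g. every cell in one cluster in both labelings.
        return 1.0
    return (pairs_both - expected) / (maximum - expected)


# Cluster names do not matter, only which cells are grouped together.
print(adjusted_rand_index(["a", "a", "b", "b"], [1, 1, 0, 0]))
print(adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2]))
```

A value of 1 indicates perfect recovery of the known cell lines, and values near 0 indicate agreement no better than chance.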

Supplementary Fig. 9 Normalized mutual information of hierarchical clustering of single-cell transcriptomes.

a,b, Normalized mutual information of hierarchical clustering of single-cell transcriptomes from seven different cell lines (ref. 27) (a) or a subset of two cell lines profiled in two different batches (b) with each measure of association.
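Normalized mutual information can be computed from the same two labelings. Several normalizations are in common use and the caption does not say which was applied, so the sketch below assumes normalization by the arithmetic mean of the two entropies; the function names and toy labels are hypothetical.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def normalized_mutual_information(truth, predicted):
    """NMI = 2 * I(truth; predicted) / (H(truth) + H(predicted)).

    The arithmetic-mean normalization is an assumption of this sketch.
    """
    if len(truth) != len(predicted) or not truth:
        raise ValueError("labelings must be non-empty and of equal length")
    n = len(truth)
    count_t = Counter(truth)
    count_p = Counter(predicted)
    joint = Counter(zip(truth, predicted))
    mutual_info = sum(
        (c / n) * log(c * n / (count_t[t] * count_p[p]))
        for (t, p), c in joint.items()
    )
    entropy_sum = entropy(truth) + entropy(predicted)
    if entropy_sum == 0:
        # Both labelings put every cell in a single cluster.
        return 1.0
    return 2.0 * mutual_info / entropy_sum


print(normalized_mutual_information([0, 0, 1, 1], [1, 1, 0, 0]))
print(normalized_mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))
```

Like the adjusted Rand index, the score is invariant to cluster naming; unlike it, NMI is not corrected for chance.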

Supplementary Fig. 10 Adjusted Rand index and normalized mutual information of Louvain clustering applied to the shared nearest-neighbor graph of single-cell transcriptomes from seven different cell lines.

a, Adjusted Rand index. b, Normalized mutual information. Box plots show median (horizontal line), interquartile range (hinges) and the smallest and largest values no more than 1.5 times the interquartile range (whiskers) for n = 5 values of k in nearest-neighbor graph construction (2, 5, 10, 20 and 50).

Supplementary Fig. 11 Computational time required to construct gene coexpression matrices for a random sample of 1,000 genes from the subset of scRNA-seq datasets obtained from the Gene Expression Omnibus (n = 162 datasets).

Box plots show median (horizontal line), interquartile range (hinges) and the smallest and largest values no more than 1.5 times the interquartile range (whiskers). The median of 10 random samples from each dataset was retained for plotting.

Supplementary Fig. 12 Functional coherence of gene coexpression networks constructed with each measure of association at different thresholds for gene filtering, ranging from non-zero expression in less than 50% of cells (more stringent, less sparse) to non-zero expression in less than 95% of cells (less stringent, more sparse).

Box plots show median (horizontal line), interquartile range (hinges) and the smallest and largest values no more than 1.5 times the interquartile range (whiskers) for n = 43 datasets, one per publication.

Supplementary Fig. 13 Trends in functional coherence of gene coexpression networks for each of 17 measures of association at different thresholds for gene filtering, ranging from non-zero expression in less than 50% of cells (more stringent, less sparse) to non-zero expression in less than 95% of cells (less stringent, more sparse).

Box plots show median (horizontal line), interquartile range (hinges) and the smallest and largest values no more than 1.5 times the interquartile range (whiskers) for n = 43 datasets, one per publication.

Supplementary information

Supplementary Information

Supplementary Figs. 1–13 and Supplementary Note

Reporting Summary

Supplementary Software

‘dismay’ R package providing a unified interface to calculate all 17 measures of association.

Supplementary Data 1

List of scRNA-seq datasets analyzed in this study.

Supplementary Data 2

Functional coherence of scRNA-seq gene coexpression networks.

Supplementary Data 3

Macromolecular interaction network overlap of scRNA-seq gene coexpression networks.

Cite this article

Skinnider, M.A., Squair, J.W. & Foster, L.J. Evaluating measures of association for single-cell transcriptomics. Nat Methods 16, 381–386 (2019). https://doi.org/10.1038/s41592-019-0372-4

  • DOI: https://doi.org/10.1038/s41592-019-0372-4
