Analysis

Evaluating measures of association for single-cell transcriptomics

Nature Methods volume 16, pages 381–386 (2019)

Abstract

Single-cell transcriptomics provides an opportunity to characterize cell-type-specific transcriptional networks, intercellular signaling pathways and cellular diversity with unprecedented resolution by profiling thousands of cells in a single experiment. However, owing to the unique statistical properties of scRNA-seq data, the optimal measures of association for identifying gene–gene and cell–cell relationships from single-cell transcriptomics remain unclear. Here, we conducted a large-scale evaluation of 17 measures of association for their ability to reconstruct cellular networks, cluster cells of the same type and link cell-type-specific transcriptional programs to disease. Measures of proportionality were consistently among the best-performing methods across datasets and tasks. Our analysis provides data-driven guidance for gene and cell network analysis in single-cell transcriptomics.
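
As a rough illustration of what a measure of proportionality computes, the sketch below evaluates two such measures, rho_p and phi_s (refs. 19 and 20), for one pair of genes after a centered log-ratio transform of each cell. The function names, the list-of-lists input layout and the pseudocount used to handle zero counts are assumptions made for this example; it is not the code used in the study.

```python
import math
from statistics import mean, variance


def clr(cell, pseudocount=1.0):
    """Centered log-ratio transform of one cell's counts across genes."""
    logs = [math.log(count + pseudocount) for count in cell]
    centre = mean(logs)
    return [value - centre for value in logs]


def proportionality(cells, i, j, pseudocount=1.0):
    """Return (rho_p, phi_s) for genes i and j.

    cells -- list of per-cell count lists (hypothetical layout: one inner
             list per cell, one entry per gene)
    """
    transformed = [clr(cell, pseudocount) for cell in cells]
    a_i = [row[i] for row in transformed]
    a_j = [row[j] for row in transformed]
    var_diff = variance([x - y for x, y in zip(a_i, a_j)])
    var_sum = variance([x + y for x, y in zip(a_i, a_j)])
    # rho_p = 1 - var(A_i - A_j) / (var(A_i) + var(A_j))
    rho_p = 1.0 - var_diff / (variance(a_i) + variance(a_j))
    # phi_s = var(A_i - A_j) / var(A_i + A_j)
    phi_s = var_diff / var_sum
    return rho_p, phi_s
```

When two genes differ by a constant factor in every cell and no pseudocount is added, the log-ratio between them is constant, so rho_p is 1 and phi_s is 0. The two quantities are related by phi_s = (1 - rho_p) / (1 + rho_p).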

Data availability

The data that support the findings of this study are available from the following GitHub repository: https://github.com/skinnider/SCT-MoA. Raw data are available from the Gene Expression Omnibus, http://mousebrain.org, or https://support.10xgenomics.com, as detailed in the Methods; dataset identifiers are provided in Supplementary Data 1.

Code availability

The ‘dismay’ R package is available as Supplementary Software 1 and from the following GitHub repository: https://github.com/skinnider/dismay. R code used to reproduce the analysis and figures is available from the following GitHub repository: https://github.com/skinnider/SCT-MoA.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  1.

    Svensson, V., Vento-Tormo, R. & Teichmann, S. A. Exponential scaling of single-cell RNA-seq in the past decade. Nat. Protoc. 13, 599–604 (2018).

  2.

    Zappia, L., Phipson, B. & Oshlack, A. Exploring the single-cell RNA-seq analysis landscape with the scRNA-tools database. PLoS Comput. Biol. 14, e1006245 (2018).

  3.

    Mahata, B. et al. Single-cell RNA sequencing reveals T helper cells synthesizing steroids de novo to contribute to immune homeostasis. Cell Rep. 7, 1130–1142 (2014).

  4.

    Shalek, A. K. et al. Single-cell transcriptomics reveals bimodality in expression and splicing in immune cells. Nature 498, 236–240 (2013).

  5.

    Saunders, A. et al. Molecular diversity and specializations among the cells of the adult mouse brain. Cell 174, 1015–1030 (2018).

  6.

    Plasschaert, L. W. et al. A single-cell atlas of the airway epithelium reveals the CFTR-rich pulmonary ionocyte. Nature 560, 377–381 (2018).

  7.

    Aibar, S. et al. SCENIC: single-cell regulatory network inference and clustering. Nat. Methods 14, 1083–1086 (2017).

  8.

    van der Wijst, M. G. P. et al. Single-cell RNA sequencing identifies cell-type-specific cis-eQTLs and co-expression QTLs. Nat. Genet. 50, 493–497 (2018).

  9.

    Regev, A. et al. The human cell atlas. eLife 6, e27041 (2017).

  10.

    Kharchenko, P. V., Silberstein, L. & Scadden, D. T. Bayesian approach to single-cell differential expression analysis. Nat. Methods 11, 740–742 (2014).

  11.

    Crow, M., Paul, A., Ballouz, S., Huang, Z. J. & Gillis, J. Characterizing the replicability of cell types defined by single cell RNA-sequencing data using MetaNeighbor. Nat. Commun. 9, 884 (2018).

  12.

    Haghverdi, L., Lun, A. T. L., Morgan, M. D. & Marioni, J. C. Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors. Nat. Biotechnol. 36, 421–427 (2018).

  13.

    La Manno, G. et al. Molecular diversity of midbrain development in mouse, human, and stem cells. Cell 167, 566–580 (2016).

  14.

    Han, X. et al. Mapping the mouse cell atlas by Microwell-seq. Cell 172, 1091–1107 (2018).

  15.

    Plass, M. et al. Cell type atlas and lineage tree of a whole complex animal by single-cell transcriptomics. Science 360, eaaq1723 (2018).

  16.

    Gerber, T. et al. Single-cell analysis uncovers convergence of cell identities during axolotl limb regeneration. Science 362, eaaq0681 (2018).

  17.

    Zar, J. H. Biostatistical Analysis 5th edn (Prentice-Hall/Pearson, 2010).

  18.

    Mohammadi, S., Davila-Velderrain, J., Kellis, M. & Grama, A. DECODE-ing sparsity patterns in single-cell RNA-seq. Preprint at https://www.biorxiv.org/content/10.1101/241646v2 (2018).

  19.

    Lovell, D., Pawlowsky-Glahn, V., Egozcue, J. J., Marguerat, S. & Bähler, J. Proportionality: a valid alternative to correlation for relative data. PLoS Comput. Biol. 11, e1004075 (2015).

  20.

    Quinn, T. P., Richardson, M. F., Lovell, D. & Crowley, T. M. propr: an R-package for identifying proportionally abundant features using compositional data analysis. Sci. Rep. 7, 16252 (2017).

  21.

    Song, L., Langfelder, P. & Horvath, S. Comparison of co-expression measures: mutual information, correlation, and model based indices. BMC Bioinformatics 13, 328 (2012).

  22.

    Pimentel, R. S., Niewiadomska-Bugaj, M. & Wang, J.-C. Association of zero-inflated continuous variables. Stat. Probabil. Lett. 96, 61–67 (2015).

  23.

    Ballouz, S., Weber, M., Pavlidis, P. & Gillis, J. EGAD: ultra-fast functional analysis of gene networks. Bioinformatics 33, 612–614 (2017).

  24.

    Heimberg, G., Bhatnagar, R., El-Samad, H. & Thomson, M. Low dimensionality in gene expression data enables the accurate extraction of transcriptional programs from shallow sequencing. Cell Syst. 2, 239–250 (2016).

  25.

    Ramani, A. K. et al. A map of human protein interactions derived from co-expression of human mRNAs and their orthologs. Mol. Syst. Biol. 4, 180 (2008).

  26.

    Maslov, S. & Sneppen, K. Specificity and stability in topology of protein networks. Science 296, 910–913 (2002).

  27.

    Li, H. et al. Reference component analysis of single-cell transcriptomes elucidates cellular heterogeneity in human colorectal tumors. Nat. Genet. 49, 708–718 (2017).

  28.

    Butler, A., Hoffman, P., Smibert, P., Papalexi, E. & Satija, R. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat. Biotechnol. 36, 411–420 (2018).

  29.

    Zhang, B. et al. Integrated systems approach identifies genetic nodes and networks in late-onset Alzheimer’s disease. Cell 153, 707–720 (2013).

  30.

    Parikshak, N. N. et al. Genome-wide changes in lncRNA, splicing, and regional gene expression patterns in autism. Nature 540, 423–427 (2016).

  31.

    Gulsuner, S. et al. Spatial and temporal mapping of de novo mutations in schizophrenia to a fetal prefrontal cortical network. Cell 154, 518–529 (2013).

  32.

    Menche, J. et al. Uncovering disease-disease relationships through the incomplete interactome. Science 347, 1257601 (2015).

  33.

    Huang, J. K. et al. Systematic evaluation of molecular networks for discovery of disease genes. Cell Syst. 6, 484–495 (2018).

  34.

    Choobdar, S. et al. Open community challenge reveals molecular network modules with key roles in diseases. Preprint at https://www.biorxiv.org/content/10.1101/265553v1 (2018).

  35.

    Zeisel, A. et al. Molecular architecture of the mouse nervous system. Cell 174, 999–1014 (2018).

  36.

    Vanlandewijck, M. et al. A molecular atlas of cell types and zonation in the brain vasculature. Nature 554, 475–480 (2018).

  37.

    Zhao, Z., Nelson, A. R., Betsholtz, C. & Zlokovic, B. V. Establishment and dysfunction of the blood-brain barrier. Cell 163, 1064–1078 (2015).

  38.

    Lindahl, P., Johansson, B. R., Levéen, P. & Betsholtz, C. Pericyte loss and microaneurysm formation in PDGF-B-deficient mice. Science 277, 242–245 (1997).

  39.

    Chen, S. & Mar, J. C. Evaluating methods of inferring gene regulatory networks highlights their lack of performance for single cell gene expression data. BMC Bioinformatics 19, 232 (2018).

  40.

    Ballouz, S., Verleyen, W. & Gillis, J. Guidance for RNA-seq co-expression network construction and analysis: safety in numbers. Bioinformatics 31, 2123–2130 (2015).

  41.

    Yao, V. et al. An integrative tissue-network approach to identify and test human disease genes. Nat. Biotechnol. 36, 1091–1099 (2018).

  42.

    Cao, J. et al. Joint profiling of chromatin accessibility and gene expression in thousands of single cells. Science 361, 1380–1385 (2018).

  43.

    Budnik, B., Levy, E., Harmange, G. & Slavov, N. SCoPE-MS: mass spectrometry of single mammalian cells quantifies proteome heterogeneity during cell differentiation. Genome Biol. 19, 161 (2018).

  44.

    Camp, J. G. et al. Multilineage communication regulates human liver bud development from pluripotency. Nature 546, 533–538 (2017).

  45.

    Vento-Tormo, R. et al. Single-cell reconstruction of the early maternal-fetal interface in humans. Nature 563, 347–353 (2018).

  46.

    Cohen, M. et al. Lung single-cell signaling interaction map reveals basophil role in macrophage imprinting. Cell 175, 1031–1044 (2018).

  47.

    Qiu, X. et al. Towards inferring causal gene regulatory networks from single cell expression measurements. Preprint at https://www.biorxiv.org/content/10.1101/426981v1 (2018).

  48.

    Langfelder, P. & Horvath, S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9, 559 (2008).

  49.

    Hahsler, M., Chelluboina, S., Hornik, K. & Buchta, C. The arules R-Package ecosystem: analyzing interesting patterns from large transaction datasets. J. Mach. Learn. Res. 12, 2021–2025 (2011).

  50.

    Dimmer, E. C. et al. The UniProt-GO annotation database in 2011. Nucleic Acids Res. 40, D565–D570 (2012).

  51.

    Alanis-Lobato, G., Andrade-Navarro, M. A. & Schaefer, M. H. HIPPIE v2.0: enhancing meaningfulness and reliability of protein-protein interaction networks. Nucleic Acids Res. 45, D408–D414 (2017).

  52.

    Türei, D., Korcsmáros, T. & Saez-Rodriguez, J. OmniPath: guidelines and gateway for literature-curated signaling pathway resources. Nat. Methods 13, 966–967 (2016).

  53.

    Fabregat, A. et al. The reactome pathway knowledgebase. Nucleic Acids Res. 46, D649–D655 (2018).

  54.

    Muraro, M. J. et al. A single-cell transcriptome atlas of the human pancreas. Cell Syst. 3, 385–394 (2016).

  55.

    Segerstolpe, Å. et al. Single-cell transcriptome profiling of human pancreatic islets in health and type 2 diabetes. Cell Metab. 24, 593–607 (2016).

  56.

    Enge, M. et al. Single-cell analysis of human pancreas reveals transcriptional signatures of aging and somatic mutation patterns. Cell 171, 321–330 (2017).

  57.

    Xin, Y. et al. RNA sequencing of single human islet cells reveals type 2 diabetes genes. Cell Metab. 24, 608–615 (2016).

  58.

    Baron, M. et al. A single-cell transcriptomic map of the human and mouse pancreas reveals inter- and intra-cell population structure. Cell Syst. 3, 346–360 (2016).

  59.

    Dixon, P. VEGAN, a package of R functions for community ecology. J. Veg. Sci. 14, 927–930 (2003).

  60.

    Mantel, N. The detection of disease clustering and a generalized regression approach. Cancer Res. 27, 209–220 (1967).

  61.

    Wiwie, C., Baumbach, J. & Röttger, R. Comparing the performance of biomedical clustering methods. Nat. Methods 12, 1033–1038 (2015).

  62.

    Lun, A. T. L., McCarthy, D. J. & Marioni, J. C. A step-by-step workflow for low-level analysis of single-cell RNA-seq data with Bioconductor. F1000Res. 5, 2122 (2016).

  63.

    Csardi, G. & Nepusz, T. The igraph software package for complex network research. InterJournal Complex Systems 1695, 1–9 (2006).

  64.

    Yu, G., Lam, T. T.-Y., Zhu, H. & Guan, Y. Two methods for mapping and visualizing associated data on phylogeny using ggtree. Mol. Biol. Evol. 35, 3041–3043 (2018).

  65.

    Yu, W., Clyne, M., Khoury, M. J. & Gwinn, M. Phenopedia and Genopedia: disease-centered and gene-centered views of the evolving knowledge of human genetic associations. Bioinformatics 26, 145–146 (2010).

Acknowledgements

We thank E. Su and M. Hirst for computational assistance and P. Pavlidis for comments on an earlier version of the manuscript. This work was supported by Genome Canada and Genome British Columbia (to L.J.F.; project 214PRO) and enabled in part by support provided by WestGrid and Compute Canada. M.A.S. is supported by a CIHR Vanier Canada Graduate Scholarship, an Izaak Walton Killam Memorial Pre-Doctoral Fellowship, a UBC Four Year Fellowship and a Vancouver Coastal Health–CIHR–UBC MD/PhD Studentship. J.W.S. is supported by an Izaak Walton Killam Memorial Pre-Doctoral Fellowship, a UBC Four Year Fellowship and a Vancouver Coastal Health–CIHR–UBC MD/PhD Studentship.

Author information

Affiliations

  1. Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada

    • Michael A. Skinnider
    • Leonard J. Foster
  2. International Collaboration on Repair Discoveries (ICORD), University of British Columbia, Vancouver, British Columbia, Canada

    • Jordan W. Squair
  3. Department of Biochemistry and Molecular Biology, University of British Columbia, Vancouver, British Columbia, Canada

    • Leonard J. Foster

Contributions

M.A.S., J.W.S. and L.J.F. designed experiments. M.A.S. and J.W.S. performed experiments. M.A.S. wrote the first draft of the manuscript, which was edited by J.W.S. and L.J.F.

Competing interests

The authors declare no competing interests.

Corresponding authors

Correspondence to Michael A. Skinnider or Leonard J. Foster.

Integrated supplementary information

  1. Supplementary Fig. 1 Functional coherence of scRNA-seq gene coexpression networks, considering all datasets from each publication (n = 213).

    Functional coherence of single-cell gene coexpression networks. Known gene functions were randomly withheld and predicted from the coexpression network in threefold cross-validation, and the area under the receiver operating characteristic curve (AUC) was calculated to quantify the degree to which genes with similar functions are coexpressed in networks constructed with each measure of association. Box plots show median (horizontal line), interquartile range (hinges) and the smallest and largest values no more than 1.5 times the interquartile range (whiskers).

  2. Supplementary Fig. 2 Relationships between functional coherence and dropouts or number of cells in scRNA-seq gene coexpression networks.

    a,b, Relationships between functional coherence and proportion of dropouts (a) or number of cells (b) in scRNA-seq gene coexpression networks.

  3. Supplementary Fig. 3 Functional coherence of single-cell gene coexpression networks according to scRNA-seq protocol (including all datasets associated with each publication, n = 213).

    Box plots show median (horizontal line), interquartile range (hinges) and the smallest and largest values no more than 1.5 times the interquartile range (whiskers).

  4. Supplementary Fig. 4 Functional coherence of single-cell gene coexpression networks according to transcript coverage afforded by scRNA-seq protocol.

    Box plots show median (horizontal line), interquartile range (hinges) and the smallest and largest values no more than 1.5 times the interquartile range (whiskers), for the subset of n = 40 of the 43 datasets randomly sampled from each publication using scRNA-seq protocols discussed by Chen et al. (Annu. Rev. Biomed. Data Sci. 1, 29–51; 2018).

  5. Supplementary Fig. 5 Adjusted r² of experimental and analytical variables in univariate linear models of overall functional coherence (median AUC).

    Univariate linear regression was performed using the subset of n = 35 of the 43 datasets randomly sampled from each publication (i) using scRNA-seq protocols discussed by Chen et al. (Annu. Rev. Biomed. Data Sci. 1, 29–51; 2018) and (ii) using protocols that were used in at least two different publications. Exact F test P values were 5.2 × 10⁻³¹ (measure of association); 1.3 × 10⁻¹³ (sequencing protocol); 3.6 × 10⁻¹² (number of cells); 7.0 × 10⁻⁶ (cell isolation/capture); 1.3 × 10⁻³ (transcript coverage); and 0.13 (percentage of zeroes). Blue, significant at uncorrected P < 0.05; gray, not significant.

  6. Supplementary Fig. 6 Overlap between sparse unweighted single-cell gene coexpression networks constructed with the top-ranked 20,000 or 100,000 edges and other biological networks (n = 43 datasets, one per publication).

    a, Results with 20,000 edges. b, Results with 100,000 edges. Box plots show median (horizontal line), interquartile range (hinges) and the smallest and largest values no more than 1.5 times the interquartile range (whiskers).

  7. Supplementary Fig. 7

    Dendrograms obtained from hierarchical clustering of 561 single-cell transcriptomes derived from seven cell lines (ref. 27), with two lines sequenced in two batches, colored by cell line of origin and batch.

  8. Supplementary Fig. 8

    Adjusted Rand index of hierarchical clustering of single-cell transcriptomes from cell lines profiled in two different batches (ref. 27) with each measure of association.

  9. Supplementary Fig. 9 Normalized mutual information of hierarchical clustering of single-cell transcriptomes.

    a,b, Normalized mutual information of hierarchical clustering of single-cell transcriptomes from seven different cell lines (a; ref. 27) or a subset of two cell lines profiled in two different batches (b) with each measure of association.

  10. Supplementary Fig. 10 Adjusted Rand index and normalized mutual information of Louvain clustering applied to the shared nearest-neighbor graph of single-cell transcriptomes from seven different cell lines.

    a, Adjusted Rand index. b, Normalized mutual information. Box plots show median (horizontal line), interquartile range (hinges) and the smallest and largest values no more than 1.5 times the interquartile range (whiskers) for n = 5 values of k in nearest-neighbor graph construction (2, 5, 10, 20 and 50).

  11. Supplementary Fig. 11 Computational time required to construct gene coexpression matrices for a random sample of 1,000 genes from the subset of scRNA-seq datasets obtained from the Gene Expression Omnibus (n = 162 datasets).

    Box plots show median (horizontal line), interquartile range (hinges) and the smallest and largest values no more than 1.5 times the interquartile range (whiskers). The median of 10 random samples from each dataset was retained for plotting.

  12. Supplementary Fig. 12 Functional coherence of gene coexpression networks constructed with each measure of association at different thresholds for gene filtering, ranging from non-zero expression in less than 50% of cells (more stringent, less sparse) to non-zero expression in less than 95% of cells (less stringent, more sparse).

    Box plots show median (horizontal line), interquartile range (hinges) and the smallest and largest values no more than 1.5 times the interquartile range (whiskers) for n = 43 datasets, one per publication.

  13. Supplementary Fig. 13 Trends in functional coherence of gene coexpression networks for each of 17 measures of association at different thresholds for gene filtering, ranging from non-zero expression in less than 50% of cells (more stringent, less sparse) to non-zero expression in less than 95% of cells (less stringent, more sparse).

    Box plots show median (horizontal line), interquartile range (hinges) and the smallest and largest values no more than 1.5 times the interquartile range (whiskers) for n = 43 datasets, one per publication.
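
The functional coherence analysis summarized in Supplementary Figs. 1–3, 12 and 13 can be sketched roughly as follows. The captions state that known gene functions were withheld and predicted from the coexpression network in threefold cross-validation and scored by AUC, but they do not name the predictor; the neighbor-voting score used here, the function names and the matrix layout are assumptions made for illustration only.

```python
import random


def pairwise_auc(pos_scores, neg_scores):
    """Probability that a positive outscores a negative, counting ties as one half."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))


def functional_coherence(weights, annotated, n_folds=3, seed=0):
    """Mean cross-validated AUC for a single gene function.

    weights   -- symmetric matrix (list of lists) of non-negative edge weights
    annotated -- set of gene indices known to have the function
    Assumes at least n_folds annotated genes and at least one unannotated gene.
    """
    genes = range(len(weights))
    order = sorted(annotated)
    random.Random(seed).shuffle(order)
    folds = [order[k::n_folds] for k in range(n_folds)]
    aucs = []
    for held_out in folds:
        training = annotated - set(held_out)
        scores = {}
        for g in genes:
            # Neighbor voting: share of a gene's edge weight that points
            # at genes still annotated with the function.
            total = sum(w for h, w in enumerate(weights[g]) if h != g)
            known = sum(weights[g][h] for h in training if h != g)
            scores[g] = known / total if total > 0 else 0.0
        pos = [scores[g] for g in held_out]
        neg = [scores[g] for g in genes if g not in annotated]
        aucs.append(pairwise_auc(pos, neg))
    return sum(aucs) / len(aucs)
```

A network in which annotated genes are connected only to each other yields an AUC of 1, whereas a network with uniform weights carries no information about the function and yields 0.5.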

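The adjusted Rand index reported in Supplementary Figs. 8 and 10 compares a clustering with known labels (here, cell line of origin) while correcting for chance agreement. The sketch below is a minimal standard-library implementation of the usual pair-counting formula; the function name and the label-list inputs are assumptions for this example, and the clustering step itself is not shown.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(truth, predicted):
    """Adjusted Rand index between two labelings of the same cells.

    Assumes both lists have the same length and contain at least two cells.
    """
    n = len(truth)
    # Pairs of cells placed together in both labelings, in truth, and in predicted.
    together_both = sum(comb(c, 2) for c in Counter(zip(truth, predicted)).values())
    together_truth = sum(comb(c, 2) for c in Counter(truth).values())
    together_pred = sum(comb(c, 2) for c in Counter(predicted).values())
    expected = together_truth * together_pred / comb(n, 2)
    maximum = (together_truth + together_pred) / 2
    if maximum == expected:
        # Degenerate case, e.g. both labelings put every cell in one cluster.
        return 1.0
    return (together_both - expected) / (maximum - expected)
```

The index is 1 when the two labelings define the same partition, regardless of how clusters are named, and is close to 0 (or negative) when the agreement is no better than chance.
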
Supplementary information

  1. Supplementary Information

    Supplementary Figs. 1–13 and Supplementary Note

  2. Reporting Summary

  3. Supplementary Software

    ‘dismay’ R package providing a unified interface to calculate all 17 measures of association.

  4. Supplementary Data 1

    List of scRNA-seq datasets analyzed in this study.

  5. Supplementary Data 2

    Functional coherence of scRNA-seq gene coexpression networks.

  6. Supplementary Data 3

    Macromolecular interaction network overlap of scRNA-seq gene coexpression networks.
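
The network overlap analysis (Supplementary Fig. 6 and Supplementary Data 3) starts from a sparse unweighted network built from the top-ranked edges. A minimal sketch of that step is given below, assuming a symmetric gene-by-gene association matrix in which larger values mean stronger association; the function names are hypothetical, and the raw count of shared edges is used only as a stand-in, since how overlap is scored is not described in this section.

```python
from itertools import combinations


def top_edges(weights, k):
    """Return the k highest-weighted gene pairs as an unweighted edge set.

    weights -- symmetric matrix (list of lists); gene pairs are identified
               by their row/column indices.
    """
    n = len(weights)
    ranked = sorted(
        combinations(range(n), 2),
        key=lambda edge: weights[edge[0]][edge[1]],
        reverse=True,
    )
    return {frozenset(edge) for edge in ranked[:k]}


def shared_edges(network, reference):
    """Count edges present in both the coexpression network and a reference network."""
    return len(network & reference)
```

In the setting described by the caption, k would be 20,000 or 100,000 and the reference would be another biological network expressed over the same gene identifiers.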

About this article

DOI

https://doi.org/10.1038/s41592-019-0372-4