Abstract
Single-cell transcriptomics provides an opportunity to characterize cell-type-specific transcriptional networks, intercellular signaling pathways and cellular diversity with unprecedented resolution by profiling thousands of cells in a single experiment. However, owing to the unique statistical properties of scRNA-seq data, the optimal measures of association for identifying gene–gene and cell–cell relationships from single-cell transcriptomics remain unclear. Here, we conducted a large-scale evaluation of 17 measures of association for their ability to reconstruct cellular networks, cluster cells of the same type and link cell-type-specific transcriptional programs to disease. Measures of proportionality were consistently among the best-performing methods across datasets and tasks. Our analysis provides data-driven guidance for gene and cell network analysis in single-cell transcriptomics.
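To make "proportionality" concrete, the following is a minimal sketch of one such measure, ρp, in the form used in the compositional data literature cited in the references (Lovell et al.; Quinn et al.). It is an illustration in plain Python, not the code of the ‘dismay’ package. The function names, the input layout (one count vector per cell) and the pseudocount are assumptions made for this example.

```python
import math
from statistics import variance

def clr(counts, pseudocount=1.0):
    """Centered log-ratio transform of one cell's counts across genes."""
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def rho_p(cells, gene_i, gene_j, pseudocount=1.0):
    """Proportionality between two genes.

    cells: list of per-cell count vectors (hypothetical layout, one value per gene).
    Returns 1 - var(a - b) / (var(a) + var(b)) on clr-transformed values.
    Undefined (division by zero) if both genes are constant across cells.
    """
    transformed = [clr(cell, pseudocount) for cell in cells]
    a = [t[gene_i] for t in transformed]
    b = [t[gene_j] for t in transformed]
    diff = [x - y for x, y in zip(a, b)]
    return 1.0 - variance(diff) / (variance(a) + variance(b))
```

When one gene is an exact multiple of the other in every cell, the log-ratio is constant, its variance is zero, and ρp equals 1 (up to floating-point error when no pseudocount is added).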
Data availability
The data that support the findings of this study are available from the following GitHub repository: https://github.com/skinnider/SCT-MoA. Raw data are available from the Gene Expression Omnibus, http://mousebrain.org, or https://support.10xgenomics.com, as detailed in the Methods; dataset identifiers are provided in Supplementary Data 1.
Code availability
The ‘dismay’ R package is available as Supplementary Software 1 and from the following GitHub repository: https://github.com/skinnider/dismay. R code used to reproduce the analysis and figures is available from the following GitHub repository: https://github.com/skinnider/SCT-MoA.
References
Svensson, V., Vento-Tormo, R. & Teichmann, S. A. Exponential scaling of single-cell RNA-seq in the past decade. Nat. Protoc. 13, 599–604 (2018).
Zappia, L., Phipson, B. & Oshlack, A. Exploring the single-cell RNA-seq analysis landscape with the scRNA-tools database. PLoS Comput. Biol. 14, e1006245 (2018).
Mahata, B. et al. Single-cell RNA sequencing reveals T helper cells synthesizing steroids de novo to contribute to immune homeostasis. Cell Rep. 7, 1130–1142 (2014).
Shalek, A. K. et al. Single-cell transcriptomics reveals bimodality in expression and splicing in immune cells. Nature 498, 236–240 (2013).
Saunders, A. et al. Molecular diversity and specializations among the cells of the adult mouse brain. Cell 174, 1015–1030 (2018).
Plasschaert, L. W. et al. A single-cell atlas of the airway epithelium reveals the CFTR-rich pulmonary ionocyte. Nature 560, 377–381 (2018).
Aibar, S. et al. SCENIC: single-cell regulatory network inference and clustering. Nat. Methods 14, 1083–1086 (2017).
van der Wijst, M. G. P. et al. Single-cell RNA sequencing identifies cell-type-specific cis-eQTLs and co-expression QTLs. Nat. Genet. 50, 493–497 (2018).
Regev, A. et al. The human cell atlas. eLife 6, e27041 (2017).
Kharchenko, P. V., Silberstein, L. & Scadden, D. T. Bayesian approach to single-cell differential expression analysis. Nat. Methods 11, 740–742 (2014).
Crow, M., Paul, A., Ballouz, S., Huang, Z. J. & Gillis, J. Characterizing the replicability of cell types defined by single cell RNA-sequencing data using MetaNeighbor. Nat. Commun. 9, 884 (2018).
Haghverdi, L., Lun, A. T. L., Morgan, M. D. & Marioni, J. C. Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors. Nat. Biotechnol. 36, 421–427 (2018).
La Manno, G. et al. Molecular diversity of midbrain development in mouse, human, and stem cells. Cell 167, 566–580 (2016).
Han, X. et al. Mapping the mouse cell atlas by Microwell-seq. Cell 172, 1091–1107 (2018).
Plass, M. et al. Cell type atlas and lineage tree of a whole complex animal by single-cell transcriptomics. Science 360, eaaq1723 (2018).
Gerber, T. et al. Single-cell analysis uncovers convergence of cell identities during axolotl limb regeneration. Science 362, eaaq0681 (2018).
Zar, J. H. Biostatistical Analysis 5th edn (Prentice-Hall/Pearson, 2010).
Mohammadi, S., Davila-Velderrain, J., Kellis, M. & Grama, A. DECODE-ing sparsity patterns in single-cell RNA-seq. Preprint at https://www.biorxiv.org/content/10.1101/241646v2 (2018).
Lovell, D., Pawlowsky-Glahn, V., Egozcue, J. J., Marguerat, S. & Bähler, J. Proportionality: a valid alternative to correlation for relative data. PLoS Comput. Biol. 11, e1004075 (2015).
Quinn, T. P., Richardson, M. F., Lovell, D. & Crowley, T. M. propr: an R-package for identifying proportionally abundant features using compositional data analysis. Sci. Rep. 7, 16252 (2017).
Song, L., Langfelder, P. & Horvath, S. Comparison of co-expression measures: mutual information, correlation, and model based indices. BMC Bioinformatics 13, 328 (2012).
Pimentel, R. S., Niewiadomska-Bugaj, M. & Wang, J.-C. Association of zero-inflated continuous variables. Stat. Probabil. Lett. 96, 61–67 (2015).
Ballouz, S., Weber, M., Pavlidis, P. & Gillis, J. EGAD: ultra-fast functional analysis of gene networks. Bioinformatics 33, 612–614 (2017).
Heimberg, G., Bhatnagar, R., El-Samad, H. & Thomson, M. Low dimensionality in gene expression data enables the accurate extraction of transcriptional programs from shallow sequencing. Cell Syst. 2, 239–250 (2016).
Ramani, A. K. et al. A map of human protein interactions derived from co-expression of human mRNAs and their orthologs. Mol. Syst. Biol. 4, 180 (2008).
Maslov, S. & Sneppen, K. Specificity and stability in topology of protein networks. Science 296, 910–913 (2002).
Li, H. et al. Reference component analysis of single-cell transcriptomes elucidates cellular heterogeneity in human colorectal tumors. Nat. Genet. 49, 708–718 (2017).
Butler, A., Hoffman, P., Smibert, P., Papalexi, E. & Satija, R. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat. Biotechnol. 36, 411–420 (2018).
Zhang, B. et al. Integrated systems approach identifies genetic nodes and networks in late-onset Alzheimer’s disease. Cell 153, 707–720 (2013).
Parikshak, N. N. et al. Genome-wide changes in lncRNA, splicing, and regional gene expression patterns in autism. Nature 540, 423–427 (2016).
Gulsuner, S. et al. Spatial and temporal mapping of de novo mutations in schizophrenia to a fetal prefrontal cortical network. Cell 154, 518–529 (2013).
Menche, J. et al. Uncovering disease-disease relationships through the incomplete interactome. Science 347, 1257601 (2015).
Huang, J. K. et al. Systematic evaluation of molecular networks for discovery of disease genes. Cell Syst. 6, 484–495 (2018).
Choobdar, S. et al. Open community challenge reveals molecular network modules with key roles in diseases. Preprint at https://www.biorxiv.org/content/10.1101/265553v1 (2018).
Zeisel, A. et al. Molecular architecture of the mouse nervous system. Cell 174, 999–1014 (2018).
Vanlandewijck, M. et al. A molecular atlas of cell types and zonation in the brain vasculature. Nature 554, 475–480 (2018).
Zhao, Z., Nelson, A. R., Betsholtz, C. & Zlokovic, B. V. Establishment and dysfunction of the blood-brain barrier. Cell 163, 1064–1078 (2015).
Lindahl, P., Johansson, B. R., Levéen, P. & Betsholtz, C. Pericyte loss and microaneurysm formation in PDGF-B-deficient mice. Science 277, 242–245 (1997).
Chen, S. & Mar, J. C. Evaluating methods of inferring gene regulatory networks highlights their lack of performance for single cell gene expression data. BMC Bioinformatics 19, 232 (2018).
Ballouz, S., Verleyen, W. & Gillis, J. Guidance for RNA-seq co-expression network construction and analysis: safety in numbers. Bioinformatics 31, 2123–2130 (2015).
Yao, V. et al. An integrative tissue-network approach to identify and test human disease genes. Nat. Biotechnol. 36, 1091–1099 (2018).
Cao, J. et al. Joint profiling of chromatin accessibility and gene expression in thousands of single cells. Science 361, 1380–1385 (2018).
Budnik, B., Levy, E., Harmange, G. & Slavov, N. SCoPE-MS: mass spectrometry of single mammalian cells quantifies proteome heterogeneity during cell differentiation. Genome Biol. 19, 161 (2018).
Camp, J. G. et al. Multilineage communication regulates human liver bud development from pluripotency. Nature 546, 533–538 (2017).
Vento-Tormo, R. et al. Single-cell reconstruction of the early maternal-fetal interface in humans. Nature 563, 347–353 (2018).
Cohen, M. et al. Lung single-cell signaling interaction map reveals basophil role in macrophage imprinting. Cell 175, 1031–1044 (2018).
Qiu, X. et al. Towards inferring causal gene regulatory networks from single cell expression measurements. Preprint at https://www.biorxiv.org/content/10.1101/426981v1 (2018).
Langfelder, P. & Horvath, S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9, 559 (2008).
Hahsler, M., Chelluboina, S., Hornik, K. & Buchta, C. The arules R-Package ecosystem: analyzing interesting patterns from large transaction datasets. J. Mach. Learn. Res. 12, 2021–2025 (2011).
Dimmer, E. C. et al. The UniProt-GO annotation database in 2011. Nucleic Acids Res. 40, D565–D570 (2012).
Alanis-Lobato, G., Andrade-Navarro, M. A. & Schaefer, M. H. HIPPIE v2.0: enhancing meaningfulness and reliability of protein-protein interaction networks. Nucleic Acids Res. 45, D408–D414 (2017).
Türei, D., Korcsmáros, T. & Saez-Rodriguez, J. OmniPath: guidelines and gateway for literature-curated signaling pathway resources. Nat. Methods 13, 966–967 (2016).
Fabregat, A. et al. The reactome pathway knowledgebase. Nucleic Acids Res. 46, D649–D655 (2018).
Muraro, M. J. et al. A single-cell transcriptome atlas of the human pancreas. Cell Syst. 3, 385–394 (2016).
Segerstolpe, Å. et al. Single-cell transcriptome profiling of human pancreatic islets in health and type 2 diabetes. Cell Metab. 24, 593–607 (2016).
Enge, M. et al. Single-cell analysis of human pancreas reveals transcriptional signatures of aging and somatic mutation patterns. Cell 171, 321–330 (2017).
Xin, Y. et al. RNA sequencing of single human islet cells reveals type 2 diabetes genes. Cell Metab. 24, 608–615 (2016).
Baron, M. et al. A single-cell transcriptomic map of the human and mouse pancreas reveals inter- and intra-cell population structure. Cell Syst. 3, 346–360 (2016).
Dixon, P. VEGAN, a package of R functions for community ecology. J. Veg. Sci. 14, 927–930 (2003).
Mantel, N. The detection of disease clustering and a generalized regression approach. Cancer Res. 27, 209–220 (1967).
Wiwie, C., Baumbach, J. & Röttger, R. Comparing the performance of biomedical clustering methods. Nat. Methods 12, 1033–1038 (2015).
Lun, A. T. L., McCarthy, D. J. & Marioni, J. C. A step-by-step workflow for low-level analysis of single-cell RNA-seq data with Bioconductor. F1000Res. 5, 2122 (2016).
Csardi, G. & Nepusz, T. The igraph software package for complex network research. InterJournal Complex Systems 1695, 1–9 (2006).
Yu, G., Lam, T. T.-Y., Zhu, H. & Guan, Y. Two methods for mapping and visualizing associated data on phylogeny using ggtree. Mol. Biol. Evol. 35, 3041–3043 (2018).
Yu, W., Clyne, M., Khoury, M. J. & Gwinn, M. Phenopedia and Genopedia: disease-centered and gene-centered views of the evolving knowledge of human genetic associations. Bioinformatics 26, 145–146 (2010).
Acknowledgements
We thank E. Su and M. Hirst for computational assistance and P. Pavlidis for comments on an earlier version of the manuscript. This work was supported by Genome Canada and Genome British Columbia (to L.J.F.; project 214PRO) and enabled in part by support provided by WestGrid and Compute Canada. M.A.S. is supported by a CIHR Vanier Canada Graduate Scholarship, an Izaak Walton Killam Memorial Pre-Doctoral Fellowship, a UBC Four Year Fellowship and a Vancouver Coastal Health–CIHR–UBC MD/PhD Studentship. J.W.S. is supported by an Izaak Walton Killam Memorial Pre-Doctoral Fellowship, a UBC Four Year Fellowship and a Vancouver Coastal Health–CIHR–UBC MD/PhD Studentship.
Author information
Contributions
M.A.S., J.W.S. and L.J.F. designed experiments. M.A.S. and J.W.S. performed experiments. M.A.S. wrote the first draft of the manuscript, which was edited by J.W.S. and L.J.F.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Integrated supplementary information
Supplementary Fig. 1 Functional coherence of scRNA-seq gene coexpression networks, considering all datasets from each publication (n = 213).
Functional coherence of single-cell gene coexpression networks. Known gene functions were randomly withheld and predicted from the coexpression network in threefold cross-validation, and the area under the receiver operating characteristic curve (AUC) was calculated to quantify the degree to which genes with similar functions are coexpressed in networks constructed with each measure of association. Box plots show median (horizontal line), interquartile range (hinges) and the smallest and largest values no more than 1.5 times the interquartile range (whiskers).
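The cross-validation procedure described in this caption can be sketched as follows. The caption does not name the prediction algorithm, so this example assumes a simple neighbor-voting rule (each gene is scored by the fraction of its total edge weight that connects to genes still annotated with the function). All function names are hypothetical, and edge weights are assumed to have been made non-negative beforehand.

```python
import random

def auc(pos_scores, neg_scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))

def neighbor_voting_auc(weights, positives, n_folds=3, seed=0):
    """Mean AUC over folds for one gene function.

    weights: symmetric matrix (list of lists) of non-negative edge weights.
    positives: set of gene indices annotated with the function; assumed to
    contain at least n_folds genes, with at least one unannotated gene left over.
    """
    genes = range(len(weights))
    shuffled = sorted(positives)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[k::n_folds] for k in range(n_folds)]
    negatives = [g for g in genes if g not in positives]
    aucs = []
    for held_out in folds:
        # Annotations of the held-out genes are withheld from the voters.
        training = set(positives) - set(held_out)

        def score(g):
            degree = sum(weights[g][h] for h in genes if h != g)
            votes = sum(weights[g][h] for h in training if h != g)
            return votes / degree if degree > 0 else 0.0

        aucs.append(auc([score(g) for g in held_out],
                        [score(g) for g in negatives]))
    return sum(aucs) / len(aucs)
```

An AUC near 0.5 means that the network carries no information about the function, whereas values approaching 1 mean that withheld genes are strongly coexpressed with the remaining annotated genes.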
Supplementary Fig. 2 Relationships between functional coherence and dropouts or number of cells in scRNA-seq gene coexpression networks.
a,b, Relationships between functional coherence and proportion of dropouts (a) or number of cells (b) in scRNA-seq gene coexpression networks.
Supplementary Fig. 3 Functional coherence of single-cell gene coexpression networks according to scRNA-seq protocol (including all datasets associated with each publication, n = 213).
Box plots show median (horizontal line), interquartile range (hinges) and the smallest and largest values no more than 1.5 times the interquartile range (whiskers).
Supplementary Fig. 4 Functional coherence of single-cell gene coexpression networks according to transcript coverage afforded by scRNA-seq protocol.
Box plots show median (horizontal line), interquartile range (hinges) and the smallest and largest values no more than 1.5 times the interquartile range (whiskers), for the subset of n = 40 of the 43 datasets randomly sampled from each publication using scRNA-seq protocols discussed by Chen et al. (Annu. Rev. Biomed. Data Sci. 1, 29–51; 2018).
Supplementary Fig. 5 Adjusted r² of experimental and analytical variables in univariate linear models of overall functional coherence (median AUC).
Univariate linear regression was performed using the subset of n = 35 of the 43 datasets randomly sampled from each publication (i) using scRNA-seq protocols discussed by Chen et al. (Annu. Rev. Biomed. Data Sci. 1, 29–51; 2018) and (ii) using protocols that were used in at least two different publications. Exact F test P values were 5.2 × 10⁻³¹ (measure of association); 1.3 × 10⁻¹³ (sequencing protocol); 3.6 × 10⁻¹² (number of cells); 7.0 × 10⁻⁶ (cell isolation/capture); 1.3 × 10⁻³ (transcript coverage); and 0.13 (percentage of zeroes). Blue, significant at uncorrected P < 0.05; gray, not significant.
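As an illustration of the statistic reported in this figure, the sketch below computes the adjusted r² of a univariate linear model with a single categorical predictor (for example, the measure of association). It is a generic textbook calculation rather than the authors' code, and the function name and input layout are assumptions.

```python
def adjusted_r2_categorical(values, groups):
    """Adjusted r-squared of a linear model with one categorical predictor.

    values: response for each observation (e.g. median AUC).
    groups: level of the predictor for each observation.
    Assumes the response is not constant and n exceeds the number of levels.
    """
    n = len(values)
    grand_mean = sum(values) / n
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    by_group = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)
    # Fitted values of this model are the group means.
    ss_resid = sum(
        sum((v - sum(vs) / len(vs)) ** 2 for v in vs) for vs in by_group.values()
    )
    r2 = 1.0 - ss_resid / ss_total
    p = len(by_group) - 1  # coefficients fitted besides the intercept
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
```

The adjustment penalizes predictors with many levels, so an uninformative predictor can yield a negative value.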
Supplementary Fig. 6 Overlap between sparse unweighted single-cell gene coexpression networks constructed with the top-ranked 20,000 or 100,000 edges and other biological networks (n = 43 datasets, one per publication).
a, Results with 20,000 edges. b, Results with 100,000 edges. Box plots show median (horizontal line), interquartile range (hinges) and the smallest and largest values no more than 1.5 times the interquartile range (whiskers).
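The construction of a sparse unweighted network from the top-ranked edges, and its overlap with a reference network, can be sketched as below. This is a simplified illustration: the names are hypothetical, edges are ranked by raw association value (whether absolute values were used is not stated here), and any normalization of the overlap against randomized networks is not reproduced.

```python
def top_edges(scores, n_edges):
    """Keep the n_edges highest-scoring gene pairs as an unweighted edge set.

    scores: dict mapping (gene_a, gene_b) tuples to association values.
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    # frozenset makes each edge undirected: (A, B) equals (B, A).
    return {frozenset(pair) for pair, _ in ranked[:n_edges]}

def network_overlap(scores, reference_edges, n_edges):
    """Count top-ranked coexpression edges that also occur in a reference network."""
    reference = {frozenset(e) for e in reference_edges}
    return len(top_edges(scores, n_edges) & reference)
```

For instance, `network_overlap(scores, reference_edges, 20000)` would correspond to the setting of panel a, given a dictionary of pairwise scores for one dataset.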
Supplementary Fig. 7
Dendrograms obtained from hierarchical clustering of 561 single-cell transcriptomes derived from seven cell lines (ref. 27, Li et al.), with two lines sequenced in two batches, colored by cell line of origin and batch.
Supplementary Fig. 8
Adjusted Rand index of hierarchical clustering of single-cell transcriptomes from cell lines profiled in two different batches (ref. 27, Li et al.) with each measure of association.
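For readers unfamiliar with the adjusted Rand index, a minimal standard-library sketch of the usual pair-counting definition is given below. It compares a clustering with known labels (here, cell line of origin) and is independent of how the clustering was obtained. The function name is an assumption.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Chance-corrected agreement between two partitions of the same cells."""
    n = len(labels_true)
    pairs_joint = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    pairs_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    pairs_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = pairs_true * pairs_pred / comb(n, 2)
    maximum = (pairs_true + pairs_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (pairs_joint - expected) / (maximum - expected)
```

The index is 1 for identical partitions (regardless of how clusters are named), close to 0 for random assignments, and can be negative when agreement is worse than expected by chance.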
Supplementary Fig. 9 Normalized mutual information of hierarchical clustering of single-cell transcriptomes.
a,b, Normalized mutual information of hierarchical clustering of single-cell transcriptomes from seven different cell lines (ref. 27, Li et al.) (a) or a subset of two cell lines profiled in two different batches (b) with each measure of association.
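Normalized mutual information can be sketched in the same way. Several normalizations are in common use and the caption does not say which was applied; this example assumes division by the arithmetic mean of the two entropies. Function names are hypothetical.

```python
from collections import Counter
from math import log

def entropy(labels):
    """Shannon entropy (natural log) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())

def normalized_mutual_information(labels_true, labels_pred):
    """Mutual information of two partitions divided by their mean entropy."""
    n = len(labels_true)
    count_true = Counter(labels_true)
    count_pred = Counter(labels_pred)
    mi = 0.0
    for (t, p), c in Counter(zip(labels_true, labels_pred)).items():
        mi += (c / n) * log(c * n / (count_true[t] * count_pred[p]))
    denom = (entropy(labels_true) + entropy(labels_pred)) / 2
    return mi / denom if denom > 0 else 1.0
```

Unlike the adjusted Rand index, this score is not corrected for chance, so it tends to be higher when the number of clusters is large relative to the number of cells.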
Supplementary Fig. 10 Adjusted Rand index and normalized mutual information of Louvain clustering applied to the shared nearest-neighbor graph of single-cell transcriptomes from seven different cell lines.
a, Adjusted Rand index. b, Normalized mutual information. Box plots show median (horizontal line), interquartile range (hinges) and the smallest and largest values no more than 1.5 times the interquartile range (whiskers) for n = 5 values of k in nearest-neighbor graph construction (2, 5, 10, 20 and 50).
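The role of k can be illustrated with a small sketch of shared nearest-neighbor graph construction from a cell-by-cell similarity matrix, such as one produced by a measure of association. This is a simplified version under stated assumptions: each cell counts as its own neighbor, edges are weighted by the Jaccard overlap of neighbor sets, and the Louvain clustering step itself is not implemented. All names are hypothetical.

```python
def knn_sets(similarity, k):
    """Neighbor set of each cell: its k most similar cells plus the cell itself.

    similarity: square matrix (list of lists); larger values mean more similar.
    """
    neighbors = []
    for i, row in enumerate(similarity):
        others = sorted((j for j in range(len(row)) if j != i),
                        key=lambda j: row[j], reverse=True)
        neighbors.append(set(others[:k]) | {i})
    return neighbors

def snn_graph(similarity, k):
    """Edges between cells whose neighbor sets overlap, weighted by Jaccard index."""
    nn = knn_sets(similarity, k)
    edges = {}
    for i in range(len(nn)):
        for j in range(i + 1, len(nn)):
            shared = len(nn[i] & nn[j])
            if shared:
                edges[(i, j)] = shared / len(nn[i] | nn[j])
    return edges
```

Smaller values of k give sparser graphs with more, smaller communities, which is why clustering quality is summarized across several values of k.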
Supplementary Fig. 11 Computational time required to construct gene coexpression matrices for a random sample of 1,000 genes from the subset of scRNA-seq datasets obtained from the Gene Expression Omnibus (n = 162 datasets).
Box plots show median (horizontal line), interquartile range (hinges) and the smallest and largest values no more than 1.5 times the interquartile range (whiskers). The median of 10 random samples from each dataset was retained for plotting.
Supplementary Fig. 12 Functional coherence of gene coexpression networks constructed with each measure of association at different thresholds for gene filtering, ranging from non-zero expression in less than 50% of cells (more stringent, less sparse) to non-zero expression in less than 95% of cells (less stringent, more sparse).
Box plots show median (horizontal line), interquartile range (hinges) and the smallest and largest values no more than 1.5 times the interquartile range (whiskers) for n = 43 datasets, one per publication.
Supplementary Fig. 13 Trends in functional coherence of gene coexpression networks for each of 17 measures of association at different thresholds for gene filtering, ranging from non-zero expression in less than 50% of cells (more stringent, less sparse) to non-zero expression in less than 95% of cells (less stringent, more sparse).
Box plots show median (horizontal line), interquartile range (hinges) and the smallest and largest values no more than 1.5 times the interquartile range (whiskers) for n = 43 datasets, one per publication.
Supplementary information
Supplementary Information
Supplementary Figs. 1–13 and Supplementary Note
Supplementary Software
‘dismay’ R package providing a unified interface to calculate all 17 measures of association.
Supplementary Data 1
List of scRNA-seq datasets analyzed in this study.
Supplementary Data 2
Functional coherence of scRNA-seq gene coexpression networks.
Supplementary Data 3
Macromolecular interaction network overlap of scRNA-seq gene coexpression networks.
About this article
Cite this article
Skinnider, M.A., Squair, J.W. & Foster, L.J. Evaluating measures of association for single-cell transcriptomics. Nat Methods 16, 381–386 (2019). https://doi.org/10.1038/s41592-019-0372-4
DOI: https://doi.org/10.1038/s41592-019-0372-4
This article is cited by
- Highly sensitive spatial transcriptomics using FISHnCHIPs of multiple co-expressed genes. Nature Communications (2024)
- Kernelized multiview signed graph learning for single-cell RNA sequencing data. BMC Bioinformatics (2023)
- ENGEP: advancing spatial transcriptomics with accurate unmeasured gene expression prediction. Genome Biology (2023)
- Identification of genetic variants that impact gene co-expression relationships using large-scale single-cell data. Genome Biology (2023)
- KMD clustering: robust general-purpose clustering of biological data. Communications Biology (2023)