Machine learning methods to model multicellular complexity and tissue specificity

Abstract

Experimental approaches to study tissue specificity enable insight into the nature and organization of the cell types and tissues that constitute complex multicellular organisms. Machine learning provides a powerful tool to investigate and interpret tissue-specific experimental data. In this Review, we first provide a brief introduction to key single-cell and whole-tissue approaches that allow investigation of tissue specificity and then highlight two classes of machine-learning-based methods, which can be applied to analyse, model and interpret these experimental data. Deep learning methods can predict tissue-dependent effects of individual mutations on gene expression, alternative splicing and disease phenotypes. Network-based approaches can capture relationships between biomolecules, integrate large heterogeneous data compendia to model molecular circuits and identify tissue-specific functional relationships and regulatory connections. We conclude with an outlook to future possibilities in examining multicellular complexity by combining high-resolution, large-scale multiomics data sets and interpretable machine learning models.
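As a rough illustration of what predicting the tissue-dependent effect of an individual mutation can mean in practice, the sketch below scores a single-nucleotide variant as the difference between a model's per-tissue predictions for the alternative and reference sequences. It is only a toy under stated assumptions: the linear `predict` function with random weights stands in for a trained deep learning model, and the names `one_hot`, `predict` and `variant_effect` are hypothetical and not taken from any method covered in this Review.

```python
import numpy as np

BASES = "ACGT"


def one_hot(seq: str) -> np.ndarray:
    """Encode a DNA string as a (length, 4) one-hot matrix."""
    idx = np.array([BASES.index(base) for base in seq])
    return np.eye(4)[idx]


def predict(seq: str, weights: np.ndarray) -> np.ndarray:
    """Toy stand-in for a trained sequence model (assumption).

    `weights` has shape (n_tissues, length, 4); the result holds one
    predicted expression-like score per tissue.
    """
    return np.tensordot(weights, one_hot(seq), axes=([1, 2], [0, 1]))


def variant_effect(ref_seq: str, pos: int, alt_base: str,
                   weights: np.ndarray) -> np.ndarray:
    """Per-tissue effect of a substitution: prediction(alt) - prediction(ref)."""
    alt_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
    return predict(alt_seq, weights) - predict(ref_seq, weights)


# Hypothetical inputs: a 10-bp reference sequence and 3 "tissues".
rng = np.random.default_rng(0)
ref = "ACGTACGTAC"
weights = rng.normal(size=(3, len(ref), 4))

# Effect of an A>G substitution at position 4, one value per tissue.
effects = variant_effect(ref, 4, "G", weights)
print(effects)
```

Because each tissue has its own readout, the same substitution can receive a different score in each tissue, which is the sense in which the predicted effect is tissue dependent. A real model would replace the random linear readout with a network trained on tissue-specific experimental data.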

Fig. 1: Decoding multicellular complexity.
Fig. 2: Machine learning approaches.
Fig. 3: Decoding the tissue-specific impact of sequence variants.
Fig. 4: The GIANT framework for predicting tissue-specific functional networks.
Fig. 5: Applications of tissue-specific networks.

References

  1. 1.

    Mazzarello, P. A unifying concept: the history of cell theory. Nat. Cell Biol. 1, E13–E15 (1999).

    CAS  Google Scholar 

  2. 2.

    Willensdorfer, M. On the evolution of differentiated multicellularity. Evolution 63, 306–323 (2009).

    Google Scholar 

  3. 3.

    Ispolatov, I., Ackermann, M. & Doebeli, M. Division of labour and the evolution of multicellularity. Proc. R. Soc. B 279, 1768–1776 (2012).

    Google Scholar 

  4. 4.

    Long, F., Peng, H., Liu, X., Kim, S. K. & Myers, E. A 3D digital atlas of C. elegans and its application to single-cell analyses. Nat. Methods 6, 667–672 (2009).

    CAS  Google Scholar 

  5. 5.

    Sulston, J. E. & Horvitz, H. R. Post-embryonic cell lineages of the nematode, Caenorhabditis elegans. Dev. Biol. 56, 110–156 (1977).

    CAS  Google Scholar 

  6. 6.

    Sulston, J. E., Schierenberg, E., White, J. G. & Thomson, J. N. The embryonic cell lineage of the nematode Caenorhabditis elegans. Dev. Biol. 100, 64–119 (1983).

    CAS  Google Scholar 

  7. 7.

    Woodhouse, R. M. & Ashe, A. How do histone modifications contribute to transgenerational epigenetic inheritance in C. elegans? Biochem. Soc. Trans. 48, 1019–1034 (2020).

    CAS  Google Scholar 

  8. 8.

    Fernandez, R. W. et al. Cellular expression and functional roles of all 26 neurotransmitter GPCRs in the C. elegans egg-laying circuit. J. Neurosci. 40, 7475–7488 (2020).

    CAS  Google Scholar 

  9. 9.

    Hekselman, I. & Yeger-Lotem, E. Mechanisms of tissue and cell-type specificity in heritable traits and diseases. Nat. Rev. Genet. 21, 137–150 (2020).

    CAS  Google Scholar 

  10. 10.

    Kim-Hellmuth, S. et al. Cell type–specific genetic regulation of gene expression across human tissues. Science 369, eaaz8528 (2020).

    CAS  Google Scholar 

  11. 11.

    Tang, F. et al. mRNA-Seq whole-transcriptome analysis of a single cell. Nat. Methods 6, 377–382 (2009).

    CAS  Google Scholar 

  12. 12.

    Poulin, J.-F., Tasic, B., Hjerling-Leffler, J., Trimarchi, J. M. & Awatramani, R. Disentangling neural cell diversity using single-cell transcriptomics. Nat. Neurosci. 19, 1131–1141 (2016).

    Google Scholar 

  13. 13.

    Consortium, T. T. M. et al. Single-cell transcriptomics of 20 mouse organs creates a Tabula Muris. Nature 562, 367–372 (2018).

    Google Scholar 

  14. 14.

    Park, J. et al. Single-cell transcriptomics of the mouse kidney reveals potential cellular targets of kidney disease. Science 360, 758–763 (2018).

    CAS  Google Scholar 

  15. 15.

    Plass, M. et al. Cell type atlas and lineage tree of a whole complex animal by single-cell transcriptomics. Science 360, eaaq1723 (2018).

    Google Scholar 

  16. 16.

    Kashima, Y. et al. Single-cell sequencing techniques from individual to multiomics analyses. Exp. Mol. Med. 52, 1419–1427 (2020).

    CAS  Google Scholar 

  17. 17.

    Rodriques, S. G. et al. Slide-seq: A scalable technology for measuring genome-wide expression at high spatial resolution. Science 363, 1463–1467 (2019).

    CAS  Google Scholar 

  18. 18.

    Lubeck, E., Coskun, A. F., Zhiyentayev, T., Ahmad, M. & Cai, L. Single-cell in situ RNA profiling by sequential hybridization. Nat. Methods 11, 360–361 (2014).

    CAS  Google Scholar 

  19. 19.

    Shah, S., Lubeck, E., Zhou, W. & Cai, L. In situ transcription profiling of single cells reveals spatial organization of cells in the mouse hippocampus. Neuron 92, 342–357 (2016).

    CAS  Google Scholar 

  20. 20.

    Frieda, K. L. et al. Synthetic recording and in situ readout of lineage information in single cells. Nature 541, 107–111 (2017).

    CAS  Google Scholar 

  21. 21.

    Eng, C.-H. L., Shah, S., Thomassie, J. & Cai, L. Profiling the transcriptome with RNA SPOTs. Nat. Methods 14, 1153–1155 (2017).

    CAS  Google Scholar 

  22. 22.

    Larsson, L., Frisén, J. & Lundeberg, J. Spatially resolved transcriptomics adds a new dimension to genomics. Nat. Methods 18, 15–18 (2021).

    CAS  Google Scholar 

  23. 23.

    Guo, H. et al. Profiling DNA methylome landscapes of mammalian cells with single-cell reduced-representation bisulfite sequencing. Nat. Protoc. 10, 645–659 (2015).

    CAS  Google Scholar 

  24. 24.

    Cusanovich, D. A. et al. Multiplex single-cell profiling of chromatin accessibility by combinatorial cellular indexing. Science 348, 910–914 (2015).

    CAS  Google Scholar 

  25. 25.

    Angermueller, C. et al. Parallel single-cell sequencing links transcriptional and epigenetic heterogeneity. Nat. Methods 13, 229–232 (2016).

    CAS  Google Scholar 

  26. 26.

    Clark, S. J. et al. Genome-wide base-resolution mapping of DNA methylation in single cells using single-cell bisulfite sequencing (scBS-seq). Nat. Protoc. 12, 534–547 (2017).

    CAS  Google Scholar 

  27. 27.

    Grosselin, K. et al. High-throughput single-cell ChIP-seq identifies heterogeneity of chromatin states in breast cancer. Nat. Genet. 51, 1060–1066 (2019).

    CAS  Google Scholar 

  28. 28.

    Kelsey, G., Stegle, O. & Reik, W. Single-cell epigenomics: Recording the past and predicting the future. Science 358, 69–75 (2017).

    CAS  Google Scholar 

  29. 29.

    Buenrostro, J. D. et al. Single-cell chromatin accessibility reveals principles of regulatory variation. Nature 523, 486–490 (2015).

    CAS  Google Scholar 

  30. 30.

    Hughes, A. J. et al. Single-cell western blotting. Nat. Methods 11, 749–755 (2014).

    CAS  Google Scholar 

  31. 31.

    Budnik, B., Levy, E., Harmange, G. & Slavov, N. SCoPE-MS: mass spectrometry of single mammalian cells quantifies proteome heterogeneity during cell differentiation. Genome Biol. 19, 161 (2018).

    Google Scholar 

  32. 32.

    Lee, J., Hyeon, D. Y. & Hwang, D. Single-cell multiomics: technologies and data analysis methods. Exp. Mol. Med. 52, 1428–1442 (2020).

    CAS  Google Scholar 

  33. 33.

    Ando, Y., Kwon, A. T.-J. & Shin, J. W. An era of single-cell genomics consortia. Exp. Mol. Med. 52, 1409–1418 (2020).

    CAS  Google Scholar 

  34. 34.

    Petegrosso, R., Li, Z. & Kuang, R. Machine learning and statistical methods for clustering single-cell RNA-sequencing data. Brief. Bioinform. 21, 1209–1223 (2019).

    Google Scholar 

  35. 35.

    Efremova, M. & Teichmann, S. A. Computational methods for single-cell omics across modalities. Nat. Methods 17, 14–17 (2020).

    CAS  Google Scholar 

  36. 36.

    Murphy, K. P. Machine Learning: A Probabilistic Perspective (MIT Press, 2012).

  37. 37.

    Eraslan, G., Avsec, Ž., Gagneur, J. & Theis, F. J. Deep learning: new computational modelling techniques for genomics. Nat. Rev. Genet. 20, 389–403 (2019).

    CAS  Google Scholar 

  38. 38.

    Yao, V., Wong, A. & Troyanskaya, O. Enabling precision medicine through integrative network models. J. Mol. Biol. 430, 2913–2923 (2018).

    CAS  Google Scholar 

  39. 39.

    Bumgarner, R. Overview of DNA microarrays: types, applications, and their future. Curr. Protoc. Mol. Biol. 101, 22.1.1–22.1.11 (2013).

    Google Scholar 

  40. 40.

    Schena, M., Shalon, D., Davis, R. W. & Brown, P. O. Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 270, 467–470 (1995).

    CAS  Google Scholar 

  41. 41.

    Wen, X. et al. Large-scale temporal gene expression mapping of central nervous system development. Proc. Natl Acad. Sci. USA 95, 334–339 (1998).

    CAS  Google Scholar 

  42. 42.

    Alon, U. et al. Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc. Natl Acad. Sci. USA 96, 6745–6750 (1999).

    CAS  Google Scholar 

  43. 43.

    Mortazavi, A., Williams, B. A., McCue, K., Schaeffer, L. & Wold, B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat. Methods 5, 621–628 (2008).

    CAS  Google Scholar 

  44. 44.

    Wold, B. & Myers, R. M. Sequence census methods for functional genomics. Nat. Methods 5, 19–21 (2008).

    CAS  Google Scholar 

  45. 45.

    Costa-Silva, J., Domingues, D. & Lopes, F. M. RNA-Seq differential expression analysis: An extended review and a software tool. PLoS ONE 12, e0190152 (2017).

    Google Scholar 

  46. 46.

    Hrdlickova, R., Toloue, M. & Tian, B. RNA-Seq methods for transcriptome analysis. Wiley Interdiscip. Rev. RNA 8, e1364 (2017).

    Google Scholar 

  47. 47.

    Johnson, D. S., Mortazavi, A., Myers, R. M. & Wold, B. Genome-wide mapping of in vivo protein-DNA interactions. Science 316, 1497–1502 (2007).

    CAS  Google Scholar 

  48. 48.

    Brunner, E. et al. A high-quality catalog of the Drosophila melanogaster proteome. Nat. Biotechnol. 25, 576–583 (2007).

    CAS  Google Scholar 

  49. 49.

    Schrimpf, S. P. et al. Comparative functional analysis of the Caenorhabditis elegans and Drosophila melanogaster proteomes. PLoS Biol. 7, e48 (2009).

    Google Scholar 

  50. 50.

    Washburn, M. P., Wolters, D. & Yates, J. R. Large-scale analysis of the yeast proteome by multidimensional protein identification technology. Nat. Biotechnol. 19, 242–247 (2001).

    CAS  Google Scholar 

  51. 51.

    Chintapalli, V. R., Al Bratty, M., Korzekwa, D., Watson, D. G. & Dow, J. A. T. Mapping an atlas of tissue-specific Drosophila melanogaster metabolomes by high resolution mass spectrometry. PLoS ONE 8, e78066 (2013).

    CAS  Google Scholar 

  52. 52.

    Stupp, G. S. et al. Isotopic ratio outlier analysis global metabolomics of Caenorhabditis elegans. Anal. Chem. 85, 11858–11865 (2013).

    CAS  Google Scholar 

  53. 53.

    Davis, S. et al. Expanding proteome coverage with CHarge Ordered Parallel Ion aNalysis (CHOPIN) combined with broad specificity proteolysis. J. Proteome Res. 16, 1288–1299 (2017).

    CAS  Google Scholar 

  54. 54.

    Bekker-Jensen, D. B. et al. An optimized shotgun strategy for the rapid generation of comprehensive human proteomes. Cell Syst. 4, 587–599.e4 (2017).

    CAS  Google Scholar 

  55. 55.

    Huttlin, E. L. et al. A tissue-specific atlas of mouse protein phosphorylation and expression. Cell 143, 1174–1189 (2010).

    CAS  Google Scholar 

  56. 56.

    Nagaraj, N. et al. Deep proteome and transcriptome mapping of a human cancer cell line. Mol. Syst. Biol. 7, 548 (2011).

    Google Scholar 

  57. 57.

    Beck, M. et al. The quantitative proteome of a human cell line. Mol. Syst. Biol. 7, 549 (2011).

    Google Scholar 

  58. 58.

    Darmanis, S. et al. A survey of human brain transcriptome diversity at the single cell level. Proc. Natl Acad. Sci. USA 112, 7285–7290 (2015).

    CAS  Google Scholar 

  59. 59.

    Menon, R. et al. Single cell transcriptomics identifies focal segmental glomerulosclerosis remission endothelial biomarker. JCI Insight 5, e133267 (2020).

    Google Scholar 

  60. 60.

    Lake, B. B. et al. A single-nucleus RNA-sequencing pipeline to decipher the molecular anatomy and pathophysiology of human kidneys. Nat. Commun. 10, 2832 (2019).

    Google Scholar 

  61. 61.

    Schiller, H. B. et al. The human lung cell atlas: a high-resolution reference map of the human lung in health and disease. Am. J. Respir. Cell Mol. Biol. 61, 31–41 (2019).

    CAS  Google Scholar 

  62. 62.

    Ding, J. et al. Systematic comparison of single-cell and single-nucleus RNA-sequencing methods. Nat. Biotechnol. 38, 737–746 (2020).

    CAS  Google Scholar 

  63. 63.

    Bakken, T. E. et al. Single-nucleus and single-cell transcriptomes compared in matched cortical cell types. PLoS ONE 13, e0209648 (2018).

    Google Scholar 

  64. 64.

    Wu, H., Kirita, Y., Donnelly, E. L. & Humphreys, B. D. Advantages of single-nucleus over single-cell RNA sequencing of adult kidney: rare cell types and novel cell states revealed in fibrosis. J. Am. Soc. Nephrol. 30, 23–32 (2019).

    CAS  Google Scholar 

  65. 65.

    Rosenberg, A. B. et al. Single-cell profiling of the developing mouse brain and spinal cord with split-pool barcoding. Science 360, 176–182 (2018).

    CAS  Google Scholar 

  66. 66.

    Abdelaal, T. et al. A comparison of automatic cell identification methods for single-cell RNA sequencing data. Genome Biol. 20, 194 (2019).

    Google Scholar 

  67. 67.

    Kiselev, V. Y., Andrews, T. S. & Hemberg, M. Challenges in unsupervised clustering of single-cell RNA-seq data. Nat. Rev. Genet. 20, 273–282 (2019).

    CAS  Google Scholar 

  68. 68.

    Schwartzman, O. & Tanay, A. Single-cell epigenomics: techniques and emerging applications. Nat. Rev. Genet. 16, 716–726 (2015).

    CAS  Google Scholar 

  69. 69.

    Kelly, R. T. Single-cell proteomics: progress and prospects. Mol. Cell. Proteom. 19, 1739–1748 (2020).

    CAS  Google Scholar 

  70. 70.

    Rotem, A. et al. Single-cell ChIP-seq reveals cell subpopulations defined by chromatin state. Nat. Biotechnol. 33, 1165–1172 (2015).

    CAS  Google Scholar 

  71. 71.

    Kaya-Okur, H. S. et al. CUT&Tag for efficient epigenomic profiling of small samples and single cells. Nat. Commun. 10, 1930 (2019).

    Google Scholar 

  72. 72.

    Buenrostro, J. D., Wu, B., Chang, H. Y. & Greenleaf, W. J. ATAC-seq: a method for assaying chromatin accessibility genome-wide. Curr. Protoc. Mol. Biol. 109, 21.29.1–21.29.9 (2015).

    Google Scholar 

  73. 73.

    Doerr, A. Single-cell proteomics. Nat. Methods 16, 20 (2019).

    CAS  Google Scholar 

  74. 74.

    Cong, Y. et al. Ultrasensitive single-cell proteomics workflow identifies >1000 protein groups per mammalian cell. Chem. Sci. 12, 1001–1006 (2021).

    CAS  Google Scholar 

  75. 75.

    Stuart, T. & Satija, R. Integrative single-cell analysis. Nat. Rev. Genet. 20, 257–272 (2019).

    CAS  Google Scholar 

  76. 76.

    Zhu, C., Preissl, S. & Ren, B. Single-cell multimodal omics: the power of many. Nat. Methods 17, 11–14 (2020).

    CAS  Google Scholar 

  77. 77.

    Ma, A., McDermaid, A., Xu, J., Chang, Y. & Ma, Q. Integrative methods and practical challenges for single-cell multi-omics. Trends Biotechnol. 38, 1007–1022 (2020).

    CAS  Google Scholar 

  78. 78.

    Stoeckius, M. et al. Simultaneous epitope and transcriptome measurement in single cells. Nat. Methods 14, 865–868 (2017).

    CAS  Google Scholar 

  79. 79.

    Chen, S., Lake, B. B. & Zhang, K. High-throughput sequencing of the transcriptome and chromatin accessibility in the same cell. Nat. Biotechnol. 37, 1452–1457 (2019).

    CAS  Google Scholar 

  80. 80.

    Cao, J. et al. Joint profiling of chromatin accessibility and gene expression in thousands of single cells. Science 361, 1380–1385 (2018).

    CAS  Google Scholar 

  81. 81.

    Zhu, C. et al. An ultra high-throughput method for single-cell joint analysis of open chromatin and transcriptome. Nat. Struct. Mol. Biol. 26, 1063–1070 (2019).

    CAS  Google Scholar 

  82. 82.

    Lonsdale, J. et al. The genotype-tissue expression (GTEx) project. Nat. Genet. 45, 580–585 (2013).

    CAS  Google Scholar 

  83. 83.

    The GTEx Consortium The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318–1330 (2020).

    Google Scholar 

  84. 84.

    The ENCODE Project Consortium et al.Expanded encyclopaedias of DNA elements in the human and mouse genomes. Nature 583, 699–710 (2020).

    CAS  Google Scholar 

  85. 85.

    Kawaji, H., Kasukawa, T., Forrest, A., Carninci, P. & Hayashizaki, Y. The FANTOM5 collection, a data series underpinning mammalian transcriptome atlases in diverse cell types. Sci. Data 4, 170113 (2017).

    CAS  Google Scholar 

  86. 86.

    Lizio, M. et al. Gateways to the FANTOM5 promoter level mammalian expression atlas. Genome Biol. 16, 22 (2015).

    CAS  Google Scholar 

  87. 87.

    Lizio, M. et al. Update of the FANTOM web resource: expansion to provide additional transcriptome atlases. Nucleic Acids Res. 47, D752–D758 (2019).

    CAS  Google Scholar 

  88. 88.

    Regev, A. et al. The human cell atlas. eLife 6, e27041 (2017).

    Google Scholar 

  89. 89.

    Celniker, S. E. et al. Unlocking the secrets of the genome. Nature 459, 927–930 (2009).

    CAS  Google Scholar 

  90. 90.

    Tarca, A. L., Carey, V. J., Chen, X.-W., Romero, R. & Drăghici, S. Machine learning and its applications to biology. PLoS Comput. Biol. 3, e116 (2007).

    Google Scholar 

  91. 91.

    Chicco, D. Ten quick tips for machine learning in computational biology. BioData Min. 10, 35 (2017).

    Google Scholar 

  92. 92.

    Libbrecht, M. W. & Noble, W. S. Machine learning applications in genetics and genomics. Nat. Rev. Genet. 16, 321–332 (2015).

    CAS  Google Scholar 

  93. 93.

    Ching, T. et al. Opportunities and obstacles for deep learning in biology and medicine. J. R. Soc. Interface 15, 20170387 (2018).

    Google Scholar 

  94. 94.

    Angermueller, C., Pärnamaa, T., Parts, L. & Stegle, O. Deep learning for computational biology. Mol. Syst. Biol. 12, 878 (2016).

    Google Scholar 

  95. 95.

    Koumakis, L. Deep learning models in genomics; are we there yet? Comput. Struct. Biotechnol. J. 18, 1466–1473 (2020).

    CAS  Google Scholar 

  96. 96.

    Zhang, Z., Park, C. Y., Theesfeld, C. L. & Troyanskaya, O. G. An automated framework for efficiently designing deep convolutional neural networks in genomics. Nat. Mach. Intell. 3, 392–400 (2021).

    Google Scholar 

  97. 97.

    Huttenhower, C. & Troyanskaya, O. G. Bayesian data integration: a functional perspective. Comput. Syst. Bioinformatics Conf. 5, 341–351 (2006).

    Google Scholar 

  98. 98.

    Li, Y., Wu, F.-X. & Ngom, A. A review on machine learning principles for multi-view biological data integration. Brief. Bioinform. 19, 325–340 (2016).

    Google Scholar 

  99. 99.

    Rosenblatt, F. The perceptron: A probabilistic model for information storage and organization in the brain. Psychol. Rev. 65, 386–408 (1958).

    CAS  Google Scholar 

  100. 100.

    Shortliffe, E. H., Buchanan, B. G. & Feigenbaum, E. A. Knowledge engineering for medical decision making: a review of computer-based clinical decision aids. Proc. IEEE 67, 1207–1224 (1979).

    Google Scholar 

  101. 101.

    Shortliffe, E. H. Computer-Based Medical Consultations: MYCIN (Elsevier, 1976).

  102. 102.

    Krogh, A., Saira Mian, I. & Haussler, D. A hidden Markov model that finds genes in E.coli DNA. Nucleic Acids Res. 22, 4768–4778 (1994).

    CAS  Google Scholar 

  103. 103.

    Down, T. A. Computational detection and location of transcription start sites in mammalian genomic DNA. Genome Res. 12, 458–461 (2002).

    CAS  Google Scholar 

  104. 104.

    Ernst, J. & Kellis, M. ChromHMM: automating chromatin-state discovery and characterization. Nat. Methods 9, 215–216 (2012).

    CAS  Google Scholar 

  105. 105.

    Hoffman, M. M. et al. Unsupervised pattern discovery in human chromatin structure through genomic segmentation. Nat. Methods 9, 473–476 (2012).

    CAS  Google Scholar 

  106. 106.

    Eddy, S. R. Multiple alignment using hidden Markov models. Proc. Int. Conf. Intell. Syst. Mol. Biol. 3, 114–120 (1995).

    CAS  Google Scholar 

  107. 107.

    Krogh, A., Brown, M., Saira Mian, I., Sjölander, K. & Haussler, D. Hidden Markov models in computational biology. J. Mol. Biol. 235, 1501–1531 (1994).

    CAS  Google Scholar 

  108. 108.

    Salzberg, S. L., Delcher, A. L., Kasif, S. & White, O. Microbial gene identification using interpolated Markov models. Nucleic Acids Res. 26, 544–548 (1998).

    CAS  Google Scholar 

  109. 109.

    Novichkova, S., Egorov, S. & Daraselia, N. MedScan, a natural language processing engine for MEDLINE abstracts. Bioinformatics 19, 1699–1706 (2003).

    CAS  Google Scholar 

  110. 110.

    Rzhetsky, A. et al. GeneWays: a system for extracting, analyzing, visualizing, and integrating molecular pathway data. J. Biomed. Inform. 37, 43–53 (2004).

    CAS  Google Scholar 

  111. 111.

    Corney, D. P. A., Buxton, B. F., Langdon, W. B. & Jones, D. T. BioRAT: extracting biological information from full-length papers. Bioinformatics 20, 3206–3213 (2004).

    CAS  Google Scholar 

  112. 112.

    Peyvandipour, A., Shafi, A., Saberian, N. & Draghici, S. Identification of cell types from single cell data using stable clustering. Sci. Rep. 10, 12349 (2020).

    CAS  Google Scholar 

  113. 113.

    Qian, N. & Sejnowski, T. J. Predicting the secondary structure of globular proteins using neural network models. J. Mol. Biol. 202, 865–884 (1988).

    CAS  Google Scholar 

  114. 114.

    Rost, B. & Sander, C. Improved prediction of protein secondary structure by use of sequence profiles and neural networks. Proc. Natl Acad. Sci. USA 90, 7558–7562 (1993).

    CAS  Google Scholar 

  115. 115.

    Cheng, J., Saigo, H. & Baldi, P. Large-scale prediction of disulphide bridges using kernel methods, two-dimensional recursive neural networks, and weighted graph matching. Proteins 62, 617–629 (2005).

    Google Scholar 

  116. 116.

    Senior, A. W. et al. Improved protein structure prediction using potentials from deep learning. Nature 577, 706–710 (2020).

    CAS  Google Scholar 

  117. 117.

    Mao, W., Ding, W., Xing, Y. & Gong, H. AmoebaContact and GDFold as a pipeline for rapid de novo protein structure prediction. Nat. Mach. Intell. 2, 25–33 (2020).

    Google Scholar 

  118. 118.

    El-Naqa, I., Yang, Y., Wernick, M. N., Galatsanos, N. P. & Nishikawa, R. M. A support vector machine approach for detection of microcalcifications. IEEE Trans. Med. Imaging 21, 1552–1563 (2002).

    Google Scholar 

  119. 119.

    Loo, L.-H., Wu, L. F. & Altschuler, S. J. Image-based multivariate profiling of drug responses from single cells. Nat. Methods 4, 445–453 (2007).

    CAS  Google Scholar 

  120. 120.

    Bakal, C., Aach, J., Church, G. & Perrimon, N. Quantitative morphological signatures define local signaling networks regulating cell morphology. Science 316, 1753–1756 (2007).

    CAS  Google Scholar 

  121. 121.

    Jones, T. R. et al. Scoring diverse cellular morphologies in image-based screens with iterative feedback and machine learning. Proc. Natl Acad. Sci. USA 106, 1826–1831 (2009).

    CAS  Google Scholar 

  122. 122.

    Bray, M.-A. et al. Cell Painting, a high-content image-based assay for morphological profiling using multiplexed fluorescent dyes. Nat. Protoc. 11, 1757–1774 (2016).

    CAS  Google Scholar 

  123. 123.

    Corey, E. J. & Wipke, W. T. Computer-assisted design of complex organic syntheses. Science 166, 178–192 (1969).

    CAS  Google Scholar 

  124. 124.

    Gómez-Bombarelli, R. et al. Automatic chemical design using a data-driven continuous representation of molecules. ACS Cent. Sci. 4, 268–276 (2018).

    Google Scholar 

  125. 125.

    Ma, J., Sheridan, R. P., Liaw, A., Dahl, G. E. & Svetnik, V. Deep neural nets as a method for quantitative structure–activity relationships. J. Chem. Inf. Model. 55, 263–274 (2015).

    CAS  Google Scholar 

  126. 126.

    Reker, D., Rodrigues, T., Schneider, P. & Schneider, G. Identifying the macromolecular targets of de novo-designed chemical entities through self-organizing map consensus. Proc. Natl Acad. Sci. USA 111, 4067–4072 (2014).

    CAS  Google Scholar 

  127. 127.

    Svetnik, V. et al. Random forest: a classification and regression tool for compound classification and QSAR modeling. J. Chem. Inf. Comput. Sci. 43, 1947–1958 (2003).

    CAS  Google Scholar 

  128. 128.

    Poroikov, V. V., Filimonov, D. A., Borodina, Y. V., Lagunin, A. A. & Kos, A. Robustness of biological activity spectra predicting by computer program PASS for noncongeneric sets of chemical compounds. J. Chem. Inf. Comput. Sci. 40, 1349–1355 (2000).

    CAS  Google Scholar 

  129. 129.

    Pakhomov, S. V. S., Buntrock, J. D. & Chute, C. G. Automating the assignment of diagnosis codes to patient encounters using example-based and machine learning techniques. J. Am. Med. Inform. Assoc. 13, 516–525 (2006).

    Google Scholar 

  130. 130.

    Barrier, A. et al. Colon cancer prognosis prediction by gene expression profiling. Oncogene 24, 6155–6164 (2005).

    CAS  Google Scholar 

  131. 131.

    Colubri, A. et al. Transforming clinical data into actionable prognosis models: machine-learning framework and field-deployable app to predict outcome of ebola patients. PLoS Negl. Trop. Dis. 10, e0004549 (2016).

    Google Scholar 

  132. 132.

    Küffner, R. et al. Crowdsourced analysis of clinical trial data to predict amyotrophic lateral sclerosis progression. Nat. Biotechnol. 33, 51–57 (2015).

    Google Scholar 

  133. 133.

    Shipp, M. A. et al. Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning. Nat. Med. 8, 68–74 (2002).

    CAS  Google Scholar 

  134. 134.

    Zhang, P., Wang, F., Hu, J. & Sorrentino, R. Towards personalized medicine: leveraging patient similarity and drug similarity analytics. AMIA Jt. Summits Transl. Sci. Proc. 2014, 132–136 (2014).

    Google Scholar 

  135. 135.

    Menden, M. P. et al. Machine learning prediction of cancer cell sensitivity to drugs based on genomic and chemical properties. PLoS ONE 8, e61318 (2013).

    CAS  Google Scholar 

  136. 136.

    Dorman, S. N. et al. Genomic signatures for paclitaxel and gemcitabine resistance in breast cancer derived by machine learning. Mol. Oncol. 10, 85–100 (2016).

    CAS  Google Scholar 

  137. 137.

    Rudovic, O., Lee, J., Dai, M., Schuller, B. & Picard, R. W. Personalized machine learning for robot perception of affect and engagement in autism therapy. Sci. Robot. 3, eaao6760 (2018).

    Google Scholar 

  138. 138.

    Shendure, J., Mitra, R. D., Varma, C. & Church, G. M. Advanced sequencing technologies: methods and goals. Nat. Rev. Genet. 5, 335–344 (2004).

    CAS  Google Scholar 

  139. 139.

    Libbrecht, M. W. et al. A unified encyclopedia of human functional DNA elements through fully automated annotation of 164 human cell types. Genome Biol. 20, 180 (2019).

    CAS  Google Scholar 

  140. 140.

    Krogh, A. Hidden Markov models in computational biology: applications to protein modeling. J. Mol. Biol. 235, 15001–1531 (1993).

    Google Scholar 

  141. 141.

    Cheng, J., Tegge, A. N. & Baldi, P. Machine learning methods for protein structure prediction. IEEE Rev. Biomed. Eng. 1, 41–49 (2008).

    Google Scholar 

  142. 142.

    Sato, K., Hamada, M., Asai, K. & Mituyama, T. CENTROIDFOLD: a web server for RNA secondary structure prediction. Nucleic Acids Res. 37, W277–W280 (2009).

    CAS  Google Scholar 

  143. 143.

    Singh, J., Hanson, J., Paliwal, K. & Zhou, Y. RNA secondary structure prediction using an ensemble of two-dimensional deep neural networks and transfer learning. Nat. Commun. 10, 5407 (2019).

    Google Scholar 

  144. 144.

    Wang, J., Cao, H., Zhang, J. Z. H. & Qi, Y. Computational protein design with deep learning neural networks. Sci. Rep. 8, 6349 (2018).

    Google Scholar 

  145. 145.

    McQuin, C. et al. CellProfiler 3.0: Next-generation image processing for biology. PLoS Biol. 16, e2005970 (2018).

    Google Scholar 

  146. 146.

    Soltanian-Zadeh, H., Rafiee-Rad, F. & D, S. P.-N. Comparison of multiwavelet, wavelet, Haralick, and shape features for microcalcification classification in mammograms. Pattern Recognit. 37, 1973–1986 (2004).

    Google Scholar 

  147. 147.

    Sirinukunwattana, K. et al. Locality sensitive deep learning for detection and classification of nuclei in routine colon cancer histology images. IEEE Trans. Med. Imaging 35, 1196–1206 (2016).

    Google Scholar 

  148. 148.

    Yu, K.-H. et al. Predicting non-small cell lung cancer prognosis by fully automated microscopic pathology image features. Nat. Commun. 7, 12474 (2016).

    CAS  Google Scholar 

  149. 149.

    Ngiam, K. Y. & Khor, I. W. Big data and machine learning algorithms for health-care delivery. Lancet Oncol. 20, e262–e273 (2019).

    Google Scholar 

  150. 150.

    Miotto, R., Wang, F., Wang, S., Jiang, X. & Dudley, J. T. Deep learning for healthcare: review, opportunities and challenges. Brief. Bioinform. 19, 1236–1246 (2018).

    Google Scholar 

  151. 151.

    Wang, B. et al. Similarity network fusion for aggregating data types on a genomic scale. Nat. Methods 11, 333–337 (2014).

    CAS  Google Scholar 

  152. 152.

    Nguyen, T., Tagett, R., Diaz, D. & Draghici, S. A novel approach for data integration and disease subtyping. Genome Res. 27, 2025–2039 (2017).

    CAS  Google Scholar 

  153. 153.

    Welter, D. et al. The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Res. 42, D1001–D1006 (2014).

    CAS  Google Scholar 

  154. 154.

    Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).

    CAS  Google Scholar 

  155. 155.

    Moult, J. A decade of CASP: progress, bottlenecks and prognosis in protein structure prediction. Curr. Opin. Struct. Biol. 15, 285–289 (2005).

    CAS  Google Scholar 

  156. 156.

    Tanevski, J. et al. Gene selection for optimal prediction of cell position in tissues from single-cell transcriptomics data. Life Sci. Alliance 3, e202000867 (2020).

    Google Scholar 

  157. 157.

    Choobdar, S. et al. Assessment of network module identification across complex diseases. Nat. Methods 16, 843–852 (2019).

    CAS  Google Scholar 

  158. 158.

    Keilwagen, J., Posch, S. & Grau, J. Accurate prediction of cell type-specific transcription factor binding. Genome Biol. 20, 9 (2019).

    Google Scholar 

  159. 159.

    Rentzsch, P., Witten, D., Cooper, G. M., Shendure, J. & Kircher, M. CADD: predicting the deleteriousness of variants throughout the human genome. Nucleic Acids Res. 47, D886–D894 (2019).

    CAS  Google Scholar 

  160. Lee, D. et al. A method to predict the impact of regulatory variants from DNA sequence. Nat. Genet. 47, 955 (2015).

  161. Zhou, J. & Troyanskaya, O. G. Predicting effects of noncoding variants with deep learning-based sequence model. Nat. Methods 12, 931–934 (2015).

  162. Kelley, D. R., Snoek, J. & Rinn, J. L. Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks. Genome Res. 26, 990–999 (2016).

  163. Alipanahi, B., Delong, A., Weirauch, M. T. & Frey, B. J. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat. Biotechnol. 33, 831–838 (2015).

  164. Bernstein, B. E. et al. The NIH roadmap epigenomics mapping consortium. Nat. Biotechnol. 28, 1045–1048 (2010).

  165. The ENCODE Project Consortium. The ENCODE (ENCyclopedia Of DNA Elements) Project. Science 306, 636–640 (2004).

  166. Arloth, J. et al. DeepWAS: Multivariate genotype-phenotype associations by directly integrating regulatory information using deep learning. PLoS Comput. Biol. 16, e1007616 (2020).

  167. Zhou, J. et al. Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk. Nat. Genet. 50, 1171–1179 (2018).

  168. Mostavi, M., Salekin, S. & Huang, Y. Deep-2′-O-Me: Predicting 2′-O-methylation sites by convolutional neural networks. Conf. Proc. IEEE Eng. Med. Biol. Soc. 2018, 2394–2397 (2018).

  169. Jaganathan, K. et al. Predicting splicing from primary sequence with deep learning. Cell 176, 535–548.e24 (2019).

  170. Zhou, J. et al. Whole-genome deep-learning analysis identifies contribution of noncoding mutations to autism risk. Nat. Genet. 51, 973–980 (2019).

  171. Zhang, Z. et al. Deep-learning augmented RNA-seq analysis of transcript splicing. Nat. Methods 16, 307–310 (2019).

  172. Leung, M. K. K., Xiong, H. Y., Lee, L. J. & Frey, B. J. Deep learning of the tissue-regulated splicing code. Bioinformatics 30, i121–i129 (2014).

  173. Pan, X. & Shen, H.-B. RNA-protein binding motifs mining with a new hybrid deep learning based cross-domain knowledge integration approach. BMC Bioinformatics 18, 136 (2017).

  174. Park, C. Y. et al. Genome-wide landscape of RNA-binding protein target site dysregulation reveals a major impact on psychiatric disorder risk. Nat. Genet. 53, 166–173 (2021).

  175. Shrikumar, A., Greenside, P. & Kundaje, A. Learning important features through propagating activation differences. Proc. 34th Int. Conf. Mach. Learn. 70, 3145–3153 (2017).

  176. Sonawane, A. R. et al. Understanding tissue-specific gene regulation. Cell Rep. 21, 1077–1088 (2017).

  177. Pierson, E. et al. Sharing and specificity of co-expression networks across 35 human tissues. PLoS Comput. Biol. 11, e1004220 (2015).

  178. Magger, O., Waldman, Y. Y., Ruppin, E. & Sharan, R. Enhancing the prioritization of disease-causing genes through tissue specific protein interaction networks. PLoS Comput. Biol. 8, e1002690 (2012).

  179. Greene, C. S. et al. Understanding multicellular function and disease with human tissue-specific networks. Nat. Genet. 47, 569–576 (2015).

  180. Krishnan, A. et al. Genome-wide prediction and functional characterization of the genetic basis of autism spectrum disorder. Nat. Neurosci. 19, 1454–1462 (2016).

  181. Huang, J. K. et al. Systematic evaluation of molecular networks for discovery of disease genes. Cell Syst. 6, 484–495.e5 (2018).

  182. Yao, V. et al. An integrative tissue-network approach to identify and test human disease genes. Nat. Biotechnol. 36, 1091–1099 (2018).

  183. Marbach, D. et al. Tissue-specific regulatory circuits reveal variable modular perturbations across complex diseases. Nat. Methods 13, 366–370 (2016).

  184. Chen, X. et al. Tissue-specific enhancer functional networks for associating distal regulatory regions to disease. Cell Syst. 12, 353–362.e6 (2021).

  185. Fagny, M. et al. Exploring regulation in tissues with eQTL networks. Proc. Natl Acad. Sci. USA 114, E7841–E7850 (2017).

  186. Ozturk, K., Dow, M., Carlin, D. E., Bejar, R. & Carter, H. The emerging potential for network analysis to inform precision cancer medicine. J. Mol. Biol. 430, 2875–2899 (2018).

  187. Prahallad, A. et al. Unresponsiveness of colon cancer to BRAF(V600E) inhibition through feedback activation of EGFR. Nature 483, 100–103 (2012).

  188. Horn, H. et al. NetSig: network-based discovery from cancer genomes. Nat. Methods 15, 61–66 (2018).

  189. Vandin, F., Upfal, E. & Raphael, B. J. Algorithms for detecting significantly mutated pathways in cancer. J. Comput. Biol. 18, 507–522 (2011).

  190. The Cancer Genome Atlas Research Network. Comprehensive molecular characterization of clear cell renal cell carcinoma. Nature 499, 43–49 (2013).

  191. Jackson, M. D. B., Duran-Nebreda, S. & Bassel, G. W. Network-based approaches to quantify multicellular development. J. R. Soc. Interface 14, 20170484 (2017).

  192. Bullmore, E. & Sporns, O. Complex brain networks: graph theoretical analysis of structural and functional systems. Nat. Rev. Neurosci. 10, 186–198 (2009).

  193. Gibson, M. C., Patel, A. B., Nagpal, R. & Perrimon, N. The emergence of geometric order in proliferating metazoan epithelia. Nature 442, 1038–1041 (2006).

  194. Wilson, P. C. et al. The single-cell transcriptomic landscape of early human diabetic nephropathy. Proc. Natl Acad. Sci. USA 116, 19619–19625 (2019).

  195. Schafflick, D. et al. Integrated single cell analysis of blood and cerebrospinal fluid leukocytes in multiple sclerosis. Nat. Commun. 11, 247 (2020).

  196. Velmeshev, D. et al. Single-cell genomics identifies cell type–specific molecular changes in autism. Science 364, 685–689 (2019).

  197. Rossi, G., Manfrin, A. & Lutolf, M. P. Progress and potential in organoid research. Nat. Rev. Genet. 19, 671–687 (2018).

  198. Kassis, T., Hernandez-Gordillo, V., Langer, R. & Griffith, L. G. OrgaQuant: human intestinal organoid localization and quantification using deep convolutional neural networks. Sci. Rep. 9, 12479 (2019).

  199. Trujillo, C. A. et al. Complex oscillatory waves emerging from cortical organoids model early human brain network development. Cell Stem Cell 25, 558–569.e7 (2019).

  200. Lein, E. S. et al. Genome-wide atlas of gene expression in the adult mouse brain. Nature 445, 168–176 (2007).

  201. Hawrylycz, M. J. et al. An anatomically comprehensive atlas of the adult human brain transcriptome. Nature 489, 391–399 (2012).

  202. Andersson, R. et al. An atlas of active enhancers across human cell types and tissues. Nature 507, 455–461 (2014).

  203. Hon, C.-C. et al. An atlas of human long non-coding RNAs with accurate 5′ ends. Nature 543, 199–204 (2017).

  204. The FANTOM Consortium and the RIKEN PMI and CLST (DGT). A promoter-level mammalian expression atlas. Nature 507, 462–470 (2014).

  205. Svensson, V. et al. The human body at cellular resolution: the NIH Human Biomolecular Atlas Program. Nature 574, 187–192 (2019).

  206. Uhlén, M. et al. Proteomics. Tissue-based map of the human proteome. Science 347, 1260419 (2015).

Acknowledgements

Figures were created with BioRender.com. ADAR icon by Emw – Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=8761779. This work is supported by NIH/NIDDK grants U24DK100845, UGDK114907 and U2CDK114886 and NIH grant UH3TR002158 to O.G.T. We thank C. Theesfeld for helpful discussion and comments on the manuscript.

Author information

Contributions

R.S.G.S., A.K.W. and O.G.T. drafted and edited the manuscript.

Corresponding author

Correspondence to Olga G. Troyanskaya.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Related links

ConsensusPathDB: http://cpdb.molgen.mpg.de/

Critical Assessment of protein Structure Prediction (CASP): https://predictioncenter.org/

DREAM Challenges: https://dreamchallenges.org/

Genome-scale Integrated Analysis of gene Networks in Tissues (GIANT): http://giant-v2.princeton.edu

GWAS Catalog: https://www.ebi.ac.uk/gwas/

Search Tool for the Retrieval of Interacting Genes/Proteins (STRING): https://string-db.org/

UK Biobank: https://www.ukbiobank.ac.uk

About this article

Cite this article

Sealfon, R.S.G., Wong, A.K. & Troyanskaya, O.G. Machine learning methods to model multicellular complexity and tissue specificity. Nat Rev Mater 6, 717–729 (2021). https://doi.org/10.1038/s41578-021-00339-3
