  • Review Article

Machine learning methods to model multicellular complexity and tissue specificity

Abstract

Experimental approaches to study tissue specificity enable insight into the nature and organization of the cell types and tissues that constitute complex multicellular organisms. Machine learning provides a powerful tool to investigate and interpret tissue-specific experimental data. In this Review, we first provide a brief introduction to key single-cell and whole-tissue approaches that allow investigation of tissue specificity and then highlight two classes of machine-learning-based methods, which can be applied to analyse, model and interpret these experimental data. Deep learning methods can predict tissue-dependent effects of individual mutations on gene expression, alternative splicing and disease phenotypes. Network-based approaches can capture relationships between biomolecules, integrate large heterogeneous data compendia to model molecular circuits and identify tissue-specific functional relationships and regulatory connections. We conclude with an outlook to future possibilities in examining multicellular complexity by combining high-resolution, large-scale multiomics data sets and interpretable machine learning models.

Fig. 1: Decoding multicellular complexity.
Fig. 2: Machine learning approaches.
Fig. 3: Decoding the tissue-specific impact of sequence variants.
Fig. 4: The GIANT framework for predicting tissue-specific functional networks.
Fig. 5: Applications of tissue-specific networks.


    Article  CAS  Google Scholar 

  172. Leung, M. K. K., Xiong, H. Y., Lee, L. J. & Frey, B. J. Deep learning of the tissue-regulated splicing code. Bioinformatics 30, i121–i129 (2014).

    Article  CAS  Google Scholar 

  173. Pan, X. & Shen, H.-B. RNA-protein binding motifs mining with a new hybrid deep learning based cross-domain knowledge integration approach. BMC Bioinformatics 18, 136 (2017).

    Article  Google Scholar 

  174. Park, C. Y. et al. Genome-wide landscape of RNA-binding protein target site dysregulation reveals a major impact on psychiatric disorder risk. Nat. Genet. 53, 166–173 (2021).

    Article  CAS  Google Scholar 

  175. Shrikumar, A., Greenside, P. & Kundaje, A. Learning important features through propagating activation differences. Proc. 34th Int. Conf. Mach. Learn. 70, 3145–3153 (2017).

    Google Scholar 

  176. Sonawane, A. R. et al. Understanding tissue-specific gene regulation. Cell Rep. 21, 1077–1088 (2017).

    Article  CAS  Google Scholar 

  177. Pierson, E. et al. Sharing and specificity of co-expression networks across 35 human tissues. PLoS Comput. Biol. 11, e1004220 (2015).

    Article  Google Scholar 

  178. Magger, O., Waldman, Y. Y., Ruppin, E. & Sharan, R. Enhancing the prioritization of disease-causing genes through tissue specific protein interaction networks. PLoS Comput. Biol. 8, e1002690 (2012).

    Article  CAS  Google Scholar 

  179. Greene, C. S. et al. Understanding multicellular function and disease with human tissue-specific networks. Nat. Genet. 47, 569–576 (2015).

    Article  CAS  Google Scholar 

  180. Krishnan, A. et al. Genome-wide prediction and functional characterization of the genetic basis of autism spectrum disorder. Nat. Neurosci. 19, 1454–1462 (2016).

    Article  CAS  Google Scholar 

  181. Huang, J. K. et al. Systematic evaluation of molecular networks for discovery of disease genes. Cell Syst. 6, 484–495.e5 (2018).

    Article  CAS  Google Scholar 

  182. Yao, V. et al. An integrative tissue-network approach to identify and test human disease genes. Nat. Biotechnol. 36, 1091–1099 (2018).

    Article  CAS  Google Scholar 

  183. Marbach, D. et al. Tissue-specific regulatory circuits reveal variable modular perturbations across complex diseases. Nat. Methods 13, 366–370 (2016).

    Article  Google Scholar 

  184. Chen, X. et al. Tissue-specific enhancer functional networks for associating distal regulatory regions to disease. Cell Syst. 12, 353–362.e6 (2021).

    Article  CAS  Google Scholar 

  185. Fagny, M. et al. Exploring regulation in tissues with eQTL networks. Proc. Natl Acad. Sci. USA 114, E7841–E7850 (2017).

    Article  CAS  Google Scholar 

  186. Ozturk, K., Dow, M., Carlin, D. E., Bejar, R. & Carter, H. The emerging potential for network analysis to inform precision cancer medicine. J. Mol. Biol. 430, 2875–2899 (2018).

    Article  CAS  Google Scholar 

  187. Prahallad, A. et al. Unresponsiveness of colon cancer to BRAF(V600E) inhibition through feedback activation of EGFR. Nature 483, 100–103 (2012).

    Article  CAS  Google Scholar 

  188. Horn, H. et al. NetSig: network-based discovery from cancer genomes. Nat. Methods 15, 61–66 (2018).

    Article  CAS  Google Scholar 

  189. Vandin, F., Upfal, E. & Raphael, B. J. Algorithms for detecting significantly mutated pathways in cancer. J. Comput. Biol. 18, 507–522 (2011).

    Article  CAS  Google Scholar 

  190. The Cancer Genome Atlas Research Network. Comprehensive molecular characterization of clear cell renal cell carcinoma. Nature 499, 43–49 (2013).

    Article  Google Scholar 

  191. Jackson, M. D. B., Duran-Nebreda, S. & Bassel, G. W. Network-based approaches to quantify multicellular development. J. R. Soc. Interface 14, 20170484 (2017).

    Article  Google Scholar 

  192. Bullmore, E. & Sporns, O. Complex brain networks: graph theoretical analysis of structural and functional systems. Nat. Rev. Neurosci. 10, 186–198 (2009).

    Article  CAS  Google Scholar 

  193. Gibson, M. C., Patel, A. B., Nagpal, R. & Perrimon, N. The emergence of geometric order in proliferating metazoan epithelia. Nature 442, 1038–1041 (2006).

    Article  CAS  Google Scholar 

  194. Wilson, P. C. et al. The single-cell transcriptomic landscape of early human diabetic nephropathy. Proc. Natl Acad. Sci. USA 116, 19619–19625 (2019).

    Article  CAS  Google Scholar 

  195. Schafflick, D. et al. Integrated single cell analysis of blood and cerebrospinal fluid leukocytes in multiple sclerosis. Nat. Commun. 11, 247 (2020).

    Article  CAS  Google Scholar 

  196. Velmeshev, D. et al. Single-cell genomics identifies cell type–specific molecular changes in autism. Science 364, 685–689 (2019).

    Article  CAS  Google Scholar 

  197. Rossi, G., Manfrin, A. & Lutolf, M. P. Progress and potential in organoid research. Nat. Rev. Genet. 19, 671–687 (2018).

    Article  CAS  Google Scholar 

  198. Kassis, T., Hernandez-Gordillo, V., Langer, R. & Griffith, L. G. OrgaQuant: human intestinal organoid localization and quantification using deep convolutional neural networks. Sci. Rep. 9, 12479 (2019).

    Article  Google Scholar 

  199. Trujillo, C. A. et al. Complex oscillatory waves emerging from cortical organoids model early human brain network development. Cell Stem Cell 25, 558–569.e7 (2019).

    Article  CAS  Google Scholar 

  200. Lein, E. S. et al. Genome-wide atlas of gene expression in the adult mouse brain. Nature 445, 168–176 (2007).

    Article  CAS  Google Scholar 

  201. Hawrylycz, M. J. et al. An anatomically comprehensive atlas of the adult human brain transcriptome. Nature 489, 391–399 (2012).

    Article  CAS  Google Scholar 

  202. Andersson, R. et al. An atlas of active enhancers across human cell types and tissues. Nature 507, 455–461 (2014).

    Article  CAS  Google Scholar 

  203. Hon, C.-C. et al. An atlas of human long non-coding RNAs with accurate 5′ ends. Nature 543, 199–204 (2017).

    Article  CAS  Google Scholar 

  204. The FANTOM Consortium and the RIKEN PMI and CLST (DGT) A promoter-level mammalian expression atlas. Nature 507, 462–470 (2014).

    Article  Google Scholar 

  205. Svensson, V. et al. The human body at cellular resolution: the NIH Human Biomolecular Atlas Program. Nature 574, 187–192 (2019).

    Article  Google Scholar 

  206. Uhlén, M. et al. Proteomics. Tissue-based map of the human proteome. Science 347, 1260419 (2015).

    Article  Google Scholar 

Download references

Acknowledgements

Figures created with Biorender.com. ADAR icon by Emw – Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=8761779. This work is supported by NIH/NIDDK grants U24DK100845, UGDK114907 and U2CDK114886 and NIH grant UH3TR002158 to O.G.T. We thank C. Theesfeld for helpful discussion and comments on the manuscript.

Author information

Contributions

R.S.G.S., A.K.W. and O.G.T. drafted and edited the manuscript.

Corresponding author

Correspondence to Olga G. Troyanskaya.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Related links

ConsensusPathDB: http://cpdb.molgen.mpg.de/

Critical Assessment of protein Structure Prediction (CASP): https://predictioncenter.org/

DREAM Challenges: https://dreamchallenges.org/

Genome-scale Integrated Analysis of gene Networks in Tissues (GIANT): http://giant-v2.princeton.edu

GWAS Catalog: https://www.ebi.ac.uk/gwas/

Search Tool for the Retrieval of Interacting Genes/Proteins (STRING): https://string-db.org/

UK Biobank: https://www.ukbiobank.ac.uk

About this article

Cite this article

Sealfon, R.S.G., Wong, A.K. & Troyanskaya, O.G. Machine learning methods to model multicellular complexity and tissue specificity. Nat Rev Mater 6, 717–729 (2021). https://doi.org/10.1038/s41578-021-00339-3

  • DOI: https://doi.org/10.1038/s41578-021-00339-3
