Decoding disease: from genomes to networks to phenotypes

Abstract

Interpreting the effects of genetic variants is key to understanding individual susceptibility to disease and designing personalized therapeutic approaches. Modern experimental technologies are enabling the generation of massive compendia of human genome sequence data and associated molecular and phenotypic traits, together with genome-scale expression, epigenomics and other functional genomic data. Integrative computational models can leverage these data to understand variant impact, elucidate the effect of dysregulated genes on biological pathways in specific disease and tissue contexts, and interpret disease risk beyond what is feasible with experiments alone. In this Review, we discuss recent developments in machine learning algorithms for genome interpretation and for integrative molecular-level modelling of cells, tissues and organs relevant to disease. More specifically, we highlight existing methods and key challenges and opportunities in identifying specific disease-causing genetic variants and linking them to molecular pathways and, ultimately, to disease phenotypes.

Fig. 1: Interpreting the other 99% of the genome.
Fig. 2: Learning the regulatory code to predict variant effects.
Fig. 3: Informing quantitative genetics data with network models.
Fig. 4: Systems-level functional interpretation of disease molecular architecture.


    CAS  PubMed  Article  Google Scholar 

  133. 133.

    Davis, C. A. et al. The Encyclopedia of DNA Elements (ENCODE): data portal update. Nucleic Acids Res. 46, D794–D801 (2018).

    CAS  PubMed  Article  Google Scholar 

  134. 134.

    Gronau, I., Arbiza, L., Mohammed, J. & Siepel, A. Inference of natural selection from interspersed genomic elements based on polymorphism and divergence. Mol. Biol. Evol. 30, 1159–1171 (2013).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  135. 135.

    Huang, Y.-F., Gulko, B. & Siepel, A. Fast, scalable prediction of deleterious noncoding variants from functional and population genomic data. Nat. Genet. 49, 618–624 (2017). This paper uses a pathogenicity scoring method, LINSIGHT, to predict fitness consequences of non-coding human variation using linear modelling of functional genomic data with a probabilistic model of molecular evolution.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  136. 136.

    Ernst, C. et al. Performance of in silico prediction tools for the classification of rare BRCA1/2 missense variants in clinical diagnostics. BMC Med. Genomics 11, 35 (2018).

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  137. 137.

    Hart, S. N. et al. Comprehensive annotation of BRCA1 and BRCA2 missense variants by functionally validated sequence-based computational prediction models. Genet. Med. 21, 71–80 (2019).

    CAS  PubMed  Article  Google Scholar 

  138. 138.

    Findlay, G. M. et al. Accurate classification of BRCA1 variants with saturation genome editing. Nature 562, 217–222 (2018).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  139. 139.

    Kim, S. S. et al. Improving the informativeness of Mendelian disease-derived pathogenicity scores for common disease. Nat. Commun. 11, 6258 (2020).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  140. 140.

    GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318–1330 (2020).

    Article  CAS  Google Scholar 

  141. 141.

    Consortium, G. The Genotype-Tissue Expression (GTEx) project. Nat. Genet. 45, 580–585 (2013).

    Article  CAS  Google Scholar 

  142. 142.

    Deutsch, E. W. et al. The ProteomeXchange consortium in 2017: supporting the cultural change in proteomics public data deposition. Nucleic Acids Res. 45, D1100–D1106 (2017).

    CAS  PubMed  Article  Google Scholar 

  143. 143.

    Schwenk, J. M. et al. The human plasma proteome draft of 2017: building on the human plasma peptideatlas from mass spectrometry and complementary assays. J. Proteome Res. 16, 4299–4310 (2017).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  144. 144.

    Huttenhower, C. et al. Exploring the human genome with functional maps. Genome Res. 19, 1093–1106 (2009).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  145. 145.

    Troyanskaya, O. G., Dolinski, K., Owen, A. B., Altman, R. B. & Botstein, D. A Bayesian framework for combining heterogeneous data sources for gene function prediction (in Saccharomyces cerevisiae). Proc. Natl Acad. Sci. USA 100, 8348–8353 (2003).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  146. 146.

    Mostafavi, S., Ray, D., Warde-Farley, D., Grouios, C. & Morris, Q. GeneMANIA: a real-time multiple association network integration algorithm for predicting gene function. Genome Biol. 9, S4 (2008).

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  147. 147.

    Snel, B., Lehmann, G., Bork, P. & Huynen, M. A. STRING: a web-server to retrieve and display the repeatedly occurring neighbourhood of a gene. Nucleic Acids Res. 28, 3442–3444 (2000).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  148. 148.

    Greene, C. S. et al. Understanding multicellular function and disease with human tissue-specific networks. Nat. Genet. 47, 569–576 (2015).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  149. 149.

    Wong, A. K., Krishnan, A. & Troyanskaya, O. G. GIANT 2.0: genome-scale integrated analysis of gene networks in tissues. Nucleic Acids Res. 46, W65–W70 (2018).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  150. 150.

    Pierson, E. et al. Sharing and specificity of co-expression networks across 35 human tissues. PLoS Comput. Biol. 11, e1004220 (2015).

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  151. 151.

    Keller, M. P. et al. A gene expression network model of type 2 diabetes links cell cycle regulation in islets with diabetes susceptibility. Genome Res. 18, 706–716 (2008).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  152. 152.

    Dobrin, R. et al. Multi-tissue coexpression networks reveal unexpected subnetworks associated with disease. Genome Biol. 10, R55 (2009).

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  153. 153.

    Magger, O., Waldman, Y. Y., Ruppin, E. & Sharan, R. Enhancing the prioritization of disease-causing genes through tissue specific protein interaction networks. PLoS Comput. Biol. 8, e1002690 (2012).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  154. 154.

    Yao, V. et al. An integrative tissue-network approach to identify and test human disease genes. Nat. Biotechnol.36, 1091–1099 (2018).

    CAS  Article  Google Scholar 

  155. 155.

    Roussarie, J.-P. et al. Selective neuronal vulnerability in Alzheimer’s disease: a network-based analysis. Neuron 107, 821–835.e12 (2020).

    CAS  PubMed  Article  Google Scholar 

  156. 156.

    Goya, J. et al. FNTM: a server for predicting functional networks of tissues in mouse. Nucleic Acids Res. 43, W182–W187 (2015).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  157. 157.

    Ledo, J. H. et al. Lack of a site-specific phosphorylation of Presenilin 1 disrupts microglial gene networks and progenitors during development. PLoS ONE 15, e0237773 (2020).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  158. 158.

    Huang, J. K. et al. Systematic evaluation of molecular networks for discovery of disease genes. Cell Syst. 6, 484–495.e5 (2018).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  159. 159.

    Kamburov, A., Wierling, C., Lehrach, H. & Herwig, R. ConsensusPathDB—a database for integrating human functional interaction networks. Nucleic Acids Res. 37, D623–D628 (2009).

    CAS  PubMed  Article  Google Scholar 

  160. 160.

    Califano, A., Butte, A. J., Friend, S., Ideker, T. & Schadt, E. Leveraging models of cell regulation and GWAS data in integrative network-based association studies. Nat. Genet. 44, 841–847 (2012).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  161. 161.

    Schaefer, R. J. et al. Integrating coexpression networks with GWAS to prioritize causal genes in maize. Plant. Cell 30, 2922–2942 (2018).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  162. 162.

    Novarino, G. et al. Exome sequencing links corticospinal motor neuron disease to common neurodegenerative disorders. Science 343, 506–511 (2014).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  163. 163.

    Leiserson, M. D. M. et al. Pan-cancer network analysis identifies combinations of rare somatic mutations across pathways and protein complexes. Nat. Genet. 47, 106–114 (2015).

    CAS  PubMed  Article  Google Scholar 

  164. 164.

    Ruffalo, M., Koyutürk, M. & Sharan, R. Network-based integration of disparate omic data to identify ‘silent players’ in cancer. PLoS Comput. Biol. 11, e1004595 (2015).

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  165. 165.

    Vandin, F., Upfal, E. & Raphael, B. J. Algorithms for detecting significantly mutated pathways in cancer. J. Comput. Biol. 18, 507–522 (2011).

    CAS  PubMed  Article  Google Scholar 

  166. 166.

    Reyna, M. A., Leiserson, M. D. M. & Raphael, B. J. Hierarchical HotNet: identifying hierarchies of altered subnetworks. Bioinformatics 34, i972–i980 (2018).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  167. 167.

    Creixell, P. et al. Kinome-wide decoding of network-attacking mutations rewiring cancer signaling. Cell 163, 202–217 (2015).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  168. 168.

    Creixell, P. et al. Pathway and network analysis of cancer genomes. Nat. Methods 12, 615–621 (2015).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  169. 169.

    Horn, H. et al. NetSig: network-based discovery from cancer genomes. Nat. Methods 15, 61–66 (2018).

    CAS  PubMed  Article  Google Scholar 

  170. 170.

    Vogelstein, B. et al. Cancer genome landscapes. Science 339, 1546–1558 (2013).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  171. 171.

    Vanunu, O., Magger, O., Ruppin, E., Shlomi, T. & Sharan, R. Associating genes and protein complexes with disease via network propagation. PLoS Comput. Biol. 6, e1000641 (2010).

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  172. 172.

    Aerts, S. et al. Gene prioritization through genomic data fusion. Nat. Biotechnol. 24, 537–544 (2006).

    CAS  PubMed  Article  Google Scholar 

  173. 173.

    Köhler, S., Bauer, S., Horn, D. & Robinson, P. N. Walking the interactome for prioritization of candidate disease genes. Am. J. Hum. Genet. 82, 949–958 (2008).

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  174. 174.

    Barabási, A.-L., Gulbahce, N. & Loscalzo, J. Network medicine: a network-based approach to human disease. Nat. Rev. Genet. 12, 56–68 (2011).

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  175. 175.

    Lage, K. et al. A large-scale analysis of tissue-specific pathology and gene expression of human disease genes and complexes. Proc. Natl Acad. Sci. USA 105, 20870–20875 (2008).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  176. 176.

    Winter, E. E., Goodstadt, L. & Ponting, C. P. Elevated rates of protein secretion, evolution, and disease among tissue-specific genes. Genome Res. 14, 54–61 (2004).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  177. 177.

    Krishnan, A. et al. Genome-wide prediction and functional characterization of the genetic basis of autism spectrum disorder. Nat. Neurosci. 19, 1454–1462 (2016).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  178. 178.

    Chikina, M. D. & Troyanskaya, O. G. Accurate quantification of functional analogy among close homologs. PLoS Comput. Biol. 7, e1001074 (2011).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  179. 179.

    Guan, Y., Ackert-Bicknell, C. L., Kell, B., Troyanskaya, O. G. & Hibbs, M. A. Functional genomics complements quantitative genetics in identifying disease-gene associations. PLoS Comput. Biol. 6, e1000991 (2010).

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  180. 180.

    Swarup, V. et al. Identification of evolutionarily conserved gene networks mediating neurodegenerative dementia. Nat. Med. 25, 152–164 (2019).

    CAS  PubMed  Article  Google Scholar 

  181. 181.

    Zhang, B. & Horvath, S. A general framework for weighted gene co-expression network analysis. Stat. Appl. Genet. Mol. Biol. 4, 7 (2005).

    Article  Google Scholar 

  182. 182.

    Parikshak, N. N., Gandal, M. J. & Geschwind, D. H. Systems biology and gene networks in neurodevelopmental and neurodegenerative disorders. Nat. Rev. Genet. 16, 441–458 (2015).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  183. 183.

    Shannon, P. et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504 (2003).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  184. 184.

    Choobdar, S. et al. Assessment of network module identification across complex diseases. Nat. Methods 16, 843–852 (2019).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  185. 185.

    Lizio, M. et al. Update of the FANTOM web resource: expansion to provide additional transcriptome atlases. Nucleic Acids Res. 47, D752–D758 (2019).

    CAS  PubMed  Article  Google Scholar 

  186. 186.

    Regev, A. et al. The human cell atlas. eLife 6, e27041 (2017).

    PubMed  PubMed Central  Article  Google Scholar 

  187. 187.

    Lindsay, S. J. et al. HDBR expression: a unique resource for global and individual gene expression studies during early human brain development. Front. Neuroanat. 10, 86 (2016).

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  188. 188.

    Zhang, Y. et al. Discerning novel splice junctions derived from RNA-seq alignment: a deep learning approach. BMC Genomics 19, 971 (2018).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  189. 189.

    Mishra, A. & Macgregor, S. VEGAS2: software for more flexible gene-based testing. Twin Res. Hum. Genet. 18, 86–91 (2015).

    PubMed  Article  Google Scholar 

Download references

Acknowledgements

This work is supported by National Institutes of Health (NIH)/National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) grants U24DK100845, UGDK114907 and U2CDK114886 and NIH grant UH3TR002158 to O.G.T. The authors thank C. Park for helpful discussions.

Author information

Contributions

The authors contributed to all aspects of the article.

Corresponding author

Correspondence to Olga G. Troyanskaya.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Glossary

Deep learning

A family of machine learning approaches involving multilayer models composed of interconnected nodes, where the output of each node in the model is a function of its inputs.

Variant effect

The biochemical or phenotypic impact of a genetic variant relative to a reference allele.

Deleterious

An attribute of an allele describing a harmful phenotypic impact; the harm can arise through decreased organismal fitness, which is often associated with increased disease risk.

Support vector machine

(SVM). A standard supervised machine learning approach that identifies the hyperplane (dividing line in high-dimensional space) that optimally separates positive examples from negative examples.

k-mers

Short nucleic acid subsequences of length ‘k’ (in bases) that are used as units in computational algorithms.
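As an illustration of this definition, the following minimal Python sketch counts every overlapping k-mer in a DNA string. The function name `count_kmers` and the example sequence are assumptions made for illustration; they do not come from the article or from any specific tool.

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Count every overlapping k-mer (substring of length k) in a sequence."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    sequence = sequence.upper()
    # Slide a window of width k along the sequence, one base at a time.
    # Sequences shorter than k yield an empty Counter.
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


# Hypothetical example sequence, chosen only to show repeated 3-mers.
counts = count_kmers("ACGTACGT", 3)
```

A sequence of length n contains n - k + 1 overlapping k-mers, so the eight-base example yields six 3-mers, with "ACG" and "CGT" each occurring twice.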

Convolutional neural network

(CNN). A class of deep learning models that use structure in the input data (for example, adjacencies of pixels in an image or of base pairs in a sequence) to inform connections between nodes of the model. Successive outputs often model features at increasing spatial scales (for example, for sequence models: sequence → motifs → larger sequence contexts).
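To make the idea of locally connected nodes concrete, the sketch below applies a single convolutional filter to a one-hot-encoded DNA sequence in plain Python. It is one building block rather than a full CNN, and the names `one_hot`, `conv1d` and `tata_filter`, as well as the hand-set motif weights, are illustrative assumptions; a trained model would learn its filter weights from data.

```python
BASES = "ACGT"


def one_hot(sequence):
    """Encode a DNA string as a list of 4-element 0/1 vectors (A, C, G, T)."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in sequence.upper()]


def conv1d(encoded, kernel):
    """Slide one filter along the sequence; each output sees only adjacent bases."""
    width = len(kernel)
    outputs = []
    for start in range(len(encoded) - width + 1):
        window = encoded[start:start + width]
        score = sum(
            window[i][j] * kernel[i][j]
            for i in range(width)
            for j in range(len(BASES))
        )
        outputs.append(max(0.0, score))  # ReLU non-linearity
    return outputs


# Hypothetical hand-set filter that responds most strongly to the motif "TATA".
tata_filter = one_hot("TATA")

activations = conv1d(one_hot("GGTATAGG"), tata_filter)
best_position = max(range(len(activations)), key=activations.__getitem__)
```

The filter's activation peaks at index 2, where the motif begins. Stacking further layers on such outputs is how successive layers can model features at increasing spatial scales.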

Sequence models

Deep learning models that capture the relationship between genetic sequences and properties influencing gene regulation.

Recurrent neural network

(RNN). A type of neural network in which the processing of each input is influenced by the inputs that preceded it, so the output depends on the order of the inputs. For example, in speech recognition, applying the context of prior words is useful for determining the meaning of a new word. (A hidden state carries information from earlier inputs forward to later steps.)

Endophenotypic

An aspect of a complex trait that may be more experimentally measurable than the entire complex trait and may be closer to an underlying biological process. For example, educational attainment is an endophenotype examined in the study of autism genetics because it is a readily measurable trait associated with autism, and expression levels of insulin receptors are endophenotypes contributing to type 2 diabetes.

Non-coding genome

The portion of the genome that does not encode proteins, which comprises more than 98% of the total human genome length.

Probands

In a genetic study, individuals with the disease (typically, the particular affected individuals within families who are the starting points for genetic analyses).

Tissue-specific network

A network that captures relationships between genes that participate in similar biological processes for a particular tissue or cell type.

About this article

Cite this article

Wong, A.K., Sealfon, R.S.G., Theesfeld, C.L. et al. Decoding disease: from genomes to networks to phenotypes. Nat Rev Genet (2021). https://doi.org/10.1038/s41576-021-00389-x
