Tissue and cell-type identity lie at the core of human physiology and disease. Understanding the genetic underpinnings of complex tissues and individual cell lineages is crucial for developing improved diagnostics and therapeutics. We present genome-wide functional interaction networks for 144 human tissues and cell types developed using a data-driven Bayesian methodology that integrates thousands of diverse experiments spanning tissue and disease states. Tissue-specific networks predict lineage-specific responses to perturbation, identify the changing functional roles of genes across tissues and illuminate relationships among diseases. We introduce NetWAS, which combines genes with nominally significant genome-wide association study (GWAS) P values and tissue-specific networks to identify disease-gene associations more accurately than GWAS alone. Our webserver, GIANT, provides an interface to human tissue networks through multi-gene queries, network visualization, analysis tools including NetWAS and downloadable networks. GIANT enables systematic exploration of the landscape of interacting genes that shape specialized cellular functions across more than a hundred human tissues and cell types.
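As a rough illustration of the idea behind combining nominally significant GWAS genes with a tissue network, the following toy sketch ranks every gene by its mean edge weight to the nominally significant "seed" genes. This is not NetWAS itself: the abstract does not specify the classifier, and the function name `reprioritize`, the threshold of 0.01, the scoring rule and all values below are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple


def reprioritize(
    gwas_p: Dict[str, float],
    edges: Dict[Tuple[str, str], float],
    threshold: float = 0.01,  # assumed cutoff for "nominally significant"
) -> List[Tuple[str, float]]:
    """Rank genes by mean network edge weight to nominally significant genes."""
    seeds = {gene for gene, p in gwas_p.items() if p < threshold}

    def weight(a: str, b: str) -> float:
        # The network is undirected, so look the pair up in either order.
        return edges.get((a, b), edges.get((b, a), 0.0))

    scores: Dict[str, float] = {}
    for gene in gwas_p:
        others = [s for s in seeds if s != gene]
        if others:
            scores[gene] = sum(weight(gene, s) for s in others) / len(others)
        else:
            scores[gene] = 0.0
    # Highest mean connectivity to the seed genes comes first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical gene-level P values and edge weights (not data from the paper).
gwas_p = {"A": 0.001, "B": 0.004, "C": 0.30, "D": 0.60}
edges = {("A", "B"): 0.9, ("A", "C"): 0.8, ("B", "C"): 0.7, ("A", "D"): 0.1}
ranking = reprioritize(gwas_p, edges)
```

In this toy input, gene C has a weak GWAS P value but is tightly connected to both seed genes, so it ranks above gene D, which is the kind of network-based reprioritization the abstract describes.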
The first three authors are co-first authors and are listed alphabetically.
We sincerely thank Y. Lee and D. Gorenshteyn for help in curating disease associations and L. Bongo and M. Homilius for help in processing expression data. We are grateful to all members of the Troyanskaya laboratory for help in curating specific GO biological processes and for valuable discussions.
This work was primarily supported by US National Institutes of Health (NIH) grants R01 GM071966 and R01 HG005998 to O.G.T. and U54 HL117798 to G.A.F. C.S.G. was supported in part by US NIH grants T32 CA009528 and P20 GM103534. A.K.W. was supported in part by US NIH grant T32 HG003284. This work was supported in part by US NIH grant P50 GM071508 and by US NIH contract HHSN272201000054C. O.G.T. is a senior fellow of the Genetic Networks program of the Canadian Institute for Advanced Research (CIFAR).
The authors declare no competing financial interests.
Integrated supplementary information
Supplementary Figure 1 Integrating the entire data compendium with hierarchy-aware tissue-specific knowledge generates networks that better capture tissue-specific interactions than limiting the integration to tissue-specific data (P = 1.3 × 10⁻⁹).
For each tissue, two networks (one integrating the entire data compendium and the other integrating only tissue-labeled data) were generated, and their performance was measured using the area under the receiver operating characteristic curve (AUC) on the basis of cross-validation. The scatterplot shows the performance for 64 tissues (points) with tissue-labeled data (x axis) and all data (y axis), with 62 of 64 performing better with all data (above the diagonal line; P = 3.2 × 10⁻¹²). The remaining 80 tissues did not have sufficient tissue-specific data (fewer than 5 data sets) available to perform a tissue-restricted integration. The performance of our Bayesian integration for these tissues is shown on the disconnected axis.
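The paired comparison in this caption can be sketched as follows. The caption does not name the statistical test behind the reported P value, so the one-sided sign test below is an assumption and is not expected to reproduce that P value; the function names and the AUC pairs are hypothetical.

```python
from math import comb
from typing import List, Tuple


def count_above_diagonal(pairs: List[Tuple[float, float]]) -> int:
    """Count (x, y) points with y > x, i.e. tissues where all-data AUC is higher."""
    return sum(1 for x, y in pairs if y > x)


def sign_test_one_sided(k: int, n: int) -> float:
    """P(X >= k) for X ~ Binomial(n, 0.5): chance of k or more wins out of n."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


# Hypothetical (tissue-only AUC, all-data AUC) pairs; not values from the paper.
auc_pairs = [(0.62, 0.71), (0.70, 0.74), (0.81, 0.79), (0.66, 0.72)]
n_better = count_above_diagonal(auc_pairs)
p_toy = sign_test_one_sided(n_better, len(auc_pairs))

# The counts stated in the caption: 62 of 64 tissues above the diagonal.
p_caption_counts = sign_test_one_sided(62, 64)
```

A sign test uses only the direction of each difference, so it is a conservative stand-in for whichever paired test the authors applied.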
Supplementary Figure 2 The blood vessel and cardiovascular system networks show the best correspondence with the experiment, over and above the tissue-naive network and the bulk of other unrelated tissue networks.
For each network, genes were ranked on the basis of their connectivity to IL-1β in that network. Then, at each rank, the precision of the predictions up to that rank was calculated as the fraction of genes that are differentially expressed in the experiment. Plotted in each of the three graphs is the precision (y axis) at incremental sets of top-ranked genes (1–100; x axis). The precision for the blood vessel and tissue-naive networks is plotted in solid blue and dashed dark gray, respectively. The median precisions at each rank across the cardiovascular system tissues and across all tissues are plotted in dotted blue and gray, respectively. Further, the gray band around the all-tissues median represents the interquartile range of precision values at each rank calculated across all tissues. The three plots correspond to different choices of differentially expressed genes (DEGs) from the microarray, with (a) 500 genes, (b) 250 genes and (c) 1,000 genes. The results in the main text are based on choosing genes from (a) at rank 20.
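The precision-at-rank calculation described in this caption can be sketched as below. The function name `precision_at_ranks`, the gene labels and the connectivity scores are hypothetical; only the definition (fraction of the top-ranked genes that are differentially expressed) comes from the caption.

```python
from typing import Dict, List, Set


def precision_at_ranks(
    connectivity: Dict[str, float], degs: Set[str], max_rank: int
) -> List[float]:
    """Precision at ranks 1..max_rank for genes sorted by connectivity."""
    # Most tightly connected genes first.
    ranked = sorted(connectivity, key=lambda g: connectivity[g], reverse=True)
    precisions: List[float] = []
    hits = 0
    for rank, gene in enumerate(ranked[:max_rank], start=1):
        if gene in degs:
            hits += 1
        # Fraction of the top `rank` genes that are differentially expressed.
        precisions.append(hits / rank)
    return precisions


# Hypothetical connectivity scores to the query gene and a hypothetical DEG set.
connectivity = {"G1": 0.95, "G2": 0.90, "G3": 0.60, "G4": 0.40, "G5": 0.10}
degs = {"G1", "G3", "G4"}
curve = precision_at_ranks(connectivity, degs, max_rank=4)
```

Plotting `curve` against rank for each network would give one line per network, analogous to the curves in the figure.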
Supplementary Figure 3 We analyzed publicly available gene expression data sets that included IL-1β treatment and found that genes connected to IL-1β in tissue-specific networks for the corresponding tissue responded significantly to treatment.
Each plot shows the mean log2 fold change after IL-1β treatment of the 20 genes most tightly connected to IL-1β in the network listed on the x axis, and error bars represent the standard error (s.e.). Also plotted alongside as controls are the mean and s.e. of 20 random genes from the data set. The first plot (GSE59671) corresponds to the blood vessel experiment elaborated in the main text (Fig. 2). The cell type and GEO identifier of each of the other data sets from which gene expression data were extracted are listed above the corresponding plot. Of these data sets, only GSE7216 (epidermal keratinocytes) is included in the data compendium used for integration. The rest are independent of the integration.
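The summary statistics plotted here (mean and standard error for the top-connected genes, with a random gene set as a control) can be sketched as follows. The caption does not say how the s.e. was computed, so the sample standard deviation (n − 1 denominator) is an assumption, and the gene names and fold changes are hypothetical.

```python
import math
import random
from typing import List, Tuple


def mean_and_se(values: List[float]) -> Tuple[float, float]:
    """Return the mean and standard error of the mean (sample s.d. / sqrt(n))."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to estimate a standard error")
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(variance / n)


# Hypothetical log2 fold changes; the caption uses 20 genes, 4 are used here.
log2fc = {"G1": 1.0, "G2": 2.0, "G3": 3.0, "G4": 4.0, "G5": 0.0, "G6": -0.5}
top_genes = ["G1", "G2", "G3", "G4"]
top_mean, top_se = mean_and_se([log2fc[g] for g in top_genes])

# Control: the same number of genes drawn at random from the data set.
rng = random.Random(0)  # fixed seed so the control draw is repeatable
control_genes = rng.sample(sorted(log2fc), k=len(top_genes))
ctrl_mean, ctrl_se = mean_and_se([log2fc[g] for g in control_genes])
```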
(a) LEF-1’s functional network neighborhood in B lymphocytes. (b) The functional enrichment of LEF-1’s neighborhood. (c) A table of the most connected genes to LEF-1 in the network.
Supplementary Figure 5 This stacked bar plot shows the results of our blinded literature evaluation.
Only 10% of randomly selected diseases were strongly associated with Parkinson’s disease in the literature, while more than 75% of disease map–selected diseases were associated.
(a) Alzheimer’s disease in the temporal lobe network (z score ≥ 2.25), (b) glycogen metabolism disorder in liver (z score ≥ 1.75) and (c) glomerulonephritis in renal glomerulus (z score ≥ 1.5). The appropriate tissue network was chosen on the basis of connectivity of diseases in their relevant tissues (see “Network connectivity in tissue-specific processes” in the Online Methods).
Supplementary Figure 7 Relevant tissue networks show the best performance in reprioritizing hypertension GWAS and are enriched with targets of antihypertensive drugs.
To evaluate the choice of tissues for reprioritization, we evaluated all tissue networks (along with the tissue-naive network) in the same setting we used for the kidney network. (a) The distribution of performance (measured using AUC) shows that the right tissue network, kidney, and other relevant tissues, heart and liver, are among the best, while the tissue-naive network sits amidst tissue networks that provide an average performance. (b) Top-ranked genes by NetWAS are significantly enriched with targets of antihypertensive drugs. Drug targets were obtained from four databases (DrugBank, TTD, PharmGKB and CTD), which curate this information using different criteria. We evaluated both the original GWAS (gray) and NetWAS using the kidney network (dark red) for enrichment of drug targets from each of these sources among the top-ranked genes. Enrichment was measured using z scores (Online Methods), with higher scores indicating greater enrichment near the top of the list. In nearly all cases (across target data sources and phenotypic end points), NetWAS reprioritization resulted in significant top ranking of therapeutic targets, over the original GWAS.
Supplementary Figure 8 NetWAS reprioritization is effective across studies, phenotypes and relevant networks.
Each bar shows the performance of NetWAS reprioritization as measured by the area under the curve (AUC) of documented disease associations with the disease specified in the label above the plot. The horizontal axis shows relevant networks (colored bars) and GWAS alone (gray bars), and the horizontal axis label describes the GWAS phenotype from which associations were obtained.
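For readers unfamiliar with the metric, the AUC of a gene ranking against a set of documented disease genes can be computed as the probability that a randomly chosen documented gene scores higher than a randomly chosen undocumented one. The sketch below uses that pairwise definition; the function name `auc`, the scores and the labels are hypothetical and do not come from the paper.

```python
from typing import List


def auc(scores: List[float], labels: List[bool]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, is_pos in zip(scores, labels) if is_pos]
    neg = [s for s, is_pos in zip(scores, labels) if not is_pos]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correctly ordered pair
    return wins / (len(pos) * len(neg))


# Hypothetical reprioritization scores and documented-association labels.
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
labels = [True, False, True, False, False]
value = auc(scores, labels)
```

An AUC of 0.5 corresponds to a random ranking, so bars above that level indicate that documented disease genes tend to be ranked near the top.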
Supplementary Figures 1–8 and Supplementary Note. (PDF 6544 kb)
Tissue model weights of expression data sets. (XLSX 2437 kb)
Pathways known to be specifically active in a tissue are tightly connected in the corresponding tissue network. This table provides the list of tissues, their organ system categories (tissue-slim) and attributes of tissue-specific pathways mapped to those tissues. (XLSX 45 kb)
Top 20 genes tightly connected to IL1B in the blood vessel network. (XLSX 29 kb)
NetWAS results for combined hypertension phenotypes. (XLSX 1856 kb)
Many lines of evidence in the literature link the top predicted genes to hypertension via mechanistic relationship to known disease genes and pathways or association with hypertension risk factors. (XLSX 41 kb)
Expert-curated GO terms used to generate a global functional interaction standard. (XLSX 37 kb)
HPRD tissues were linked by direct text matching to terms in the BTO. (XLSX 41 kb)
This table contains the pruned BTO terms. (XLSX 55 kb)
We used simple text mining followed by manual curation to map biological process (BP) terms in GO to tissue terms in the BTO. (XLSX 136 kb)
About this article
Cite this article
Greene, C., Krishnan, A., Wong, A. et al. Understanding multicellular function and disease with human tissue-specific networks. Nat Genet 47, 569–576 (2015). https://doi.org/10.1038/ng.3259