Tissue-specific regulatory circuits reveal variable modular perturbations across complex diseases

Journal name: Nature Methods
Volume: 13
Pages: 366–370
DOI: 10.1038/nmeth.3799

Abstract

Mapping perturbed molecular circuits that underlie complex diseases remains a great challenge. We developed a comprehensive resource of 394 cell type– and tissue-specific gene regulatory networks for human, each specifying the genome-wide connectivity among transcription factors, enhancers, promoters and genes. Integration with 37 genome-wide association studies (GWASs) showed that disease-associated genetic variants—including variants that do not reach genome-wide significance—often perturb regulatory modules that are highly specific to disease-relevant cell types or tissues. Our resource opens the door to systematic analysis of regulatory programs across hundreds of human cell types and tissues (http://regulatorycircuits.org).

Figures

    Figure 1: Regulatory circuit inference and GWAS analysis.

    (a) Schematic showing how cell type– and tissue-specific regulatory circuits are derived from expression profiles of CAGE-defined enhancers and promoters from the FANTOM5 project (refs. 23, 24). Weighted, tissue-specific links between TFs and regulatory elements (enhancers and promoters) are inferred using TF binding motifs and tissue-specific expression of target elements. Links between regulatory elements and target genes are inferred on the basis of genomic distance and joint expression in the given tissue. Different circuit configurations are shown schematically for five tissues. For clarity, only one enhancer and promoter are shown, but genes typically have multiple tissue-specific enhancers and promoters. (b) Coarse-grained TF-gene networks encapsulate the fine-grained circuitry of enhancers and promoters. (c) Schematic showing how the interconnectivity of genes that are perturbed by trait-associated variants was systematically assessed for a large panel of 37 GWASs.

    Figure 2: Assessment of regulatory circuits.

    (a) Evaluation of different approaches for inferring edges between TFs and regulatory elements (enhancers and promoters): (1) standard network inference based on expression correlation between TFs and regulatory elements across samples; (2) presence of TF motifs within regulatory elements; (3) presence of TF motifs, weighted by expression correlation; and (4) presence of TF motifs, weighted by target-element expression in the given cell type (Online Methods). For each TF and cell type for which ChIP-seq data were available (159 samples, including 59 TFs and five cell lines), the area under the precision-recall curve (AUPR) was computed. As a reference, AUPR values were also computed for (i) random data and (ii) replicates of ChIP-seq experiments. Box plots show the distribution of AUPR values for each method. Right and left hinges correspond to the first and third quartiles; whiskers extend to the highest and lowest values within 1.5× the interquartile range of the hinges. (b) Assessment of different approaches for linking enhancers to target genes: (1) maximum expression correlation across tissues; (2) minimum genomic distance; and (3) joint tissue-specific activity weighted by genomic distance (Online Methods). The AUPR was evaluated for each method along with random predictions in 13 tissues for which eQTL data were available. (Of note, AUPR values are not comparable across panels a and b because the underlying gold standards are different.) Asterisks in a and b indicate retained methods. (c) Evaluation of whether trait-associated genes tend to cluster in modules for different types of networks and GWAS traits. Five types of networks are compared from left to right: cell type– and tissue-specific regulatory networks (the 32 high-level networks), four protein-protein interaction networks, 35 tissue-specific coexpression networks (ref. 15), a global coexpression network inferred from the FANTOM5 data, and a global regulatory network based on ChIP-seq (ref. 17) (Online Methods). The plot summarizes whether trait-associated genes are more densely interconnected than expected (maximum connectivity enrichment score) for each network type (column) and trait (row). False discovery rate (FDR) correction was performed separately for each network type to allow for fair comparison. Only traits passing the 5% FDR threshold (dashed lines) for at least one network are shown (full results are presented in Supplementary Fig. 31). Tissue-specific regulatory networks showed the strongest connectivity enrichment overall.
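The AUPR metric used in panels a and b can be illustrated with a minimal sketch. It assumes each predicted edge has a confidence score and a binary gold-standard label (for example, supported by ChIP-seq or eQTL data or not). The function name `aupr` and the step-wise summation (average precision) are illustrative assumptions, not the authors' implementation, which is described in the Online Methods.

```python
from typing import Sequence


def aupr(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Approximate the area under the precision-recall curve.

    Hypothetical helper: uses the average-precision estimate, i.e. the
    mean of the precision values at the rank of each true edge.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("gold standard contains no positive edges")

    # Rank predicted edges from highest to lowest score (ties keep input order).
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    true_positives = 0
    area = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            true_positives += 1
            # Precision at the rank where recall increases by 1 / n_pos.
            area += true_positives / rank
    return area / n_pos


# Toy example: four predicted edges, two of them in the gold standard.
example = aupr([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
```

In this toy example the two true edges sit at ranks 1 and 3, so the estimate is (1/1 + 2/3) / 2. A random ranking would on average score close to the fraction of positives in the gold standard, which is why the caption reports random predictions as a reference.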

    Figure 3: Network-connectivity enrichment reveals disease-relevant cell types and tissues.

    Shown are connectivity enrichment scores across the 32 high-level networks (left; numbers in parentheses correspond to cluster indexes in Supplementary Fig. 14) and corresponding individual networks (right) for three GWAS traits. Similar results were obtained for the remaining traits (discussed in the text). Only top-ranking networks are shown (full results are presented in Supplementary Figs. 34, 37 and 47). (a) Schizophrenia showed the strongest clustering of associated genes in high-level regulatory networks of neural tissues (left), with top-ranking individual networks pinpointing specific, disease-relevant brain structures (right). (b) Inflammatory bowel disease (IBD) displayed connectivity enrichment in immune-related networks. IBD also showed enrichment in other high-level networks, including vascular cells that are involved in the inflammatory response and drive the signal, as shown to the right. (c) Age-related macular degeneration of the neovascular type showed the strongest connectivity enrichment in regulatory networks of vascular smooth muscle cells, followed by diverse tumors, which often induce vascularization to achieve growth.

References

  1. Marstrand, T.T. & Storey, J.D. Identifying and mapping cell-type-specific chromatin programming of gene expression. Proc. Natl. Acad. Sci. USA 111, E645–E654 (2014).
  2. Pers, T.H. et al. Biological interpretation of genome-wide association studies using predicted gene functions. Nat. Commun. 6, 5890 (2015).
  3. Roy, S. et al. A predictive modeling approach for cell line-specific long-range regulatory interactions. Nucleic Acids Res. 43, 8694–8712 (2015).
  4. Ward, L.D. & Kellis, M. Interpreting noncoding genetic variation in complex traits and human disease. Nat. Biotechnol. 30, 1095–1106 (2012).
  5. Parker, S.C.J. et al. Chromatin stretch enhancer states drive cell-specific gene regulation and harbor human disease risk variants. Proc. Natl. Acad. Sci. USA 110, 17921–17926 (2013).
  6. Trynka, G. et al. Chromatin marks identify critical cell types for fine mapping complex trait variants. Nat. Genet. 45, 124–130 (2013).
  7. Faye, L.L., Machiela, M.J., Kraft, P., Bull, S.B. & Sun, L. Re-ranking sequencing variants in the post-GWAS era for accurate causal variant identification. PLoS Genet. 9, e1003609 (2013).
  8. Pickrell, J.K. Joint analysis of functional genomic data and genome-wide association studies of 18 human traits. Am. J. Hum. Genet. 94, 559–573 (2014).
  9. Pasquali, L. et al. Pancreatic islet enhancer clusters enriched in type 2 diabetes risk-associated variants. Nat. Genet. 46, 136–143 (2014).
  10. Navlakha, S. & Kingsford, C. The power of protein interaction networks for associating genes with diseases. Bioinformatics 26, 1057–1063 (2010).
  11. Rossin, E.J. et al. Proteins encoded in genomic regions associated with immune-mediated disease physically interact and suggest underlying biology. PLoS Genet. 7, e1001273 (2011).
  12. Chen, Y. et al. Variations in DNA elucidate molecular networks that cause disease. Nature 452, 429–435 (2008).
  13. Lee, I., Blom, U.M., Wang, P.I., Shim, J.E. & Marcotte, E.M. Prioritizing candidate disease genes by network-based boosting of genome-wide association data. Genome Res. 21, 1109–1121 (2011).
  14. Mäkinen, V.-P. et al. Integrative genomics reveals novel molecular pathways and gene networks for coronary artery disease. PLoS Genet. 10, e1004502 (2014).
  15. Pierson, E., Koller, D., Battle, A., Mostafavi, S. & the GTEx Consortium. Sharing and specificity of co-expression networks across 35 human tissues. PLoS Comput. Biol. 11, e1004220 (2015).
  16. Greene, C.S. et al. Understanding multicellular function and disease with human tissue-specific networks. Nat. Genet. 47, 569–576 (2015).
  17. Gerstein, M.B. et al. Architecture of the human regulatory network derived from ENCODE data. Nature 489, 91–100 (2012).
  18. Marbach, D. et al. Predictive regulatory models in Drosophila melanogaster by integrative inference of transcriptional networks. Genome Res. 22, 1334–1349 (2012).
  19. Karczewski, K.J., Snyder, M., Altman, R.B. & Tatonetti, N.P. Coherent functional modules improve transcription factor target identification, cooperativity prediction, and disease association. PLoS Genet. 10, e1004122 (2014).
  20. Boyer, L.A. et al. Core transcriptional regulatory circuitry in human embryonic stem cells. Cell 122, 947–956 (2005).
  21. Ciofani, M. et al. A validated regulatory network for Th17 cell specification. Cell 151, 289–303 (2012).
  22. Chen, J.C. et al. Identification of causal genetic drivers of human disease through systems-level analysis of regulatory networks. Cell 159, 402–414 (2014).
  23. The FANTOM Consortium & the RIKEN PMI and CLST (DGT). A promoter-level mammalian expression atlas. Nature 507, 462–470 (2014).
  24. Andersson, R. et al. An atlas of active enhancers across human cell types and tissues. Nature 507, 455–461 (2014).
  25. Kheradpour, P. et al. Systematic dissection of regulatory motifs in 2000 predicted human enhancers using a massively parallel reporter assay. Genome Res. 23, 800–811 (2013).
  26. Kheradpour, P. & Kellis, M. Systematic discovery and characterization of regulatory motifs in ENCODE TF binding experiments. Nucleic Acids Res. 42, 2976–2987 (2014).
  27. Marbach, D. et al. Wisdom of crowds for robust gene network inference. Nat. Methods 9, 796–804 (2012).
  28. The GTEx Consortium. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science 348, 648–660 (2015).
  29. Roadmap Epigenomics Consortium. et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).
  30. Lamparter, D., Marbach, D., Rico, R., Kutalik, Z. & Bergmann, S. Fast and rigorous computation of gene and pathway scores from SNP-based summary statistics. PLoS Comput. Biol. 12, e1004714 (2016).
  31. Perez-Costas, E., Melendez-Ferro, M. & Roberts, R.C. Basal ganglia pathology in schizophrenia: dopamine connections and anomalies. J. Neurochem. 113, 287–302 (2010).
  32. Garrett, A. & Chang, K. The role of the amygdala in bipolar disorder development. Dev. Psychopathol. 20, 1285–1296 (2008).
  33. Cromer, W.E., Mathis, J.M., Granger, D.N., Chaitanya, G.V. & Alexander, J.S. Role of the endothelium in inflammatory bowel diseases. World J. Gastroenterol. 17, 578–593 (2011).
  34. Swirski, F.K. et al. Identification of splenic reservoir monocytes and their deployment to inflammatory sites. Science 325, 612–616 (2009).
  35. Chichlowski, M., Westwood, G.S., Abraham, S.N. & Hale, L.P. Role of mast cells in inflammatory bowel disease and inflammation-associated colorectal neoplasia in IL-10-deficient mice. PLoS ONE 5, e12220 (2010).
  36. Wright, H.L., Moots, R.J. & Edwards, S.W. The multifactorial role of neutrophils in rheumatoid arthritis. Nat. Rev. Rheumatol. 10, 593–601 (2014).
  37. Iadecola, C. Neurovascular regulation in the normal brain and in Alzheimer's disease. Nat. Rev. Neurosci. 5, 347–360 (2004).
  38. Hor, H. et al. Genome-wide association study identifies new HLA class II haplotypes strongly protective against narcolepsy. Nat. Genet. 42, 786–789 (2010).
  39. Goodrich, J.K. et al. Human genetics shape the gut microbiome. Cell 159, 789–799 (2014).
  40. The ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
  41. Kheradpour, P., Stark, A., Roy, S. & Kellis, M. Reliable prediction of regulator targets using 12 Drosophila genomes. Genome Res. 17, 1919–1931 (2007).
  42. Boyle, A.P. et al. Annotation of functional variation in personal genomes using RegulomeDB. Genome Res. 22, 1790–1797 (2012).
  43. Marbach, D. et al. Revealing strengths and weaknesses of methods for gene network inference. Proc. Natl. Acad. Sci. USA 107, 6286–6291 (2010).
  44. Schizophrenia Working Group of the Psychiatric Genomics Consortium. Biological insights from 108 schizophrenia-associated genetic loci. Nature 511, 421–427 (2014).
  45. Cross-Disorder Group of the Psychiatric Genomics Consortium. Identification of risk loci with shared effects on five major psychiatric disorders: a genome-wide analysis. Lancet 381, 1371–1379 (2013).
  46. Ripke, S. et al. Genome-wide association analysis identifies 13 new risk loci for schizophrenia. Nat. Genet. 45, 1150–1159 (2013).
  47. Boraska, V. et al. A genome-wide association study of anorexia nervosa. Mol. Psychiatry 19, 1085–1094 (2014).
  48. Lambert, J.-C. et al. Meta-analysis of 74,046 individuals identifies 11 new susceptibility loci for Alzheimer's disease. Nat. Genet. 45, 1452–1458 (2013).
  49. Simón-Sánchez, J. et al. Genome-wide association study reveals genetic risk underlying Parkinson's disease. Nat. Genet. 41, 1308–1312 (2009).
  50. International Multiple Sclerosis Genetics Consortium. et al. Genetic risk and a primary role for cell-mediated immune mechanisms in multiple sclerosis. Nature 476, 214–219 (2011).
  51. Anderson, C.A. et al. Meta-analysis identifies 29 additional ulcerative colitis risk loci, increasing the number of confirmed associations to 47. Nat. Genet. 43, 246–252 (2011).
  52. Franke, A. et al. Genome-wide meta-analysis increases to 71 the number of confirmed Crohn's disease susceptibility loci. Nat. Genet. 42, 1118–1125 (2010).
  53. Stahl, E.A. et al. Genome-wide association study meta-analysis identifies seven new rheumatoid arthritis risk loci. Nat. Genet. 42, 508–514 (2010).
  54. Rauch, A. et al. Genetic variation in IL28B is associated with chronic hepatitis C and treatment failure: a genome-wide association study. Gastroenterology 138, 1338–1345 (2010).
  55. Schunkert, H. et al. Large-scale association analysis identifies 13 new susceptibility loci for coronary artery disease. Nat. Genet. 43, 333–338 (2011).
  56. The International Consortium for Blood Pressure Genome-Wide Association Studies. Genetic variants in novel pathways influence blood pressure and cardiovascular disease risk. Nature 478, 103–109 (2011).
  57. Global Lipids Genetics Consortium. Discovery and refinement of loci associated with lipid levels. Nat. Genet. 45, 1274–1283 (2013).
  58. Teslovich, T.M. et al. Biological, clinical and population relevance of 95 loci for blood lipids. Nature 466, 707–713 (2010).
  59. Prokopenko, I. et al. A central role for GRB10 in regulation of islet function in man. PLoS Genet. 10, e1004235 (2014).
  60. The DIAbetes Genetics Replication and Meta-analysis (DIAGRAM) Consortium. Large-scale association analysis provides insights into the genetic architecture and pathophysiology of type 2 diabetes. Nat. Genet. 44, 981–990 (2012).
  61. Scott, R.A. et al. Large-scale association analyses identify new loci influencing glycemic traits and provide insight into the underlying biological pathways. Nat. Genet. 44, 991–1005 (2012).
  62. Soranzo, N. et al. Common variants at 10 genomic loci influence hemoglobin A1C levels via glycemic and nonglycemic pathways. Diabetes 59, 3229–3239 (2010).
  63. Dupuis, J. et al. New genetic loci implicated in fasting glucose homeostasis and their impact on type 2 diabetes risk. Nat. Genet. 42, 105–116 (2010).
  64. Strawbridge, R.J. et al. Genome-wide association identifies nine common variants associated with fasting proinsulin levels and provides new insights into the pathophysiology of type 2 diabetes. Diabetes 60, 2624–2634 (2011).
  65. Lango Allen, H. et al. Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature 467, 832–838 (2010).
  66. Speliotes, E.K. et al. Association analyses of 249,796 individuals reveal 18 new loci associated with body mass index. Nat. Genet. 42, 937–948 (2010).
  67. The AMD Gene Consortium. Seven new loci associated with age-related macular degeneration. Nat. Genet. 45, 433–439 (2013).
  68. Estrada, K. et al. Genome-wide meta-analysis identifies 56 bone mineral density loci and reveals 14 loci associated with risk of fracture. Nat. Genet. 44, 491–501 (2012).
  69. Clarke, L. et al. The 1000 Genomes Project: data management and community access. Nat. Methods 9, 459–462 (2012).
  70. Kondor, R.I. & Lafferty, J.D. Diffusion kernels on graphs and other discrete input spaces. in Proc. Nineteenth International Conference on Machine Learning 315–322 (Morgan Kaufmann, 2002).
  71. Smola, A.J. & Kondor, R. Kernels and regularization on graphs. in Learning Theory and Kernel Machines (eds. Schölkopf, B. & Warmuth, M.K.) 144–158 (Springer Berlin Heidelberg, 2003).
  72. Erten, S., Bebek, G., Ewing, R.M. & Koyutürk, M. DADA: degree-aware algorithms for network-based disease gene prioritization. BioData Min. 4, 19 (2011).
  73. Lage, K. et al. A human phenome-interactome network of protein complexes implicated in genetic disorders. Nat. Biotechnol. 25, 309–316 (2007).
  74. Maglott, D., Ostell, J., Pruitt, K.D. & Tatusova, T. Entrez Gene: gene-centered information at NCBI. Nucleic Acids Res. 39, D52–D57 (2011).
  75. Chatr-Aryamontri, A. et al. The BioGRID interaction database: 2015 update. Nucleic Acids Res. 43, D470–D478 (2015).
  76. Rolland, T. et al. A proteome-scale map of the human interactome network. Cell 159, 1212–1226 (2014).
  77. Neph, S. et al. Circuitry and dynamics of human transcription factor regulatory networks. Cell 150, 1274–1286 (2012).
  78. Derry, J.M.J. et al. Developing predictive molecular maps of human disease through community-based modeling. Nat. Genet. 44, 127–130 (2012).

Author information

Affiliations

  1. Department of Computational Biology, University of Lausanne, Lausanne, Switzerland.

    • Daniel Marbach,
    • David Lamparter &
    • Sven Bergmann
  2. Swiss Institute of Bioinformatics, Lausanne, Switzerland.

    • Daniel Marbach,
    • David Lamparter,
    • Zoltán Kutalik &
    • Sven Bergmann
  3. Broad Institute, MIT, Cambridge, Massachusetts, USA.

    • Gerald Quon &
    • Manolis Kellis
  4. Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, Massachusetts, USA.

    • Gerald Quon &
    • Manolis Kellis
  5. Institute of Social and Preventive Medicine, University Hospital of Lausanne, Lausanne, Switzerland.

    • Zoltán Kutalik

Contributions

D.M. designed the study, performed analyses and prepared the manuscript. D.L. performed gene scoring and phenotype-label permutation. D.M., D.L., G.Q., M.K., Z.K. and S.B. conceived methods, discussed the results and implications, and commented on the manuscript at all stages.

Competing financial interests

The authors declare no competing financial interests.

Supplementary information

PDF files

  1. Supplementary Text and Figures (2,421 KB)

    Supplementary Figures 1–47

Excel files

  1. Supplementary Table 1 (111 KB)

    Sample annotation

  2. Supplementary Table 2 (48,953 KB)

    ENCODE ChIP-seq experiments

  3. Supplementary Table 3 (43,164 KB)

    GTEx tissues

  4. Supplementary Table 4 (47,296 KB)

    Roadmap Epigenomics RNA-seq data

  5. Supplementary Table 5 (17,181 KB)

    GWAS compendium

  6. Supplementary Table 6 (55,145 KB)

    Links to external repositories
